Big Data: From Information to Knowledge
Transforming Information to Knowledge
Remember that our wonderful definition for Big Data involves two steps: 1) transforming data to information, and 2) transforming information to knowledge. Neither step is easy, and both can involve a great deal of computation. But how do you do these transformations? The ultimate answer lies with the individual doing the analyses and the particular field of study.
However, let's briefly touch on one possible tool that could be useful -- neural networks.
Neural networks can be used for a variety of things, but my comments are not directed at using neural networks in the more traditional ways. Rather, consider taking the data you are interested in, or even the information, and training a neural network with some defined inputs and defined outputs. This approach pays off most when you have more than just a couple of outputs; with only one or two, you can often just create a number of 2D plots to visualize the outputs as a function of the inputs (unless you have a huge number of inputs). Once you have a trained net, a very useful step is to examine the details of the network itself. For example, examining the weights connecting the inputs to the hidden layer, and the hidden layer to the outputs, can possibly tell you something about how important various inputs are to the output, or how combinations of inputs can affect the output or outputs. This is even more useful when you have several, possibly many, outputs and you want to examine how inputs affect each of the outputs.
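To make the idea of examining the weights a little more concrete, here is a minimal sketch in Python with NumPy. It is only one possible way to do this, not a recommended method. Everything in it is an assumption made for illustration: the synthetic data set (three inputs, two outputs, with the second input nearly irrelevant), the tiny one-hidden-layer tanh network, the training settings, and the function names `train_network`, `connection_weight_importance`, and `demo`. The importance score used is a simple "connection weights" heuristic: for each input/output pair, sum the products of the input-to-hidden and hidden-to-output weights over all hidden units. Treat the result as a rough hint about which inputs matter, not as a rigorous measure.

```python
import numpy as np


def train_network(X, Y, hidden=8, lr=0.1, epochs=3000, seed=0):
    """Fit a one-hidden-layer tanh network with full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    n, n_in = X.shape
    n_out = Y.shape[1]
    W1 = rng.normal(0.0, 0.5, (n_in, hidden))   # input -> hidden weights
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, n_out))  # hidden -> output weights
    b2 = np.zeros(n_out)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                # hidden activations
        err = (H @ W2 + b2) - Y                 # prediction error
        # Backpropagate the gradient of half the mean squared error.
        grad_W2 = H.T @ err / n
        grad_b2 = err.mean(axis=0)
        dH = (err @ W2.T) * (1.0 - H ** 2)      # tanh'(z) = 1 - tanh(z)^2
        grad_W1 = X.T @ dH / n
        grad_b1 = dH.mean(axis=0)
        W1 -= lr * grad_W1
        b1 -= lr * grad_b1
        W2 -= lr * grad_W2
        b2 -= lr * grad_b2
    return W1, b1, W2, b2


def connection_weight_importance(W1, W2):
    """Signed sum over hidden units of (input->hidden) * (hidden->output).

    Returns an array of shape (n_inputs, n_outputs).
    """
    return W1 @ W2


def demo():
    # Assumed synthetic data: output 0 depends mostly on input 0,
    # output 1 depends only on input 2, and input 1 barely matters.
    rng = np.random.default_rng(42)
    X = rng.uniform(-1.0, 1.0, (500, 3))
    Y = np.column_stack([2.0 * X[:, 0] + 0.1 * X[:, 1], -1.5 * X[:, 2]])
    W1, b1, W2, b2 = train_network(X, Y)
    return connection_weight_importance(W1, W2)


importance = demo()
# Share of each input in each output's total absolute importance.
share = np.abs(importance) / np.abs(importance).sum(axis=0)
for i, row in enumerate(share):
    print(f"input {i}: " + "  ".join(f"{v:.2f}" for v in row))
```

Each printed row corresponds to one input and each column to one output, so reading down a column gives a rough ranking of the inputs for that output. The signed sum is used instead of a sum of absolute values so that hidden units whose contributions cancel each other do not make an irrelevant input look important.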
Neural networks could enjoy a renaissance of sorts in Big Data if they are used to help examine the information and perhaps even turn it into knowledge.
This and the previous two articles are intended to be a starting point for discussing Big Data from the top, while the first article in the series started at the bottom. But the topic of Big Data is so broad and so over-hyped that it is difficult to concisely say what Big Data is and why it is a real topic and not YABW (yet another buzzword). There are several facets to Big Data that must be carefully considered before diving in head first. Some of the facets that I have tried to discuss are:
- What is Big Data?
- Why is it important or useful?
- How do you get data into "Big Data"?
- How do you store the data?
- What tools are used in Big Data, and how can these influence storage design?
- What is Hadoop, and how can it influence storage design?
- What is MapReduce, and how does it integrate with Big Data?
Hopefully, the discussion has caused you to think and perhaps even use Big Data tools like Google to search for information and create knowledge (sorry -- had to go there). If you are asking more questions and looking for clarification, then you have gotten what I intended from the article.
And now, back over to Henry!
Jeff Layton is the Enterprise Technologist for HPC at Dell, Inc., and a regular writer of all things HPC and storage.
Henry Newman is CEO and CTO of Instrumental Inc. and has worked in HPC and large storage environments for 29 years. The outspoken Mr. Newman initially went to school to become a diplomat, but was firmly told during his first year that he might be better suited for a career that didn't require diplomatic skills. Diplomacy's loss was HPC's gain.