Hadoop Makes Sense of Lots of Data
Cloudera and Pentaho Build on Hadoop
Mike Karp, an analyst with Ptak, Noel & Associates, cautions that any kind of open-source software is by its very nature a double-edged sword: it is cheap to implement, but adequate support is often hard to find, especially in the early stages of adoption.
"Most of where the support would come from, after all, is a group of volunteers; as a result, companies are often nervous about doing open source code with business-critical applications," said Karp. "The good news, of course, is that these volunteers frequently are often inspired to write great code, and there's plenty of evidence in the past that open-source projects have achieved great success."
That's where companies like Cloudera and Pentaho come in. They build a business model around taking top-notch open-source software and supplying the bells and whistles that make businesses trust it in an enterprise environment. Cloudera provides a Hadoop-based data management platform for the enterprise. Its founding team came from Web companies such as Facebook, Google and Yahoo. It offers services, support and training, and its largest customer deployment in production holds more than 4.5 petabytes, running on more than 500 servers.
"Cloudera is interesting because it intermediates between the open source world and the users in much the same way that Red Hat or SUSE do in the Linux world, which is to say it provides much greater assurance in terms of support for Hadoop in critical environments," said Karp. "This will be particularly important as cloud and locally virtualized environments make flexible data processing more important."
Business Intelligence
One of Hadoop's primary value propositions is adding customer value. Early adopters like Google, Amazon and Facebook used Hadoop to unlock the enormous value buried in the massive amounts of data they collected. By using analytical techniques to comb through data at volume, these companies deliver a better customer experience: on-target search results, more interesting products, better content and more precisely targeted ads.
"We are seeing a lot of traction in financial services, government, telecom, research institutions and other markets where a lot of data is involved," said Awadallah. "Credit card companies, for instance, are using it for things like fraud detection."
Top management, too, is beginning to realize the potential of the data that might be sitting stored and unused within the enterprise. Hadoop appears to be the right tool at the right time to allow organizations to triangulate what people are doing at their sites in order to do a better job of turning prospects into customers, offering them what they want in a timely manner, and spotting trends and reacting to them in real time.
While Cloudera offers an enterprise-ready distribution of Hadoop, Pentaho supplies integrated BI tools based on the Lucene/Solr open source search technology. The depth to which users can drill down depends purely on the ability to store the lowest levels of data in a format that can be queried. Drilling down is typically not hard with a BI application based on transactional data. However, digging into systems based on blog, social media or telco data is a different story. Some of these data sets include billions of records per day. They are too large for most relational databases, and storage at this scale is too expensive. The solution has been to aggregate the data as it comes in and, since it cannot all be stored, to throw the raw data away.
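The trade-off described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the sample records, the field names and the helper functions `aggregate_on_ingest` and `drill_down` are hypothetical and do not come from Hadoop, Pentaho or Cloudera. It simply shows that once only aggregates are kept, a detail-level question can no longer be answered, whereas retained raw records still support it.

```python
from collections import Counter

# Hypothetical raw event records (illustrative data, not from the article).
raw_events = [
    {"day": "2010-06-01", "user": "alice", "page": "/home"},
    {"day": "2010-06-01", "user": "bob", "page": "/pricing"},
    {"day": "2010-06-01", "user": "alice", "page": "/pricing"},
    {"day": "2010-06-02", "user": "carol", "page": "/home"},
]


def aggregate_on_ingest(events):
    """Keep only a per-day event count; the raw records are not retained."""
    return dict(Counter(event["day"] for event in events))


def drill_down(events, day, page):
    """Answer a detail question. This needs the raw records, not the totals."""
    return sorted(
        {e["user"] for e in events if e["day"] == day and e["page"] == page}
    )


# The aggregate answers "how many events per day?" and nothing finer.
daily_totals = aggregate_on_ingest(raw_events)

# "Who viewed /pricing on June 1?" is only answerable from the raw data.
pricing_visitors = drill_down(raw_events, "2010-06-01", "/pricing")

print(daily_totals)
print(pricing_visitors)
```

The daily totals are far smaller than the raw records, which is why aggregation was attractive when storage was the constraint. The cost is that any question not anticipated at aggregation time is lost along with the discarded data.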
"Hadoop provides the ability to store and process volumes of data, but lacks graphical interfaces for loading, transforming, modeling or visualizing this data," said Dixon. "Pentaho provides these other capabilities and will enable more companies to use Hadoop to solve large-scale data problems."
He sees this as good for the storage networking industry as a whole. The reason: as more companies use Hadoop, the amount of data stored will increase. However, it might eventually hurt sales at the high end of the array market.