As storage analyst Greg Schulz puts it, “Big data is a great, big catch-all for things.”
That said, several standout storage tools are designed to help storage administrators tackle a growing mountain of big data. Not surprisingly, many of them are concerned with Hadoop.
SGI InfiniteStorage
SGI InfiniteStorage enables storage to be virtualized into a fabric that spans high-performing flash to low-cost tape. This is done in a way that keeps data online at all times and that is said to be transparent to users.
“The SGI InfiniteStorage hardware and software ecosystem is how SGI has been addressing big data problems for two decades, and is in production in hundreds of the most demanding data management environments around the world ranging from weather forecasting, life sciences, manufacturing, media and education,” said Floyd Christofferson, director of storage product marketing at SGI.
Red Hat Storage Server 2.0
According to a recent report by the Linux Foundation, the majority of big data implementations run on Linux. It makes sense, therefore, that Red Hat is a major player in the big data storage space. Red Hat Storage Server 2.0 allows data to be stored and managed in one place and accessed by many enterprise workloads, said Ranga Rangachari, vice president and general manager, Red Hat storage business unit.
“Given the size and growth of data today, enterprises can't afford to build dedicated storage silos,” said Rangachari. “The ideal approach is having the data reside in a general enterprise repository and making the data accessible to many enterprise workloads.”
Accordingly, Red Hat has teamed up with Intel to create better open source big data applications. As an initial action, Red Hat is taking advantage of the recently released Intel Distribution for Apache Hadoop software, integrating it with Red Hat Storage Server 2.0 and the Red Hat Enterprise Linux operating system. Further, a Red Hat Storage Apache Hadoop plug-in is about to be released to the open source community as a storage option for enterprise Hadoop deployments.
“Red Hat is uniquely positioned to excel in enterprise big data solutions, a market that IDC expects to grow from $6 billion in 2011 to $23.8 billion in 2016,” said Ashish Nadkarni, an analyst at IDC. “Red Hat is one of the very few infrastructure providers that can deliver a comprehensive big data solution because of the breadth of its infrastructure solutions and application platforms for on-premises or cloud delivery models.”
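The IDC forecast quoted above implies a steep annual growth rate. As a rough illustration only, the snippet below derives the compound annual growth rate (CAGR) from the two figures in the quote; the function name `cagr` is ours, and the calculation assumes smooth compounding over the five years from 2011 to 2016, which IDC does not state.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate implied by a start value, an end value and a span in years."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures from the IDC quote: $6 billion in 2011, $23.8 billion in 2016.
rate = cagr(6.0, 23.8, 2016 - 2011)
print(f"Implied CAGR: {rate:.1%}")
```

Under that assumption, the forecast works out to growth of roughly 32 percent per year.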
EMC Pivotal HD
Speaking of new Hadoop distributions, EMC’s version is called Pivotal HD, and it features integration with EMC’s Greenplum massively parallel processing (MPP) database. An engineering technology called HAWQ provides SQL processing for Hadoop and is touted as bringing more than 100X performance improvement to queries and workloads.
“Hadoop is a big deal and the key to unlock big data’s transformational potential, and we are marrying it with Greenplum technology to help catapult Hadoop into wide-scale adoption,” said Scott Yara, senior vice president of products, Greenplum, a division of EMC.
DataDirect Hadoop Apache Hive Driver
Part of the allure of Hadoop is that processing unstructured data into meaningful forms can yield intelligence that complements traditional analytics. The challenge is connecting existing business intelligence and data analytic tools to stored Hadoop data. The DataDirect driver for Apache Hive is the only fully-compliant driver supporting multiple Hadoop distributions out-of-the-box, according to Michael Benedict, vice president of data connectivity at Progress DataDirect.
“Without the DataDirect Hive driver it would be difficult to access and analyze data, as Hadoop can store so much that it can become quite difficult to access it — especially if you need something quickly,” stated Benedict. “The DataDirect Hadoop Driver helps access information from the Hive Data Warehouse in real-time, making data analysis much easier.”
PMC-Sierra has released a host bus adapter (HBA) for big data storage, known as the Adaptec 71605H Host Bus Adapter (part of the Series 7H family). These PCIe HBAs offer high-performance I/O and low latency with broad device compatibility. They use PMC's PM8018 16x6G SAS protocol controller, support SAS and SATA interfaces, and can connect up to 16 solid state drives or hard drives. The HBA is capable of executing over one million input/output operations per second (IOPS) with 6.6 GB/sec sustained throughput.
“One of the paramount use cases for HBAs in the data center environment is to connect a large number of drives for storage while complementing the growing need for higher density and lower cost,” said Zaki Hassan, director of product marketing for the enterprise storage division, PMC. “Series 7H HBAs provide 2x the number of ports over other commercially available solutions in market. These high port count, low profile HBAs make it possible for data centers to optimize storage connectivity while lowering cost.”
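To put the quoted Series 7H figures in perspective, here is a back-of-the-envelope sketch. It reads "16x6G" as 16 ports at 6 Gb/s each, uses decimal units, and assumes 8b/10b line encoding (10 line bits per byte of payload) for 6 Gb/s SAS; that encoding figure and the name `hba_budget` are our assumptions, not details from PMC.

```python
# Figures quoted in the article for the Adaptec 71605H.
PORTS = 16                 # "16x6G", read here as 16 ports
LINE_RATE_GBIT_S = 6.0     # 6 Gb/s per port
SUSTAINED_GB_S = 6.6       # quoted sustained throughput

# Assumption (not from the article): 8b/10b encoding, so 10 line bits carry 1 byte.
LINE_BITS_PER_BYTE = 10


def hba_budget():
    """Return rough bandwidth figures (decimal units) derived from the quoted specs."""
    per_port_mb_s = LINE_RATE_GBIT_S * 1000 / LINE_BITS_PER_BYTE
    link_total_gb_s = per_port_mb_s * PORTS / 1000
    per_drive_mb_s = SUSTAINED_GB_S * 1000 / PORTS
    return {
        "per_port_mb_s": round(per_port_mb_s, 1),       # payload ceiling of one link
        "link_total_gb_s": round(link_total_gb_s, 2),   # all 16 links combined
        "per_drive_mb_s": round(per_drive_mb_s, 1),     # sustained rate spread over 16 drives
        "link_utilization": round(SUSTAINED_GB_S / link_total_gb_s, 4),
    }


budget = hba_budget()
for name, value in budget.items():
    print(f"{name}: {value}")
```

Under these assumptions, the quoted 6.6 GB/sec works out to roughly 412 MB/s per attached drive, or about 69 percent of the 9.6 GB/s that sixteen 6 Gb/s links could carry in aggregate.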
Attunity RepliWeb for Enterprise File Replication
Attunity RepliWeb for Enterprise File Replication (EFR) deals with a vital aspect of big data – how to replicate huge stores of data. Its purpose is to easily replicate data files to, from and between Apache Hadoop data sets. Matt Benati, Vice President of Global Marketing at Attunity, explained that the Hadoop platform is made to consume only very large volumes of data. However, some enterprises may have smaller segments of data they need to merge with their big data for more accurate analytics. Attunity helps these companies to move both big and small data sets from various sources into Hadoop.
“Moving data over WAN in a timely fashion is difficult,” added Benati. “Attunity’s in-memory stream processing capabilities and technology optimizations make moving big data easy – whether on premises or in the cloud.”
The open source distribution of Hadoop does not include many security capabilities. That’s where commercial distributions come in. They will often add features such as access control and logging. “Shadoop introduces role-based access control to a Hadoop cluster, audit log capabilities and Kerberos authentication,” explained Scott Crawford, an analyst at Enterprise Management Associates.
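As a generic illustration of what role-based access control with an audit log means in practice, here is a minimal sketch. It is not Shadoop's implementation or any Hadoop API: the roles, permissions, paths and the `AccessController` class are all hypothetical, and Kerberos authentication is left out entirely.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical role-to-permission mapping, for illustration only.
ROLE_PERMISSIONS = {
    "analyst": {"read"},
    "engineer": {"read", "write"},
    "admin": {"read", "write", "delete"},
}


@dataclass
class AccessController:
    """Checks an action against the user's role and records every decision."""
    user_roles: dict
    audit_log: list = field(default_factory=list)

    def is_allowed(self, user, action, path):
        role = self.user_roles.get(user)  # None for unknown users
        allowed = action in ROLE_PERMISSIONS.get(role, set())
        # Log both granted and denied requests so they can be audited later.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "role": role,
            "action": action,
            "path": path,
            "allowed": allowed,
        })
        return allowed


controller = AccessController(user_roles={"alice": "analyst", "bob": "admin"})
print(controller.is_allowed("alice", "read", "/data/sales"))    # analyst may read
print(controller.is_allowed("alice", "delete", "/data/sales"))  # analyst may not delete
print(len(controller.audit_log), "decisions logged")
```

The point of the design is that permissions attach to roles rather than to individual users, and that denied requests are logged alongside granted ones.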
IBM InfoSphere Guardium
Crawford noted that the existing market of database security solutions is aware (as is the larger data management market) of the changes to their space being wrought by big data, but so far there are not many approaches that attack big data challenges. That is beginning to change.
IBM is one of the first out of the gate. “IBM InfoSphere Guardium has introduced tools for securing Big Data environment,” said Crawford.