
Big Data Storage Buying Guide


Written By
Drew Robb
Mar 20, 2013

Over recent months, Enterprise Storage Forum has prepared a series of buying guides covering all aspects of storage. This one takes a somewhat different tack, providing advice from analysts on how storage managers should be addressing big data. It covers specific tools, proper planning and architectural considerations, as well as the evolving field of big data security.

“Big data, both large file sizes and a lot of smaller unstructured files, is the fastest growing segment of stored content,” said Thomas Coughlin, analyst with Coughlin Associates. “Cost-effective storage tiering, available metadata and content management are key elements in maintaining and gaining value from large data libraries.”

He passed along an important tip to those looking to invest in big data. Judging by the hype from much of the storage vendor community about brand new Hadoop capabilities, it would seem that every vendor can competently handle any and all big data challenges. Unfortunately, that is far from the truth. Care must be taken to find the approach that integrates best with your existing environment.

“Storage managers need to find tools that are most effective for the types of big data that they are responsible [for] since one size does not, generally, fit all,” said Coughlin.

Further, big data does not just mean buying a big Hadoop data store. That may be only the beginning of the journey. Depending upon the frequency of use and what is considered an acceptable degree of latency for content access, appropriate storage tiering may be required. That could include flash memory, hard disk drives (HDD) and possibly even magnetic tape storage, added Coughlin. All of these products should ideally support file-based and object-based storage, particularly since many content libraries are accessed through the cloud. Deduplication, replication and erasure coding may also play a big role in large data retention. And of course, the point of big data is to unleash data analytics tools upon it to unlock hidden trends and competitive advantage. That may require extensive coordination with other business units and application owners.
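As a rough illustration of the tiering idea Coughlin describes, the sketch below assigns an object to flash, disk or tape based on how recently and how often it is accessed. The tier names, thresholds and access counts are purely illustrative assumptions; in practice the policy engine built into the storage platform would make these decisions.

```python
from datetime import datetime, timedelta

# Hypothetical tiers, ordered from fastest/most expensive to slowest/cheapest.
# The thresholds below are illustrative only, not recommendations.
TIERS = ["flash", "hdd", "tape"]

def choose_tier(last_access: datetime, accesses_last_30_days: int) -> str:
    """Pick a storage tier from simple recency/frequency rules."""
    age = datetime.now() - last_access
    if age < timedelta(days=7) or accesses_last_30_days > 100:
        return "flash"   # hot data: low-latency access required
    if age < timedelta(days=180):
        return "hdd"     # warm data: occasional access, moderate latency acceptable
    return "tape"        # cold data: archival, retrieval latency of minutes is fine

# Example: a file last read 3 days ago with 250 reads this month stays on flash.
print(choose_tier(datetime.now() - timedelta(days=3), 250))
```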

“Depending upon the strength of the IT team, various approaches can be taken from the open source, build-your-own activities of companies like Facebook and Google to big-data-in-a-box-type approaches suiting conventional storage products,” said Coughlin.

Strategic View

A common fault in such a marketing climate is to rush into purchases too quickly. What is required in times like this is a cool head and a strategic direction.

“It’s time for data center managers to seriously consider the approach they need to take — does it need to be all flash, or should they use caching and a smaller amount of flash storage, perhaps in the form of solid state drives (SSDs)?” asked Jim Handy, an analyst with Objective Analysis. “An all-flash strategy might be appealing because of its determinism, but if their needs grow exponentially, the cost to keep pace will soar.”

A cached approach, on the other hand, may not look as good on paper because of concerns about lags in the event of a cache miss. But keep in mind that this is something systems already deal with every day, since the virtual memory systems in existing servers are as much based on probabilities of page hits and misses as SSD caches are. Handy advises users not to pay over the odds for performance they don’t really need — but also not to skimp on the data sets and applications that matter. He mentioned companies like Pure, Nimbus, Violin and Skyera that focus on delivering solid state storage at a price that can be lower than HDD-based storage arrays, in the hope that users will simply move their entire database onto flash and leave it at that.
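To see why a cached approach can look worse in the worst case yet still perform well on average, it helps to work out the expected access latency at a given cache hit ratio. The latency figures below are illustrative assumptions, not vendor specifications.

```python
def effective_latency_us(hit_ratio: float, flash_us: float, hdd_us: float) -> float:
    """Average access latency when hits are served from flash and misses fall through to HDD."""
    return hit_ratio * flash_us + (1.0 - hit_ratio) * hdd_us

# Illustrative latencies: roughly 100 microseconds for flash, roughly 8 ms for a disk seek.
FLASH_US, HDD_US = 100.0, 8000.0

for hit_ratio in (0.90, 0.95, 0.99):
    avg = effective_latency_us(hit_ratio, FLASH_US, HDD_US)
    print(f"hit ratio {hit_ratio:.0%}: {avg:.0f} us average")

# All-flash is deterministic at ~100 us; under these assumptions a 99% hit ratio
# already averages ~179 us, at a fraction of the flash capacity.
```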

Don’t Forget Security

An integral part of strategy is how to secure that growing stash of unstructured data. A recent study of organizations that experienced attacks found that 86 percent had evidence of the breach in their existing log data, but either failed to notice or act upon that information. In addition, 92 percent of incidents were discovered by a third party, and 85 percent took weeks or more to discover.

“Remember the three V’s of Big Data: the capability to deal with data variety and velocity as well as volume,” said Scott Crawford, Managing Research Director at IT and data management industry analyst firm Enterprise Management Associates. “The distributed, parallel nature of environments like Hadoop supports greater efficiency and faster performance in executing analysis across larger bodies of data, through divide-and-conquer techniques such as MapReduce. They can make more subtle attacks more difficult to hide.”
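As a toy illustration of the divide-and-conquer style Crawford refers to, the sketch below counts failed-login events per source IP using plain map and reduce functions. A real Hadoop job would distribute these steps across the cluster (for example via Hadoop Streaming); the log format and field positions here are assumptions made up for the example.

```python
from collections import Counter
from functools import reduce

# Toy log lines; the format is an assumption for this example.
LOG_LINES = [
    "2013-03-20 10:01:02 10.0.0.5 LOGIN_FAILED",
    "2013-03-20 10:01:09 10.0.0.5 LOGIN_FAILED",
    "2013-03-20 10:02:11 10.0.0.7 LOGIN_OK",
    "2013-03-20 10:03:45 10.0.0.5 LOGIN_FAILED",
]

def map_phase(line: str) -> Counter:
    """Emit a count of 1 for the source IP of each failed login."""
    _date, _time, ip, event = line.split()
    return Counter({ip: 1}) if event == "LOGIN_FAILED" else Counter()

def reduce_phase(a: Counter, b: Counter) -> Counter:
    """Merge partial counts, as the shuffle/reduce step would across nodes."""
    return a + b

failed_by_ip = reduce(reduce_phase, map(map_phase, LOG_LINES), Counter())
print(failed_by_ip)  # Counter({'10.0.0.5': 3})
```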

He pointed out that the open source distribution of Hadoop does not include many security capabilities, though commercial distributions such as Shadoop can often add features such as access control, audit logging and authentication. In addition, IBM InfoSphere Guardium has introduced tools for securing big data environments. Disk encryption is another recommended action. But remember that big data is a recent phenomenon, and the security community has yet to catch up with the industry’s capacity to store it. As mentioned earlier, careful planning and due diligence on the needs of your own specific environment apply just as much to securing big data as to storing it.
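Disk encryption itself is normally handled below the file system by the operating system or the array, but as a minimal sketch of the broader idea of encrypting sensitive records before they land in a big data store, the example below uses the third-party Python cryptography package. This is an illustration only, not part of Hadoop or any specific product mentioned above.

```python
# Requires the third-party "cryptography" package (pip install cryptography).
from cryptography.fernet import Fernet

# In practice the key would live in a key-management system, not in the script.
key = Fernet.generate_key()
cipher = Fernet(key)

record = b"customer_id=1234,balance=99.50"
encrypted = cipher.encrypt(record)      # what gets written to the data store
decrypted = cipher.decrypt(encrypted)   # what an authorized reader recovers

assert decrypted == record
print(encrypted[:16], b"...")
```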

“Developing strategies for aligning the need to protect data, control access and monitor activity that map well to the highly distributed nature of emerging Big Data environments — without interfering with their value — is still an evolving field,” said Crawford.


Drew Robb is a contributing writer for Datamation, Enterprise Storage Forum, eSecurity Planet, Channel Insider, and eWeek. He has been reporting on all areas of IT for more than 25 years. He has a degree from the University of Strathclyde in the UK and lives in the Tampa Bay area of Florida.
