The subject of storage grids is gaining some momentum within the industry as perhaps the next big thing.
It probably started a couple of years ago when NetApp acquired Spinnaker Networks to add grid-like capabilities to its storage operating system, Data ONTAP 7G. More recently, EMC purchased grid software from Acxiom Corp. that provides a single location where services and data can be manipulated, stored and made available to applications. Other big vendors like Sun, HP and IBM, as well as smaller players like Panasas, Exanet, YottaYotta and ExaGrid are making plenty of noise about grids. So what’s all the fuss?
“It’s difficult to tell what’s happening with grids since it’s mostly a marketing-driven exercise today,” says Simon Robinson, an analyst with the 451 Group. “Our extensive grid research has yet to unearth a major project in implementing a grid architecture related to storage.”
He feels that, by and large, the storage industry has co-opted grid terminology to support marketing efforts to promote certain scalability, ease of management and price/performance characteristics. Large NAS farms and utility storage programs such as Sun’s Storage Grid are a couple of examples, he says. In reality, though, actual utilization of grid technology has been very limited.
William Hurley of Data Mobility Group concurs.
“Many vendors have an interest in, or may even market grid capability messages,” says Hurley. “But in reality, grids are not yet a vigorous commercial enterprise market and therefore specific grid products from incumbents are being developed for future, not near-term, delivery.”
Defining the ‘G’ Word
Defining a storage grid isn’t as easy as it appears. Essentially, it is a method of linking independent storage nodes that are then monitored by a controlling software layer. It offers easy management, fault tolerance and access at the file and block level. Such an approach is said to reduce costs, speed up backups and improve utilization.
“Defining the ‘G’ word is a problem when you use grid and storage in the same breath,” says David Freund, an analyst at Illuminata. “It can mean just about anything from using something from the Globus Toolkit to using distributed techniques such as clustering.”
Part of the confusion stems from muddy messaging from vendors. Array, networking and software companies are adding features to facilitate data sharing. Examples include more network-efficient replication or copy processes, improved synchronous data replication functions and multi-site copying. Similarly, other companies are working closely with ISVs such as Oracle to develop mutually complementary functions in limited shared or simple site-to-site environments. Some of this work is being incorrectly characterized as grid-based storage.
“Grid deployments by end users have been generally limited to government and academic research,” says Hurley. “A few larger, forward-thinking companies have begun the slow process of re-architecting data stores and storage networks with the goal of achieving data federation as a first step toward realizing grid outcomes, such as workload portability and ‘anywhere’ data accessibility.”
Such innovation addresses obvious challenges in storage. Freund points out some of the problems experienced by existing storage platforms when utilizing a grid. Large NAS appliances and traditional clusters, for example, have trouble scaling when working with large research grids. Bottlenecks occur when such systems try to create thousands of files every minute. He suggests that grids will eventually parallelize file creation and other functions to eliminate bottlenecks.
Vendor Gridlock
Approaches to grid-based storage, though, vary by vendor. Freund says Lustre and Panasas use an object approach to produce a smarter disk, then layer a file system on top of that to create a grid or cluster. Panasas prefers the term cluster: lots of servers cooperating on an application.
“The Panasas Storage Cluster is a grid of intelligent storage components that can be scaled to meet the needs of the compute environment,” says Larry Jones, vice president of marketing at Panasas. “Clustering is the key to grid storage, as it allows storage to take advantage of the same ability to run in parallel that we are doing on the application side with compute grids.”
Petroleum Geo-Services Corp. (PGS), for example, uses the Panasas DirectFLOW data path to provide direct access between Linux cluster nodes and Panasas storage in order to increase the performance and simplify management of seismic data processing.
What other approaches exist? Hurley thinks IBM has the lead due to its active role in the Global Grid Forum (GGF). He also sees Oracle making a lot of progress by pushing its partners to come up with ways to enhance its 10g Grid family.
“I would consider the Oracle 10g package to be the most practical example of storage grid technology,” says Hurley. “HP’s AppIQ and EMC’s Control Center with the Smarts technology can provide excellent management points for enterprise end-users moving, over time, toward a grid architecture.”
According to Freund, NetApp and EMC take a similar approach.
“EMC has realized that the real value of storage has moved up the stack and into the things that manage what goes into storage,” says Freund. “Storing files has become a commodity.”
Meanwhile, NetApp has made its FlexVol technology a key component of Data ONTAP 7G as it attempts to build grid-based storage systems. FlexVol enables the pooling of storage resources such that all data volumes can take advantage of the total available capacity and performance of available hardware.
“The virtualization capabilities within Data ONTAP 7G and FlexVol technology will provide the foundation for our future multi-node, scale-out systems which fully incorporate storage grid capabilities,” says Patrick Rogers, vice president for products and partners at Network Appliance.
NetApp, he says, is in the process of completing the integration of technology acquired from Spinnaker into the Data ONTAP operating system. The idea is to create a storage system that can scale many modular storage building blocks within a single global namespace. All resources within the storage grid architecture will be pooled into a coherent system with a single management interface.
“Early deployments of our scale-out systems have been tested with several customers in high-performance computing environments,” says Rogers. “To date, these have been customers with large numbers of Linux compute servers in technical markets such as energy and entertainment.”
Beyond the Hype
Despite the hype, Freund expects grids to survive the inevitable swelling and fading of any buzzword fad. But, he says, the point of a scaled modular parallel approach to storage is going to stick.
“Regardless of the definitions and semantic hairsplitting, customers don’t care whether it’s a grid, a cluster or a ‘virtualized bunch of things that do stuff,’” he says. “The vendors that win will tie the modules together seamlessly.”
The optimum goal for end users, says Hurley, would be to achieve a condition with the data stores such that it didn’t matter where a specific piece of data or data set was physically located, and yet users, to whom location is deterministic, could still perform the application-based tasks required of their professional role. A more realistic expectation, though, might be to achieve new levels of efficiency to lower the barriers to data migration or replication between geographies as well as to facilitate wide area access to mission-critical data that resides in a highly secure, resilient central data center.
“If apps need to be moved across geographical distances to run, then so must the data,” says Hurley. “Grids will support application-specific accessibility over wide areas, giving every application user, regardless of location, a local look and feel to an application and its performance.”
Robinson, though, isn’t so sure that grids really will play a part in the future of storage. He sees a battle between two schools of thought. One seeks to make storage more distributed to support grids, getting the data to where it’s needed. The other holds that data will become more centralized or consolidated in order to realize cost benefits.
“The current trend in storage generally is certainly leaning towards the latter, and I don’t expect the emergence of grids to substantially affect this trend,” says Robinson.