In a typical storage system, a major challenge is keeping track of where each block of data sits on all of the disk drives in the system. As the storage system scales, this can mean millions of blocks needing to be managed instantaneously. Ultimately, this creates a significant bottleneck as systems scale in capacity.
One solution to this problem is a more intelligent object-based storage architecture. The idea is to manage data as large virtual objects rather than small blocks. Objects differ from traditional blocks in that they contain both application data and metadata attributes about that data. The resulting storage subsystems become more intelligent because a large portion of the metadata management moves from the file system to the storage subsystem.
“An object-based storage system distributes the responsibility for keeping track of the objects in a particular data structure to the actual object itself,” said Larry Jones, Vice President of Product Marketing at Panasas, Inc., of Fremont, CA.
The object sits directly on the disk. Let’s call these object-based disks Object Storage Devices (OSDs). An OSD can be built from a disk drive, a blade, or part of an array. Because the OSD tracks its objects as well as where all of their blocks of data reside, an application can access a file rapidly. Further, you can scale the system up without compromising performance.
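To make the idea concrete, here is a minimal Python sketch of a device that keeps block placement to itself and exposes only objects. Every name in it (StoredObject, ObjectStorageDevice, the 4-byte block size) is hypothetical and chosen for illustration; it does not represent the ANSI T10 command set or any vendor's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """An object bundles application data with metadata attributes."""
    object_id: int
    data: bytes
    attributes: dict = field(default_factory=dict)


class ObjectStorageDevice:
    """Toy OSD: callers address objects by ID; block placement stays internal."""

    BLOCK_SIZE = 4  # bytes; tiny on purpose so the example is easy to follow

    def __init__(self):
        self._blocks = []      # simulated disk blocks
        self._block_map = {}   # object_id -> indexes into self._blocks
        self._attributes = {}  # object_id -> metadata attributes

    def write(self, obj: StoredObject) -> None:
        # Split the data into blocks and remember where each one landed.
        # (Rewriting an existing ID simply appends new blocks in this toy.)
        indexes = []
        for start in range(0, len(obj.data), self.BLOCK_SIZE):
            self._blocks.append(obj.data[start:start + self.BLOCK_SIZE])
            indexes.append(len(self._blocks) - 1)
        self._block_map[obj.object_id] = indexes
        # The device, not the file system, records the object's attributes.
        self._attributes[obj.object_id] = dict(obj.attributes, length=len(obj.data))

    def read(self, object_id: int) -> bytes:
        # The caller never sees block numbers, only the reassembled object.
        return b"".join(self._blocks[i] for i in self._block_map[object_id])

    def get_attributes(self, object_id: int) -> dict:
        return dict(self._attributes[object_id])


osd = ObjectStorageDevice()
osd.write(StoredObject(1, b"seismic trace", {"owner": "geo"}))
print(osd.read(1))
print(osd.get_attributes(1))
```

The point of the sketch is the interface: the application asks for object 1 and never handles a block address, which is the bookkeeping the article describes moving onto the device.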
How does it work? In traditional storage architectures, as data is stored or retrieved, each block must pass through a single filer head or NFS server to be processed. The control path for metadata is connected to the data path. If too many clients access the data, that single path bottlenecks. An object-based storage system uses a parallel file system to separate the control path from the data path. This can pay big performance dividends when working with large files or datasets.
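The separation of control path and data path can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular product works: MetadataServer, Osd, write_file, read_file, the round-robin layout and the 4-byte stripe size are all invented for the example. The client makes one small request to the metadata server to learn where the stripes live, then pulls them from the storage devices in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


class Osd:
    """Hypothetical storage device holding stripes keyed by (file name, stripe index)."""

    def __init__(self):
        self.stripes = {}

    def put(self, key, chunk):
        self.stripes[key] = chunk

    def get(self, key):
        return self.stripes[key]


class MetadataServer:
    """Control path only: knows which device holds each stripe, never handles file data."""

    def __init__(self):
        self.layouts = {}

    def create_layout(self, name, stripe_count, osd_count):
        # Assumed placement policy: spread stripes round-robin across devices.
        layout = [i % osd_count for i in range(stripe_count)]
        self.layouts[name] = layout
        return layout

    def lookup(self, name):
        return list(self.layouts[name])


def write_file(name, data, mds, osds, stripe_size=4):
    chunks = [data[i:i + stripe_size] for i in range(0, len(data), stripe_size)]
    layout = mds.create_layout(name, len(chunks), len(osds))
    for index, (osd_index, chunk) in enumerate(zip(layout, chunks)):
        osds[osd_index].put((name, index), chunk)  # data goes straight to the device


def read_file(name, mds, osds):
    layout = mds.lookup(name)  # single control-path request
    with ThreadPoolExecutor(max_workers=len(osds)) as pool:
        # Data path: fetch every stripe directly from its device, in parallel.
        chunks = pool.map(
            lambda item: osds[item[1]].get((name, item[0])),
            enumerate(layout),
        )
        return b"".join(chunks)


mds = MetadataServer()
osds = [Osd() for _ in range(3)]
write_file("survey.dat", b"gulf of mexico seismic data", mds, osds)
print(read_file("survey.dat", mds, osds))
```

Because the metadata server only hands out the layout, adding clients or devices increases the number of parallel data transfers without pushing more bytes through a single head.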
OSD Gains Traction
Such technology is more than just a good idea. The Storage Networking Industry Association (SNIA) has had a technical working group for OSD standards for some time. And in September, the ANSI T10 governing body ratified an OSD standard.
Further, products have already hit the market that take advantage of objects.
Panasas, for example, has developed NAS storage for Linux clusters. Known as the Panasas ActiveScale Storage Cluster, it combines a parallel file system with object-based storage. It delivers scalable bandwidth and random I/O to accelerate application throughput and streamline operations within a single scalable namespace. It is packaged as dense, redundant 5 TB shelves and supports two modes of data access: the DirectFLOW data path and the NFS/CIFS data path.
According to Jones, DirectFLOW is an “out-of-band” approach that enables parallel data access between blades and clients. The NFS/CIFS path, on the other hand, operates “in-band” and supports UNIX NFS and Windows CIFS clients.
This approach is gaining traction in environments running high-performance applications such as oil exploration, digital animation and other compute-intensive work. Among the converts is Maxus US Exploration (Houston, Tex.), a subsidiary of Repsol YPF, one of the top 10 private oil companies in the world, with operations in 28 countries, and the largest private energy company in Latin America (with an average daily production of 1.2 million barrels of oil equivalent).
It uses seismic processing applications in deep water exploration. For a recent project in the Gulf of Mexico, for example, Maxus needed an in-house seismic processing application that could maximize data processing. That way, the company’s experts could focus on interpreting data rather than processing it.
“The goal was to maximize the compute power in an effort to minimize the amount of time necessary to process the data,” said Francisco Ortigosa, chief geophysicist at Maxus. “As we were building our system, we realized that we needed an exceptionally fast storage system.” The company opted for object-based storage and adopted a 15 TB Panasas ActiveScale Storage Cluster. It runs in conjunction with 64 dual-processor Rackable Networks nodes (3 GHz Intel Xeon) on Red Hat 8 and a Cisco 4500 Series-based network. This provided the company with a performance hike of up to 11x.
“The significant performance improvements allow us to focus on what we do best — interpreting seismic data and generating new business opportunities,” said Ortigosa.
Object-ive
Panasas is not alone in the object-based storage field. Other vendors are gradually introducing their own architectures for clustered network storage. The most advanced is the Lustre open-source initiative, and research projects are underway at Seagate, IBM, and others.
The likelihood is, therefore, that object-based architectures will be seen more and more in the years to come. With storage capacity continuing to explode, object-based storage opens the door to rapid access for a large user community into an ever-growing pool of storage.
Article courtesy of Enterprise IT Planet