As the growth of data and files continues to explode, traditional backup techniques become more and more difficult to use, fueling the growth of hierarchical storage management (HSM) systems.
But the process is not without some glaring problems. Some of the problems that affect HSM and archiving also affect backup, but archiving systems generally have far more files and data to manage, so the problems are compounded. Standards would help, but the standards process is notoriously difficult.
Data management standards have not changed significantly in a long time. The data path is fragmented among standards bodies, from the OpenGroup and POSIX for file system interfaces, to low-level groups like T11 and T13, which are part of INCITS (InterNational Committee for Information Technology Standards), to the IETF (NFSv4 and pNFS), with lots of other standards in between.
There are a number of problems with archiving interfaces that affect both users and administrators.
Archiving was much better in the days of mainframes, and the differences will become increasingly important as we move to petascale and exascale archives.
Interfaces for ILM
The standards process for things like open/read/write, which is maintained by the OpenGroup, is long and difficult. Lots of companies want to have their say and are averse to change, given the potential costs. If you are an operating system or file system vendor and someone wants to add a number of new standards that you will have to develop code for, you might well perceive these changes as high-cost items, and with good reason. Do operating system or file system vendors always have the best interests of users and administrators in mind? The likely answer is no, of course: a vendor is looking for a competitive edge, so it pursues what is in its own best interest.
Anyway, what we really need is a standard interface, as part of the standard open statement, that could pass standardized ILM management information — per-file policy information — to the storage management system.
I am sure there are many policies we could all come up with, and there should be a framework that could pass a structure of policies everyone could agree on, along with a method for vendor- or site-specific functionality. This is not going to happen any time soon and will require agreement from a large number of vendors.
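To make the idea concrete, here is a minimal sketch of what a policy-carrying open call might look like. Everything in it — struct ilm_policy, its fields and the ilm_open() call — is a hypothetical illustration, not part of any existing standard; the stub simply falls back to the ordinary open() so the example compiles.

/*
 * Hypothetical sketch only: ilm_open() and struct ilm_policy do not exist
 * in any standard. A real implementation would hand the policy structure
 * to the storage management system rather than discard it.
 */
#include <fcntl.h>
#include <stdint.h>
#include <string.h>

struct ilm_policy {
    uint32_t retention_days;      /* how long the file must be kept       */
    uint32_t migrate_after_days;  /* when it may move to slower storage   */
    uint8_t  copies;              /* number of archive copies required    */
    uint8_t  release_ok;          /* may the disk copy be released?       */
    char     site_class[32];      /* vendor- or site-specific class name  */
};

/* Stub: ignores the policy so the sketch is self-contained. */
static int ilm_open(const char *path, int flags, mode_t mode,
                    const struct ilm_policy *policy)
{
    (void)policy;                 /* would be passed down the data path   */
    return open(path, flags, mode);
}

int main(void)
{
    struct ilm_policy p;
    memset(&p, 0, sizeof(p));
    p.retention_days     = 3650;  /* keep for ten years                   */
    p.migrate_after_days = 30;    /* eligible for migration after a month */
    p.copies             = 2;
    p.release_ok         = 1;
    strncpy(p.site_class, "project-x", sizeof(p.site_class) - 1);

    int fd = ilm_open("/archive/results.dat", O_CREAT | O_WRONLY, 0644, &p);
    return fd < 0 ? 1 : 0;
}

The point is not these particular fields, but that whatever the structure contains would be defined once by a standards body and honored all the way down the data path.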
Interfaces for Scale
Let’s face it: it is not in the interest of hardware vendors to create standards that reduce the amount of hardware they sell, which could well be a big reason why standards are slow to change.
What is the efficiency of current storage systems? Are we using disk drives at 1 percent, 5 percent, 25 percent, 50 percent or 80 percent of their performance? From the information I have seen, it is often closer to 5 percent, and that is especially true for applications that require a high number of IOPS. There are many reasons for this, but one of them is that there is no coordination of resources from the application down to the storage device, as I have pointed out time and time again. Some changes are needed to support scalability beyond those already proposed to the OpenGroup.
We need these changes to allow for things like migration to new systems and vendors — one reason hardware vendors might not support such a change.
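For perspective on how little coordination exists today, the closest thing POSIX currently offers is an advisory hint such as posix_fadvise(), which lets an application tell the kernel how it intends to access a file. The sketch below uses that real call, but it is only a hint to the operating system; nothing requires it to travel any further down the data path to the storage device, which is exactly the gap. The file path is a placeholder.

/*
 * What passes for application-to-storage coordination today: an advisory
 * hint to the kernel. posix_fadvise() is standard POSIX, but the hint is
 * not required to reach the storage device itself.
 */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/archive/results.dat", O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* Tell the kernel we intend to read the whole file sequentially. */
    if (posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL) != 0)
        fprintf(stderr, "posix_fadvise hint was not accepted\n");

    /* ... sequential read loop would go here ... */

    close(fd);
    return 0;
}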
Common Commands
Today we have commands for standard interfaces to the file system, such as ls -l and find, but if you want to look at file system-specific archive information, you need special commands for each file system. These commands do not provide a common data format or a common way of looking at per-file information. If we are going to make all of the other changes to per-file information, then we ought to change the command set for examining files within the archive system as well. Why, with all of the databases out there, do we have to get at file data by reading inodes and extended attributes? It is a pretty sad state of affairs, given that we have built databases that support every other kind of search operation, yet we still read file system metadata an inode and an attribute at a time.
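To illustrate what "an inode and attribute at a time" means, here is a sketch of the kind of traversal that ls and find perform today: one stat() call per file, and more calls still for extended attributes. A metadata database could answer the same question — say, "which files are larger than 1 GB?" — with a single query. The directory argument and size threshold here are just placeholders.

/*
 * Sketch of how today's tools gather per-file metadata: walk a directory
 * and stat() every entry, one inode at a time.
 */
#include <dirent.h>
#include <stdio.h>
#include <sys/stat.h>

int main(int argc, char **argv)
{
    const char *dir = (argc > 1) ? argv[1] : ".";
    DIR *dp = opendir(dir);
    if (dp == NULL) {
        perror("opendir");
        return 1;
    }

    struct dirent *de;
    while ((de = readdir(dp)) != NULL) {
        char path[4096];
        struct stat st;

        snprintf(path, sizeof(path), "%s/%s", dir, de->d_name);
        if (stat(path, &st) != 0)        /* one round trip per inode */
            continue;

        if (st.st_size > 1024L * 1024 * 1024)
            printf("%s %lld bytes\n", path, (long long)st.st_size);
    }

    closedir(dp);
    return 0;
}

Multiply that loop by hundreds of millions of files in a petascale archive and the cost of the current approach becomes obvious.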
Preparing for the Future
I work with a number of sites that have petascale storage systems. Just 10 years ago, 2TB file systems were virtually unheard of and 2GB file size limits were a fact of life. Projecting that same rate of scaling means that exascale storage systems are not far down the road. The current commands and standards limit hardware scalability, and the scaling limitations of the data path force sites to buy large amounts of hardware to compensate.
Many of the sites I work with openly talk about the need for SSD hardware to support file system metadata. Today SSD usually means flash, given the cost and power needs of traditional SSDs built from memory DIMMs. One of the problems is that many file systems are not structured to use this technology effectively, and even if they were, as far as I am concerned the technology's enterprise reliability is unproven. Sure, it works for your USB drive, but will it work for a huge archive system, and if it doesn't, what are the implications? Think back to mainframes: in the 1970s, IBM solved many of these problems under MVS. IBM controls the OS and can therefore respond reasonably quickly to market demands.
Flash is not the answer to the problem; it just puts a Band-Aid on the wound. The data path requires major surgery to be effective and manageable and to meet future needs.
Henry Newman, a regular Enterprise Storage Forum contributor, is an industry consultant with 27 years experience in high-performance computing and storage.