Preparing for Exascale Archives
As the volume of data and the number of files continue to explode, traditional backup techniques become more and more difficult to use, fueling the growth of hierarchical storage management (HSM) systems.
But the process is not without some glaring problems. Some of the problems that affect HSM and archiving also affect backup, but archiving systems generally have far more files and data to manage, so the problems are compounded. Standards would help, but the standards process is notoriously difficult.
Data management standards have not changed significantly in a long time. We have a data path that is fragmented among standards bodies, from the OpenGroup and POSIX for file system interfaces, to low-level groups like T11 and T13, which are part of INCITS (InterNational Committee for Information Technology Standards), to the IETF (NFSv4 and pNFS), with lots of other standards in between.
There are a number of problems I see with archiving interfaces for both the user and administrator:
- User interfaces for information lifecycle management (ILM) are not standardized.
- Interfaces to scale file system metadata are limited by POSIX standards for atomic behavior (see I/O Standards: What Went Wrong).
- Commands such as find and ls -lR are used to process file system metadata for various activities such as file aging issues, accounting and file counts for each file system type.
Archiving was much better in the days of mainframes, and the differences will become increasingly important as we move to petascale and exascale archives.
Interfaces for ILM
The standards process for things like open/read/write, which is maintained by the OpenGroup, is long and difficult. Lots of companies want to have their say and are averse to change, given the potential costs. If you are an operating system vendor or file system vendor and someone wants to add a number of new standards that you'll have to develop code for, you might perceive these changes as high-cost items, and with good reason. Do operating system or file system vendors always have the best interest of the users and administrators in mind? The likely answer is no, of course, since each vendor is looking for a competitive edge and for what is in the best interest of its own company.
Anyway, what we really need is a standard interface, as part of the standard open statement, that could pass standardized ILM management information to the storage management system. The interface should include information such as:
- File retention time: How long do you want to keep this file for? For some files, this might be one year, and for others, 75 years or more.
- Ownership information: Currently we are bound by POSIX user/group ownership control. If a file is to be kept for 75 years, the person who created it is not going to be around the whole time. We need a better way to maintain ownership of a file.
- Performance hints: With any archive system, you have performance needs based on when the file will be accessed. I see at many archive sites that users want to use some files for a few weeks after they are created and then might not use them again for years or never, while other files could immediately go to long-term archive.
- Versioning: Since HSM file systems often do not have versioning, it would be nice on a per-file basis to be able to keep a number of versions of a file or allow the file to be replaced. Currently, for most HSM file systems this is done by giving the file a different name rather than by having the file system maintain versions.
I am sure there are many policies we could all come up with, and there should be a framework that could pass a structure of policies that everyone could agree on, along with a method for vendor or site-specific functionality. This is not going to happen any time soon and will require agreement from a large number of vendors.
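To make the idea concrete, here is a minimal sketch of what such a policy structure might look like. No standard of this kind exists; the names IlmPolicy, AccessHint and encode, the field names, and the JSON encoding are all assumptions made for illustration, not part of POSIX or any vendor interface.

```python
import json
from dataclasses import asdict, dataclass, field
from enum import Enum
from typing import Dict


class AccessHint(Enum):
    """Hypothetical performance hints matching the two usage patterns above."""
    HOT_THEN_COLD = "hot_then_cold"  # used for a few weeks, then rarely or never
    ARCHIVE_NOW = "archive_now"      # can go straight to long-term archive


@dataclass(frozen=True)
class IlmPolicy:
    """Hypothetical per-file policy that an open call could hand to the storage manager."""
    retention_years: int             # how long the file must be kept
    owner_role: str                  # a role or organization, not a POSIX uid
    access_hint: AccessHint
    max_versions: int = 1            # 1 means new data replaces the file
    vendor_extensions: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject values that no storage manager could act on.
        if self.retention_years < 1:
            raise ValueError("retention_years must be at least 1")
        if self.max_versions < 1:
            raise ValueError("max_versions must be at least 1")

    def encode(self) -> str:
        """Serialize to JSON; the wire format is an assumption of this sketch."""
        record = asdict(self)
        record["access_hint"] = self.access_hint.value
        return json.dumps(record, sort_keys=True)


# Example: a record kept for 75 years, owned by a role, with three versions retained.
policy = IlmPolicy(
    retention_years=75,
    owner_role="records-office",
    access_hint=AccessHint.ARCHIVE_NOW,
    max_versions=3,
    vendor_extensions={"example.site.tier": "tape"},
)
print(policy.encode())
```

The vendor_extensions field stands in for the vendor or site-specific functionality mentioned above: the agreed fields stay fixed, and anything else travels as namespaced key/value pairs.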
Interfaces for Scale
Let's face it: it is not in the interest of hardware vendors to create standards that reduce the amount of hardware they sell, which could well be a big reason why standards are slow to change.
What is the efficiency of usage of current storage systems? Are we utilizing disk drives at 1 percent, 5 percent, 25 percent, 50 percent or 80 percent of their performance? From the information I have seen, it is often closer to 5 percent. This is especially true for applications that require a high number of IOPS. There are many reasons for this, but one of them is that there is no coordination of resources from the application to the storage device, as I have pointed out time and time again. There should be some changes to support scalability besides the already proposed OpenGroup changes. Here are some suggestions:
- Coordination of the whole data path: T10 OSD is a good part of this effort, but it does not encompass tapes or SATA disk drives, which are needed for archive systems.
- Support for a common interface for file system metadata: DMAPI from the OpenGroup only supports archiving. There is no common interface to access the file metadata that exists today, much less the metadata that might exist in the future.
We need these changes to allow for things like migration to new systems and vendors, which is one reason hardware vendors might not support such a change.
Today we have commands for standard interfaces to the file system such as ls -l and find, but if you want to look at file system-specific archive information, you need to use special commands for each file system. These commands do not provide a common data format or a common way of looking at the per-file information. If we are going to make all of the other changes to the per-file information, then we ought to make changes to the command set used to look at the files within the archive system. Why, with all of the databases out there, do we have to access the file data via reading inodes and extended attributes? It is a pretty sad state of affairs that we need to access things this way, given that we have developed databases that support all other types of search operations, but we still read file system metadata an inode and attribute at a time.
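As a rough illustration of the difference, the sketch below loads file metadata into a small database once and then answers file aging and accounting questions with queries instead of repeated tree walks. It uses only Python's standard os and sqlite3 modules; the table layout and the function names build_catalog, aged_files and usage_by_owner are assumptions for this example, not an existing file system or HSM interface.

```python
import os
import sqlite3
import time


def build_catalog(root: str, conn: sqlite3.Connection) -> int:
    """Walk the tree once (the find / ls -lR step) and store one row per file."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        "path TEXT PRIMARY KEY, size INTEGER, uid INTEGER, mtime REAL)"
    )
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # one inode read per file
            except OSError:
                continue             # file vanished or is unreadable; skip it
            conn.execute(
                "INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?)",
                (path, st.st_size, st.st_uid, st.st_mtime),
            )
            count += 1
    conn.commit()
    return count


def aged_files(conn: sqlite3.Connection, older_than_days: float, now=None):
    """File aging: paths not modified within the given number of days."""
    reference = time.time() if now is None else now
    cutoff = reference - older_than_days * 86400
    rows = conn.execute(
        "SELECT path FROM files WHERE mtime < ? ORDER BY path", (cutoff,)
    )
    return [row[0] for row in rows]


def usage_by_owner(conn: sqlite3.Connection):
    """Accounting: (uid, file count, total bytes) for each owner."""
    rows = conn.execute(
        "SELECT uid, COUNT(*), SUM(size) FROM files GROUP BY uid ORDER BY uid"
    )
    return rows.fetchall()
```

The sketch still reads every inode once to build the catalog, so it does not remove the cost; the point is that a file system exposing its metadata through a query interface of this sort would not need the walk at all.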
Preparing for the Future
I work with a number of sites that have petascale storage systems. Just 10 years ago, 2TB file systems were virtually unheard of and 2GB file size limits were a fact of life. Projecting that same level of scaling means that exascale storage systems are not far down the road. The current commands and standards are limiting hardware scalability and thus require the purchase of large amounts of hardware, given the limitations in scaling of the data path.
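A back-of-the-envelope version of that projection can be sketched as follows. The figures are assumptions chosen for illustration: "petascale" is taken to mean 1 PB today and "exascale" 1 EB, in decimal units, and growth is assumed to continue at the same compound rate as the step from 2TB to 1PB over the past 10 years.

```python
import math

TB = 10**12
PB = 10**15
EB = 10**18


def years_to_reach(start, target, past_start, past_end, past_years):
    """Years to grow from start to target at the compound rate seen in the past."""
    annual_factor = (past_end / past_start) ** (1 / past_years)
    return math.log(target / start) / math.log(annual_factor)


# 2 TB -> 1 PB took 10 years (a factor of 500); project 1 PB -> 1 EB (a factor of 1,000).
years = years_to_reach(PB, EB, past_start=2 * TB, past_end=PB, past_years=10)
print(f"{years:.1f} years")
```

Under those assumptions capacity grows by a factor of about 1.86 per year, and the step from 1 PB to 1 EB takes roughly 11 years.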
Many of the sites I work with openly talk about the need for SSD hardware to support file system metadata. Today SSD often means flash, given the cost and power needs compared with traditional SSDs, which use memory DIMMs. One of the problems with SSDs is that many file systems are not structured to use this technology effectively, and even if they were, as far as I am concerned the technology's enterprise reliability is unproven. Sure, it works for your USB drive, but will it work for a huge archive system, and if it doesn't, what are the implications? Think back to mainframes. In the 1970s, IBM solved many of these problems under MVS. IBM controls the OS and therefore can respond reasonably quickly to market demands.
Flash is not the answer to the problem, since it just puts a Band-Aid on the wound; the data path requires major surgery to be effective, manageable and meet future needs.
Henry Newman, a regular Enterprise Storage Forum contributor, is an industry consultant with 27 years of experience in high-performance computing and storage.