Solving the I/O Problem
Technology as we know it is limited by standards and by the bodies that create those standards. One standard limiting the storage industry is the Open Group's POSIX standard, whose I/O interfaces are quite old and do not meet many of today's needs.
Every new standard is agreed upon by members of the Open Group, which includes not only UNIX vendors but other interested parties, including people like me. Creating a standard is a long process filled with requirements, and the process exists for many good reasons. If it were easy, UNIX could become a bloated mess, with many requirements placed on vendors, meaning increased cost and more code to eat up memory and CPU power. Since software is much more expensive than hardware, simplicity is a virtue.
A few years ago, a group of people who ran large U.S. government high-performance computing sites, along with interested universities and a few companies involved in this area, met to discuss some of the limitations on I/O. The group was concerned with limitations that show up when large numbers of nodes write to the same shared file system. Many of these sites used the Lustre file system from ClusterFS or GPFS from IBM, and some of the scaling limitations were caused not by the file system but by standards written 20 or more years ago (see A Business Case for Extensions to the POSIX I/O API for High End, Clustered, and Highly Concurrent Computing).
The High End Computing Extensions Working Group (HECEWG) is part of the Open Group Base Working Group Platform Forum. The Austin Group, which develops and maintains the POSIX standards, is the other group involved in this area.
The HECEWG came up with a small set of changes that would give applications greater control to:
- Open many files on a clustered computing system with a shared file system;
- Open one file from many nodes on a clustered computing system with a shared file system;
- Build a list of I/O requests that can be sent to the file system at once, eliminating many system calls; and
- Change the stat() call to dramatically speed up retrieving information about the files in a file system.
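The third item, sending a list of I/Os in one call, is the easiest to picture. The sketch below does not use the interfaces the group proposed; it uses the existing POSIX writev() call, exposed in Python as os.writev, to compare one write() per record against handing the kernel a whole list of buffers at once. writev() covers only the simple case of scattered memory going to one contiguous region of a file, so treat this as an illustration of why fewer system calls matter, not as the proposed extension. The helper names write_one_by_one and write_as_list are hypothetical.

```python
import os
import tempfile


def _iov_max():
    # IOV_MAX caps how many buffers one writev() call may take.
    # Fall back to the POSIX minimum (_XOPEN_IOV_MAX = 16) if unknown.
    try:
        limit = os.sysconf("SC_IOV_MAX")
    except (ValueError, OSError):
        limit = -1
    return limit if limit > 0 else 16


def write_one_by_one(fd, records):
    """Hypothetical helper: one write() system call per record."""
    calls = 0
    for rec in records:
        view = memoryview(rec)
        while view:  # loop in case of a short write
            n = os.write(fd, view)
            calls += 1
            view = view[n:]
    return calls


def write_as_list(fd, records):
    """Hypothetical helper: send the records as a list via writev(),
    one system call per IOV_MAX-sized batch."""
    iov_max = _iov_max()
    pending = [memoryview(r) for r in records if len(r) > 0]
    start = 0
    calls = 0
    while start < len(pending):
        n = os.writev(fd, pending[start:start + iov_max])
        calls += 1
        # Skip buffers written in full; trim one written only in part.
        while n > 0:
            head = pending[start]
            if n >= len(head):
                n -= len(head)
                start += 1
            else:
                pending[start] = head[n:]
                n = 0
    return calls


if __name__ == "__main__":
    records = [f"record {i}\n".encode() for i in range(1000)]
    with tempfile.TemporaryFile() as f:
        print("write():", write_one_by_one(f.fileno(), records), "calls")
    with tempfile.TemporaryFile() as f:
        print("writev():", write_as_list(f.fileno(), records), "calls")
```

Both helpers produce the same file contents; the difference is how many times the application crosses into the kernel. On a single node that overhead is modest, but multiplied across thousands of nodes hitting one shared file system it becomes part of the scaling problem described above.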
What This Means for You
Most of you probably don't care much about high-performance computing, since the HPC community is very small, so you might think these standards changes won't affect you. You would be wrong. The HPC community is often the proving ground for new technologies, both software and hardware, and for new standards. Technologies we see in HPC often turn up later in the commercial world. In the early 1990s we used to say that the lag time was three to five years, but lately it has shrunk to 12-18 months. You need only look at UNIX, Linux, Fibre Channel (1 Gb to 4 Gb), clusters, grids, job management software such as LSF, PBS and NQS, and InfiniBand, to name just a few technologies where HPC led the commercial world. This trend is not going to change anytime soon, since HPC needs early access to, and development of, new technologies to get the job done.
I am pretty sure that many of you are moving to various types of clusters and racks of computers. We have been using this type of equipment in HPC for a long time, and as these systems have grown, in some cases to tens of thousands of processors in a single system, so have the I/O problems. These I/O scaling problems might have been tractable with tens or hundreds of nodes, but they no longer are, and adding hardware no longer makes the system scale. My bet, based on good historical evidence, is that many of you are either starting to hit the same problems the HPC community has faced for a number of years, or you will over the next few years. So you should care, because we are all going to face the same problems.
A Call to Action
The standards process does not happen overnight. Even after a standard is approved, it takes years before the changes appear throughout the industry. The code has to be written, tested and installed, and in some cases, such as changes to the operating system for I/O, applications then have to be modified to take advantage of the new features (adding flags to the open() system call, for example, is one of the proposed changes). In addition, one of the first requirements of the standards process is that a reference implementation be designed, implemented and tested. This takes a lot of time and a great deal of resources.
It is in the best interest of everyone concerned about I/O and scaling to understand the standards process, find out what is being worked on and, if it meets your needs, tell vendors and then require them, as part of the procurement process, to incorporate the standards into a release within a specified time period. I have seen this done time and time again in both commercial and government procurements around the globe. Just a few years ago, it was not uncommon to see IPv6 as a requirement in procurements, so that machines purchased in 2004 were required to run IPv6 by, say, 2006. IPv6 was often required for security reasons, but there is no reason we can't require changes for performance reasons.
The changes to standards for I/O are not going to happen overnight, but without these changes we are going to continue to have I/O scaling problems. I don't see any other changes on the application side that are going to help us.
Henry Newman, a regular Enterprise Storage Forum contributor, is an industry consultant with 26 years of experience in high-performance computing and storage.