The idea for this article came out of a talk I was asked to give on why the I/O stack we have today is not well integrated (see Solving the I/O Problem). It got me thinking about how we got to where we are with the current I/O stack, which I think looks like this:
If you look at things from a standards perspective, there are many groups involved, and in some cases multiple groups involved at each level. The one exception is the application level, where there is no real standard for communicating with the file system or operating system beyond standard system calls and C library buffered I/O. If you have read A Trip Down the Data Path, you know that I think this communication is pretty weak and that there is a lot more that could help. Of course, there is also limited communication between file systems and block storage devices, so we have a long way to go (see Let’s Bid Adieu to Block Storage and SCSI).
The question that kept nagging me is how we got to a point where the various levels of storage do not talk to each other in an integrated way. Sure, each level communicates with its nearest neighbor, but where is the integration?
Think of all of the standards bodies involved: T10 (SCSI), T13 (ATA Storage Interface), T11 (device interfaces today, mostly Fibre Channel), the OpenGroup (UNIX historically and POSIX), SNIA, and IETF (the Internet), just to name a few.
The Evolution of Standards Efforts
I looked back at what I thought was the first set of very successful computer standards to become commonplace. In my opinion, the standards for ARPANET and the IP stack were the first successful attempt to create standards. Those standards were not created by a bunch of companies huddled in a room together, but by researchers working under contract for ARPA (Advanced Research Projects Agency), which today is known as DARPA (Defense Advanced Research Projects Agency). What we have today is totally different from when the whole standards process started. Today, many if not most standards efforts are run by companies. The exceptions, from what I have seen, are the Internet and the IETF, which still has a great deal of input from the research community, and to some degree the OpenGroup, which manages the UNIX standards. So what is missing? I came up with a couple of things.
- We do not have standards groups that address application communication to the file system. Will the I/O be random or sequential, will it follow a pattern (say, a skip increment or reading backwards), and will it use small or large blocks? With each of these pieces of information, the file system and the lower levels of software could make better decisions about readahead, caching and a number of other things (see the sketch after this list). A few vendors have libraries, but there is no common interface. MPI I/O has some ideas, but these need to be expanded and standardized.
- No group is looking at the big picture. Of all of these groups, who is looking at the problem from the application to the storage device and determining what is missing? Right now, the answer is no one.
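To make the first point concrete, here is a minimal C sketch of the one narrow hinting mechanism POSIX already offers, posix_fadvise(). It can tell the kernel "sequential" or "random," which helps readahead and caching, but it has no vocabulary for skip increments, backward reads or request sizes; that gap is what a broader standard would need to fill. The file name used here is just a placeholder for illustration.

```c
#define _XOPEN_SOURCE 600   /* expose posix_fadvise() */

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    /* "data.bin" is a hypothetical input file. */
    int fd = open("data.bin", O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* Hint that we will read the whole file sequentially, so the
     * kernel can schedule aggressive readahead. offset=0, len=0
     * means "the entire file." */
    int err = posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
    if (err != 0)
        fprintf(stderr, "posix_fadvise: %s\n", strerror(err));

    /* ... sequential read pass would go here ... */

    /* If a later phase jumps around the file, switch the hint so the
     * kernel stops wasting readahead on data we will not touch. */
    err = posix_fadvise(fd, 0, 0, POSIX_FADV_RANDOM);
    if (err != 0)
        fprintf(stderr, "posix_fadvise: %s\n", strerror(err));

    close(fd);
    return 0;
}
```

Note that this is advice to a single kernel, not an end-to-end interface: nothing below the file system ever sees the hint, which is exactly the integration problem this article is about.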
A number of these groups have some input from the research community. For example, the T10 group had input on Object Storage Devices (OSD) from people at Carnegie Mellon University, but as the research turned into a standard, the work moved out of research and into commercial companies doing the implementation. That was not really true for TCP/IP and some of the other ARPA developments. The reason could be that TCP/IP and the other early standards were developed so that the various research groups at ARPA-funded universities could communicate with each other, and at that time the government provided more money for computing research than it does today, perhaps because computers were not as widely used then.
Also at the time, companies such as AT&T were flush with cash and could afford to fund the development of UNIX and C. IBM could have afforded to fund open development as AT&T did, but it was not in the interest of IBM's stockholders, while it was in AT&T's interest at the time. Meanwhile, the U.S. Department of Energy was funding the development of interactive operating systems such as LTSS (Livermore Time Sharing System) and its derivatives and follow-on operating systems. These systems were not publicly available in the early years and were not free for universities to download. If they had been, UNIX might not have been developed or caught on the way it did, and the world might be a different place. Even the operating systems that were based on LTSS are now a distant memory.
These examples show the changes in the standards process. I believe that the reason standards do not look at the end-to-end situation for I/O is that the whole process has been hijacked by industry, and it now reflects the interests of the vendor community. Admittedly, that is a strong statement, but the history of standards development appears to bear it out.
Storage vendors and storage groups in large companies have acted independently for at least 30 years. From what I have been able to glean, the split started to happen about 35 years ago. You now have a clear delineation of vendors based on standards:
- Operating system and file system vendors follow standards from the OpenGroup, which defines how applications interface with both.
- T10, T11 and T13 all have standards for drivers and for how devices communicate with storage.
- SNIA tries to get the vendors to play nicely with each other and develop common metrics and management for the various technologies.
- IETF deals with NFS and networking standards.
In my opinion, you will not get these various groups to look at the problem in an integrated way because it is not in their corporate best interest to do so. For example, why should a RAID company send people to a costly meeting to work on costly changes for their RAID products so that they can communicate throughout the data path? If I were running the company, I would ask how this would benefit the bottom line. It’s a business, not a charity, after all.
What Can Be Done?
We have all seen the movie “Back to the Future,” and in a nutshell, I think that is exactly what is needed. The user and research communities need to take the standards process back from industry, and the obvious candidate to fund this is one of the largest consumers of computational cycles: the U.S. government (and, in fact, all governments).
From what I have seen over the last 10 years, the cost of storage as a percentage of the money spent on computer equipment is going up. On one procurement I work on regularly, the cost of storage in 2001 was 9 percent of the total, while today it is more than 25 percent of every dollar spent on a new system. That is 18.6 percent compound annual growth! In my opinion, one of the biggest reasons for this increase is that storage performance does not scale, and one of the reasons it does not scale is the lack of communication between the application and the storage device. Of course, the fact that storage performance is not increasing as fast as CPU performance is surely not helping the situation at all (see Storage Headed for Trouble).
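For readers who want to check the 18.6 percent figure, it follows from compounding the growth from 9 percent to 25 percent, assuming the two data points are roughly six years apart (2001 to the time of writing):

\[
\left(\frac{25\%}{9\%}\right)^{1/6} - 1 \;\approx\; 2.78^{0.167} - 1 \;\approx\; 0.186 \;=\; 18.6\%\ \text{per year}
\]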
If users, researchers and others do not get involved, we are destined to continue to have standards fragmentation. The standards process is open to all; we can either continue moaning, or we can do something about changing the standards. As I mentioned recently, there is a group, including the U.S. government and research community, that is trying to make some additions to the UNIX standard via the OpenGroup. This is a great start, but they need our help. If we don’t get involved, the situation can’t change.
Henry Newman, a regular Enterprise Storage Forum contributor, is an industry consultant with 26 years experience in high-performance computing and storage.