Storage I/O and the Laws of Physics
The title might seem merely catchy, but when it comes to managing storage, simple physics defines the limitations under which you'll have to work. The movement of data from applications to hardware devices is bounded by physical constraints within the computer and its storage hardware.
First, let's compare the fastest computers and the fastest disk storage devices from 1976 and today to get a better understanding of the changes we've seen over the last 26 years.
| Year | CPU Performance* | Disk | Disk Size | Disk Seek & Latency | Transfer Rate |
|------|------------------|------|-----------|---------------------|---------------|
| 1976 | CDC 7600, 25 MFLOPS** | Cyber 819 | 80 MB | 24 ms | 3 MB/sec, half duplex |
| 2002 | NEC Earth Simulator, 40 TFLOPS*** | Seagate Cheetah X10.6 | 146 GB | 7.94 ms**** | 200 MB/sec, full duplex***** |

\* Though this might not be the best measure of throughput, it is a good comparison
\** Million Floating Point Operations Per Second
\*** Trillion Floating Point Operations Per Second
\**** Average seek and latency for read and write
\***** Using FC RAID with 2 Gb interfaces and RAID-5 8+1
CPU and storage improvements over the last 26 years have been:
- System computation performance: 1,538,461 times
- Single disk density: 1,825 times
- RAID LUN density: 14,600 times (8+1 RAID-5 with 146 GB drives)
- Seek + latency: ~3 times
- Transfer rate: 133 times
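These ratios can be checked directly against the table's figures. A quick sketch (treating 1 GB as 1,000 MB, counting full duplex as double the half-duplex rate, and using 26 MFLOPS for the CDC 7600, since that is the figure the stated 1,538,461 ratio implies, though the table lists 25):

```python
# Rough verification of the improvement factors above, from the table's figures.
cpu_1976_flops = 26e6       # CDC 7600 (~26 MFLOPS reproduces the stated ratio)
cpu_2002_flops = 40e12      # NEC Earth Simulator, 40 TFLOPS

print(int(cpu_2002_flops / cpu_1976_flops))  # system computation: ~1,538,461x

disk_1976_mb, disk_2002_mb = 80, 146_000     # 80 MB vs. 146 GB
print(disk_2002_mb / disk_1976_mb)           # single-disk density: 1825x
print(8 * disk_2002_mb / disk_1976_mb)       # 8+1 RAID-5 LUN density: 14600x

print(24 / 7.94)                             # seek + latency: ~3x
print((2 * 200) / 3)                         # transfer rate, full vs. half duplex: ~133x
```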
Improvements for seek time and latency have been small compared to system CPU performance increases because disks are mechanical devices. Storage density has not kept pace with increases in system CPU performance either, and is more than two orders of magnitude less, even when using RAID-5 8+1 and the latest 2 gigabit hardware. This is, of course, the upper end of the high performance market, but the issues still apply with varying degrees to every market segment from the desktop to the enterprise.
Seek and rotational latency times have not changed very much and are unlikely to improve significantly with current technology. Seek time depends on the physical size of the device, and even though disk drives are no longer the size of a very large washing machine, they have not shrunk as fast as system CPU performance has increased. Rotational latency has improved only incrementally during this same period: as the devices get smaller, they can spin faster than the 3,600 RPM we saw in 1976. If disk drive rotational performance had increased at just one-tenth the rate of system CPU performance, today's disks would spin at over 553 million RPM, or more than 9 million revolutions per second!
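The arithmetic behind that thought experiment is straightforward:

```python
# If rotational speed had scaled at one-tenth the rate of CPU performance
# since 1976, starting from 3,600 RPM.
cpu_improvement = 1_538_461          # system computation improvement factor
rpm_1976 = 3600

hypothetical_rpm = rpm_1976 * cpu_improvement / 10
print(f"{hypothetical_rpm:,.0f} RPM")           # over 553 million RPM
print(f"{hypothetical_rpm / 60:,.0f} rev/sec")  # over 9 million revolutions/sec
```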
The most common bus interface between the computer memory system and storage hardware today is PCI, which runs at a peak of 532 MB/sec. PCI-X, which is twice as fast as PCI, is starting to become common, but even these interfaces have not come close to scaling with the increases in system CPU performance during this period.
Given the material science of disk platters, bearing technology, heat, packaging, electricity, and many other factors of varying comprehensibility, it is impossible with today's technology to radically change these storage trends using disks as we know them. The performance gaps are not going to close without major technology shifts. What makes these imbalances important is that they affect device efficiency and how applications need to be architected. In 1976, the transfer rate was slow relative to the seek and latency time, so smaller I/O requests could use the available bandwidth of the device more efficiently.
The following chart on device utilization makes these assumptions:
- each I/O is followed by an average seek and average latency
- no caching is assumed
- for today's devices, 200 MB/sec is assumed using RAID for both 10K and 15K disks
| Record Size (bytes) | % Utilization, 1976 | Today % Utilization, 10K RPM | Today % Utilization, 15K RPM |
|---------------------|---------------------|------------------------------|------------------------------|
| 1024 | 1.34% | 0.06% | 0.08% |
| 4096 | 5.15% | 0.24% | 0.32% |
| 8192 | 9.79% | 0.47% | 0.65% |
| 16384 | 17.83% | 0.94% | 1.29% |
| 65536 | 46.47% | 3.68% | 4.95% |
| 131072 | 63.45% | 7.09% | 9.43% |
| 262144 | 77.64% | 13.24% | 17.24% |
| 524288 | 87.41% | 23.39% | 29.41% |
| 1048576 | 93.28% | 37.91% | 45.45% |
| 2097152 | 96.53% | 54.98% | 62.50% |
| 4194304 | 98.23% | 70.95% | 76.92% |
| 8388608 | 99.11% | 83.00% | 86.96% |
| 16777216 | 99.55% | 90.71% | 93.02% |
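The table appears to follow a simple model: each I/O pays an average seek-plus-latency overhead before transferring, so utilization is the transfer time divided by the total time. A sketch that reproduces the figures above (the per-I/O overheads for today's disks, roughly 8.19 ms at 10K RPM and 6.0 ms at 15K RPM, are back-solved from the table, and 1 MB is taken as 2^20 bytes):

```python
# Utilization model: each I/O pays an average seek + latency, then transfers,
# so utilization = transfer_time / (overhead + transfer_time).
MB = 2**20

def utilization(record_bytes, rate_mb_per_sec, overhead_sec):
    transfer = record_bytes / (rate_mb_per_sec * MB)
    return transfer / (overhead_sec + transfer)

# 1976: 3 MB/sec, 24 ms; today: 200 MB/sec, ~8.19 ms (10K) and ~6.0 ms (15K).
for size in (1024, 65536, 1048576, 16777216):
    u1976 = utilization(size, 3, 0.024)
    u10k = utilization(size, 200, 0.00819)
    u15k = utilization(size, 200, 0.0060)
    print(f"{size:>9}  {u1976:7.2%}  {u10k:7.2%}  {u15k:7.2%}")
```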
Though the above is a near-worst-case scenario, what is clear is that high device utilization today requires large I/O requests. Some will argue that almost all of these performance degradations are mitigated by caching RAID devices, on-disk caches, and other techniques; in a few cases this might be true, but what I have found in the vast majority of environments is that the truth is somewhere in between. What is important is that for the foreseeable future the trend will not change, unless you plan to buy solid-state storage devices (SSDs) for all of your storage at well over 100 times the cost of rotating storage. Every day, on every system, you will face these performance issues, which require you to make large requests to achieve high device utilization.
Clearly, you cannot change most of your applications to make large requests to achieve a high percentage of utilization. Changes to Oracle and other applications to allow and implement this are next to impossible. In some cases you can change file system and volume manager tunables to improve the caching of the data and increase the transfer sizes, but this is not always the case. So architecting systems to achieve the required level of service becomes more difficult given:
- The application request size
- The application data access pattern
- Randomly sequential access (a term describing I/O that is sequential for a number of requests, then does a seek, and is then sequential again)
- The size of the RAID cache and its tunable parameters (readahead, read and write cache sizes, high-water marks), plus the other applications using the RAID cache
- The size of the file system and volume manager caches and their tunable parameters
- The memory size of the machine
- The number of HBAs
- The number of other applications using memory, since on most operating systems this affects how much memory can be used for file system cache
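The "randomly sequential" pattern in the list above can be sketched with a toy generator (all names and parameters here are illustrative, not from any real workload):

```python
import random

def randomly_sequential(num_runs, run_length, max_block):
    """Yield block addresses: sequential within each run, with a random
    seek between runs -- the 'randomly sequential' access pattern."""
    for _ in range(num_runs):
        start = random.randrange(max_block)  # random seek to a new position
        for i in range(run_length):
            yield start + i                  # then a sequential run of blocks

random.seed(1)
pattern = list(randomly_sequential(num_runs=3, run_length=4, max_block=10**6))
print(pattern)  # three sequential runs of four blocks, separated by seeks
```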
I am sure that if I spent more time thinking about this problem I could come up with a larger list. Clearly, architecting a system to achieve predictable I/O throughput has also grown in complexity over the last 26 years. We have had a few areas of improvement over this period, and these are our saving graces:
- Cost for each channel to the device has dropped dramatically
- Cost for each disk drive has also dropped dramatically
- The number of devices that can be connected to a machine has increased, with RAID and JBOD allowing a much larger number of head seeks to be active at the same time
- Command queues for the system have grown, allowing commands to be sorted by seek distance
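The seek-distance sorting in that last item can be illustrated with a minimal elevator-style sweep, one simple variant among several scheduling disciplines (the function and data here are illustrative):

```python
def elevator_order(requests, head):
    """Order pending block addresses: sweep upward from the current head
    position in ascending order, then handle the remaining lower blocks.
    This shortens total seek distance compared with arrival order."""
    above = sorted(r for r in requests if r >= head)
    below = sorted(r for r in requests if r < head)
    return above + below

pending = [98, 183, 37, 122, 14, 124, 65, 67]
print(elevator_order(pending, head=53))
# -> [65, 67, 98, 122, 124, 183, 14, 37]
```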
These saving graces let us get our jobs done by allowing us to buy more drives, use more channels, and keep more commands doing head seeks on more drives than was possible in 1976. Each of these features comes at a price, and that price is the architectural complexity of the system.
The questions usually asked are:
- How many disks/RAIDs are needed and of what type?
- How many channels are needed?
- What are the configuration, tunable options, and settings for the system?
- In some cases, which server, file system, and volume manager should be used, since these can be changed
With each question the answer might require asking many questions of the users, database architects, and other applications engineers, until you have a complete understanding of how the system will be used, how things like databases will be constructed and accessed, and the available hardware and software.
Answers to all of these questions are actually reasonably obvious if you understand how the hardware, system software (operating system, file system and volume manager) and applications software work together. This is often called the data path and includes:
- The path to/from the application to the operating system
- To/from the file system and/or volume manager
- To/from the device drivers and host bus adapters (HBAs, iSCSI, NICs)
- To/from the storage hardware (disk, RAID, Tape and might include fibre switches and/or SCSI over IP equipment)
At each of these stops you have different options and tradeoffs. Sometimes how a vendor implements a file system can mean the difference between meeting expectations and failure. Add to this the tradeoffs among locally attached file systems, NAS, and SAN shared file systems, and we will have a great deal to cover over the coming months.
This is the first in a series of articles that will discuss the I/O data path, which is the path from the application to the storage device. This will eventually include everything from user applications, system applications, operating system, file systems, HBAs, switches, RAIDs, tapes and the issues and process of architecting storage.