Data storage has become the weak link in enterprise applications, and without a concerted effort on the part of storage vendors, the technology is in danger of becoming irrelevant. The I/O stack just isn't keeping pace with advances in silicon, and it could find itself replaced by new technologies like phase change memory (PCM) that promise unfettered access to data.
The problem is simple: Memory bandwidth and CPU performance continue to grow much faster than disk performance, disk channel speed and bus performance. Combined with a limited I/O interface (POSIX), the result is an I/O bottleneck that only gets worse with time.
A look at the performance increases for various elements of the storage stack over the last five years paints a clear picture:
- Memory bandwidth: Intel has gone from 4.3 GB/sec in 2004 to 40 GB/sec, while AMD has gone from 5.3 GB/sec to 25.6 GB/sec, increases of 9.3 times for Intel and 4.8 times for AMD.
- CPU performance: Moore's Law says transistor count doubles every 18 months. If I assume that this translates directly into performance (which it does not), five years of doubling works out to a greater than tenfold performance improvement.
- PCIe bus: Bandwidth has increased from 250 MB/sec per lane in 2004 to 500 MB/sec per lane in 2008, with 1 GB/sec per lane expected next year, an increase of two (and soon four) times.
- Per-disk channel speed: This has increased by 50 percent, from 4Gb Fibre Channel to 6Gb SAS.
- Disk performance: SATA has improved from 68 MB/sec to 84 MB/sec, only about 24 percent, while FC/SAS has gone from 125 MB/sec to 194 MB/sec, a modest improvement similar to that of per-disk channel speed.
- Disk density: Capacity has doubled for FC/SAS, from 300 GB to 600 GB, while SATA has seen an eightfold increase, from 250 GB to 2 TB.
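As a sanity check on the list, each growth figure is a simple new/old ratio, and the Moore's Law assumption works out to 2^(60/18), or about 10.1, for five years of 18-month doublings. The minimal Python sketch below recomputes them from the numbers quoted in the list; the dictionary labels are just names chosen for this example, and the 4Gb-to-6Gb channel pair is taken from the 50 percent figure.

```python
# Figures quoted in the list, as (old, new) pairs.
# Labels are illustrative names; only the ratios matter here.
FIGURES = {
    "Intel memory bandwidth (GB/sec)": (4.3, 40.0),
    "AMD memory bandwidth (GB/sec)": (5.3, 25.6),
    "PCIe per-lane bandwidth, 2004-2008 (MB/sec)": (250.0, 500.0),
    "Disk channel speed (4Gb FC to 6Gb SAS)": (4.0, 6.0),
    "SATA disk throughput (MB/sec)": (68.0, 84.0),
    "FC/SAS disk throughput (MB/sec)": (125.0, 194.0),
    "FC/SAS disk capacity (GB)": (300.0, 600.0),
    "SATA disk capacity (GB)": (250.0, 2000.0),
}


def growth_factor(old: float, new: float) -> float:
    """Return how many times larger `new` is than `old`."""
    return new / old


def moore_factor(months: float, doubling_months: float = 18.0) -> float:
    """Transistor-count multiplier if the count doubles every `doubling_months`."""
    return 2 ** (months / doubling_months)


if __name__ == "__main__":
    for name, (old, new) in FIGURES.items():
        print(f"{name:45s} {growth_factor(old, new):5.2f}x")
    # Five years = 60 months; this is the CPU figure the list assumes.
    print(f"{'CPU (Moore, 60 months)':45s} {moore_factor(60):5.2f}x")
```

Every storage-path ratio lands between about 1.2x and 2x, apart from SATA capacity, while memory bandwidth and the assumed CPU figure land between roughly 5x and 10x.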
As you can see, disk channel speed, disk performance and the PCIe bus have lagged well behind CPU performance and memory bandwidth. What this means is that access to data can't keep up with the ability to process it, and POSIX has no way to convey what's important or to define quality of service, problems that could spell doom for storage networking as other options become available.
Access to data is not keeping pace anywhere in the system: not on the PCIe bus, not in the storage channel and not at the disk drive. The only exception is solid state drives, specifically SLC SSDs. There was nothing comparable to these drives five years ago, but they are nonetheless limited by the channel speed, which is currently 6Gb SAS, or 768 MB/sec per channel.
POSIX: Part of the Problem
One problem I see is that we have a minimal interface between applications and storage. All we have is standard POSIX control: open, read, write, close and a few other system calls. There is no method to provide hints about file usage; for example, you might want a hint that says a file will be read sequentially, or a hint that a file might be overwritten. There are lots of possible hints, but there is no standard way of providing them, although some file systems have non-standard methods. The interface, however, is only part of the problem. The real problem is that storage performance is falling further and further behind the performance of the rest of the system, making storage harder to use and potentially spelling its demise for all but essential usage. I think there are lots of reasons for the performance lag, but here are a few:
- Designing PCIe bus interfaces to memory is difficult and expensive, and connecting lots of PCIe buses to memory increases system cost dramatically.
- Since PCIe bus performance is not keeping pace with CPU and memory advances, there is little reason to build faster and faster channels.
- Disk drive interfaces use the same technology as the channel, so their performance is not going to improve much either.
This problem has troubled me for a long time. I continually ask myself why the situation hasn't changed, and I keep coming back to the same reason: It is costly and difficult to increase performance in the storage stack. A change here or there will not improve the performance of the whole data path. Simple bottleneck analysis proves this: Upgrade every component but one, and the one left behind is still the bottleneck. The whole stack needs to change to improve performance, from the PCIe bus to the device and everything in between, including the software interface. No one company owns the stack. PCIe is a standard, and SAS interfaces and performance are defined by standards. No one is going to build something non-standard, as the engineering costs are just too high and the market is too small.
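The bottleneck argument can be made concrete: in a serial data path every byte crosses every stage, so end-to-end throughput can be no higher than the slowest stage. The minimal Python sketch below assumes a single serial path with hypothetical stage rates loosely based on this article's numbers (AMD's 25.6 GB/sec memory bandwidth, a hypothetical PCIe x4 link at the 2008 per-lane rate, one 6Gb SAS channel at 768 MB/sec and one FC/SAS disk at 194 MB/sec); it ignores caching, parallel paths and protocol overhead.

```python
from typing import Dict, Tuple


def bottleneck(stages: Dict[str, float]) -> Tuple[str, float]:
    """Return (name, MB/sec) of the slowest stage in a serial path.

    End-to-end throughput of a serial path is bounded by this value.
    """
    return min(stages.items(), key=lambda item: item[1])


# Hypothetical MB/sec figures for one serial path; illustrative, not measured.
path = {
    "memory": 25_600.0,
    "PCIe x4 (2008 per-lane rate)": 2_000.0,
    "6Gb SAS channel": 768.0,
    "one FC/SAS disk": 194.0,
}

print(bottleneck(path))    # ('one FC/SAS disk', 194.0)

# Make every stage four times faster except the disk.
faster = {name: rate * 4 for name, rate in path.items()}
faster["one FC/SAS disk"] = path["one FC/SAS disk"]

print(bottleneck(faster))  # ('one FC/SAS disk', 194.0)
```

Speeding up everything except the disk leaves the end-to-end result unchanged; only a change across the whole stack moves the bottleneck somewhere else.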
A number of companies are building PCIe-based flash storage devices, which eliminate the channel interface problem and some of the performance problems. At least for now, these devices are far too expensive for the consumer market, which, like it or not, drives the storage market. Even then, the performance bottleneck is the PCIe bus, which has not improved much and likely will not improve much in the future. What is needed is something in the home that demands more performance.
The one technology that requires bandwidth that far exceeds the PCIe bus is the SSD. The current crop of SSDs can read at over 500 MB/sec, which is about 65 percent of a single SAS channel. By comparison, a SATA drive peaks at about 105 MB/sec, or about 14 percent of the channel, and a SAS drive peaks at 194 MB/sec, or about 25 percent. The bottom line is that it takes very few disk drives to saturate a SAS channel today. Early in the Fibre Channel days, in 1998, two FC loops could support the I/O for nine disk drives. Today, two 6Gbit SAS channels with enterprise drives will support slightly fewer SAS drives, so there has been no improvement. And 6Gbit SAS technology has only just arrived; most RAID systems still use 3Gbit, which means a channel can support just under two of today's fastest SAS drives.
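The channel arithmetic in this paragraph is a device's peak rate divided by the channel's bandwidth. The Python sketch below reproduces it from the article's figures, treating 6Gb SAS as 768 MB/sec as the article does and assuming 3Gb SAS is exactly half of that; it ignores encoding and protocol overhead, so real-world headroom is likely smaller.

```python
# Per-channel bandwidth in MB/sec, using the article's 768 MB/sec for 6Gb SAS.
SAS_6GB = 768.0
# Assumption: 3Gb SAS taken as exactly half of the 6Gb figure.
SAS_3GB = SAS_6GB / 2

# Peak per-device read rates quoted in the article, in MB/sec.
DEVICES = {"SSD": 500.0, "SATA disk": 105.0, "SAS disk": 194.0}


def utilization(device_mb_s: float, channel_mb_s: float) -> float:
    """Fraction of one channel consumed by a single device at its peak rate."""
    return device_mb_s / channel_mb_s


def devices_to_saturate(device_mb_s: float, channel_mb_s: float) -> float:
    """Number of devices streaming at peak that would fill the channel."""
    return channel_mb_s / device_mb_s


if __name__ == "__main__":
    for name, rate in DEVICES.items():
        print(f"{name:10s} {utilization(rate, SAS_6GB):6.1%} of a 6Gb SAS channel")

    two_channels = 2 * SAS_6GB
    print(f"SAS disks to fill two 6Gb channels: "
          f"{devices_to_saturate(DEVICES['SAS disk'], two_channels):.1f}")
    print(f"SAS disks to fill one 3Gb channel:  "
          f"{devices_to_saturate(DEVICES['SAS disk'], SAS_3GB):.2f}")
```

With these inputs the SSD, SATA drive and SAS drive use about 65, 14 and 25 percent of a channel, two 6Gb channels fill with just under eight SAS drives, and a single 3Gb channel fills with just under two.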
Phase Change Memory Could Doom Storage Networks
If vendors cannot find a way to fix the I/O stack (which would involve a lot of groups, such as The Open Group, ANSI T10, T11 and T13, and PCI-SIG), I think the I/O stack might go the way of eight-track tapes and floppy disks. Given the difficulty of solving the hardware and interface problem, the situation appears ripe for someone to figure out a workaround. The most obvious solution is to do as little I/O as possible. Right now there is a limit on the number of memory channels available on commodity CPUs, and DIMMs are limited in both performance and distance from the CPU. Because NAND flash is not byte addressable, it is harder to use as a memory extension, but phase change memory (PCM) does not have this limitation. Numonyx (now acquired by Micron) is working on PCM technology with Intel (NASDAQ: INTC).
While years away, PCM has the potential to move data storage and storage networks from the center of data centers to the periphery. I/O would only have to be conducted at the start and end of the day, with data parked in memory while applications are running. In short, disk becomes the new tape.
The I/O stack can be fixed, but there is no immediate financial incentive to do so. If storage vendors are smart enough to see that their entire industry could be threatened a few years down the road by a new technology like PCM, perhaps they will get to work on fixing the problem now, because it will take time.
The ability to put flash and PCM on the motherboard like DIMMs is coming. It may not happen in the next few years, but by the end of the new decade, watch out.
Henry Newman, CTO of Instrumental Inc. and a regular Enterprise Storage Forum contributor, is an industry consultant with 28 years of experience in high-performance computing and storage.