There is a somewhat famous saying that goes something like, "If you ask five economists their opinion, you will get 14 answers." The general idea is that any pundit can easily give you several opinions about a subject, but economists just seem to do it better than anyone else.
This sentiment is also true in the land of IT, I believe because of the extremely fast pace of technology. Nowhere is this more true than in storage, where things are changing extremely fast, from the perspective of both technology and users and their applications.
When you get more than two storage pundits together, the conversation usually turns to the future of storage. This was true of Henry Newman and me when we recently recorded a podcast about our series on Linux file systems. One of the questions we were asked was what kind of file system or storage we would develop if we were kings for a day.
I started thinking more about that question while traveling and decided I'd write a bit about some ideas of where storage is headed, focusing on storage devices and tiering software, since I think the two subjects are connected.
We're discovering that the performance of a much larger number of applications than previously thought is dominated by IOPS (I/O operations per second). In my particular field, high performance computing (HPC), the examination of I/O patterns in applications is showing that small read and write function calls are much more common than previously thought. It's becoming fairly routine to see applications that write gigabytes or even terabytes of data using a large number of 1K and 4K write and read function calls. Consequently, these workloads look increasingly like IOPS-dominated I/O patterns to the OS and the file system.
There has been some work to help operating systems deal with IOPS-dominated workloads. One approach is to use large buffers on the storage servers that allow the OS to combine small read and write functions into larger function calls. This can reduce the effect of IOPS, perhaps making the workload more sequential. However, this can sometimes require very large buffers to allow read/write requests to be combined. Moreover, it can also result in large latencies because the data for each I/O function is held in the buffer in an attempt to combine several of them into a single request. There is an extremely fine line between converting an IOPS workload into a sequential workload and pushing the latency to a very high level. The actual outcome greatly depends on the specific application and workload.
We're also discovering that more workloads than previously thought use random IOPS and not sequential IOPS. Sequential IOPS are desirable because many OSes can combine them into a single request using a relatively small buffer with only a small impact on latency. There is still a measurable impact on the latency, but if there are enough adjacent I/O functions, you can convert the requests into a single, much larger I/O request, so the resulting required IOPS is smaller than one would expect. For random IOPS, however, there is not much you can do except make extremely large buffers on the storage servers.
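To make the difference between sequential and random IOPS concrete, here is a minimal sketch of request combining. It is not how any particular OS or storage server does it. The `coalesce` function, the `(offset, length)` request format, and the 1 MiB cap are all assumptions made for illustration, and sorting the requests first is a simplification that a real buffer with latency limits could not always afford.

```python
from typing import List, Tuple

Request = Tuple[int, int]  # (byte offset, length in bytes); hypothetical format


def coalesce(requests: List[Request], max_size: int = 1 << 20) -> List[Request]:
    """Merge byte-adjacent requests into larger ones, up to max_size bytes."""
    merged: List[Request] = []
    for offset, length in sorted(requests):
        if merged:
            last_off, last_len = merged[-1]
            # Only merge when this request starts exactly where the last ended.
            if last_off + last_len == offset and last_len + length <= max_size:
                merged[-1] = (last_off, last_len + length)
                continue
        merged.append((offset, length))
    return merged


# 256 back-to-back 4 KiB writes: every request is adjacent to the previous one.
sequential = [(i * 4096, 4096) for i in range(256)]
# 256 writes of 4 KiB placed 1 MiB apart: no two requests are adjacent.
scattered = [(i * (1 << 20), 4096) for i in range(256)]

print(len(coalesce(sequential)), "request(s) after combining the sequential case")
print(len(coalesce(scattered)), "request(s) after combining the scattered case")
```

The sequential case collapses into a single large request, while the scattered case still costs one operation per call, which is why random IOPS leave the storage server with so few options.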
Taking these trends and applying them to storage devices, we can see that we'll need devices with more IOPS capability while applications are running. A quick rule of thumb that I use for the IOPS capability of current storage devices is the following:
- 7.2K SATA/SAS drive: 100-125 IOPS
- 15K SATA/SAS drive: 200-300 IOPS
- SATA/SAS SSD: 10,000-100,000 IOPS
- PCIe SSD: 100,000-1,000,000 IOPS
For a first-level approximation, these numbers work for random or sequential IOPS, although some devices have fairly poor random IOPS performance. You can see there is a big difference in IOPS performance between the "normal" spinning disk devices and SSD devices. Between a 7.2K SATA drive and a PCIe SSD there is about three to four orders of magnitude difference in IOPS (a factor of 1,000 to 10,000).
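A quick back-of-the-envelope calculation shows what these ranges mean for an application that writes 1 GiB using 4 KiB calls (262,144 calls). This is only a sketch: it assumes every call becomes exactly one I/O operation with no combining or caching, and the `seconds_for` helper is a name made up for this example.

```python
# Rule-of-thumb IOPS ranges from the list above, as (low, high).
DEVICE_IOPS = {
    "7.2K SATA/SAS drive": (100, 125),
    "15K SATA/SAS drive": (200, 300),
    "SATA/SAS SSD": (10_000, 100_000),
    "PCIe SSD": (100_000, 1_000_000),
}


def seconds_for(total_bytes: int, call_bytes: int, iops: float) -> float:
    """Time to issue total_bytes in call_bytes-sized operations at a given IOPS."""
    operations = total_bytes // call_bytes
    return operations / iops


GIB = 1 << 30
for name, (low, high) in DEVICE_IOPS.items():
    slowest = seconds_for(GIB, 4096, low)
    fastest = seconds_for(GIB, 4096, high)
    print(f"{name}: {fastest:,.2f} to {slowest:,.2f} seconds")
```

Under these assumptions the spinning drives need tens of minutes for the same stream of small calls that an SSD handles in seconds.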
At the same time, there is an order of magnitude difference in price/capacity ($/GB) between 7.2K disks and SSDs (about two to three orders of magnitude for a PCIe SSD).
Finally, there is about a factor of 2 to 20 difference in capacity between 7.2K drives and SSDs or PCIe SSDs. We now have 3 TB 7.2K SATA/SAS drives. There are some very large capacity SSDs and PCIe SSDs, but these are tremendously expensive. Hence, the "typical" SSD capacity is in the range of 200 GB up to 1 TB.
At one end of the spectrum are large capacity 7.2K drives that have a very appealing price/capacity but very low IOPS. At the other end are SSDs that have an amazing IOPS capability but a fairly low capacity and a relatively high price/capacity.
Stuck in the middle is the poor 15K drive. Its price/capacity is a bit higher than that of 7.2K drives but lower than that of SSDs. However, its IOPS performance isn't that much better than the 7.2K drive when you take SSDs into consideration.
To me, it appears that the 15K drive is kind of stuck in limbo. I think we'll see 15K drives disappear in the next several years, leaving just 7.2K drives and SSDs. The 15K drives are a combination of the worst features of spinning drives (low IOPS) and the worst features of SSDs (lower capacity and higher price/capacity). It seems much more logical that enterprises will run applications on fast SSD-based storage and then store the final results on 7.2K drives (or maybe even tape).
However, this future is anything but guaranteed. For this to come to pass, we must find good ways to effectively and efficiently move data from the spinning 7.2K drives to the SSDs. I don't think putting solid state storage on spinning disks as extra cache is the right way to do this. However, there are some pretty interesting storage systems that use SSDs as a cache.
The most obvious way to integrate these two types of storage (low-cost, large capacity spinning drives and SSDs) is to use tiering. The key for tiering to be successful is that it has to be able to move data quickly between the two tiers. For example, you could intercept open() calls from an application to start moving the data between tiers. If this is coupled with a list of applications that are to use the fast storage while running, then you get the full benefits of the fast storage (SSDs) without the full cost. For this to happen, the storage system and the tiering software must recognize when data needs to be moved and must be able to move it quickly.
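The idea of reacting to open() calls can be sketched in a few lines. This is a toy under loud assumptions, not a description of any real tiering product: two directories stand in for the slow and fast pools, the hypothetical `TwoTierStore` class replaces true interception of the system call, and promotion is a blocking whole-file copy, which is exactly the cost a real implementation would have to hide.

```python
import shutil
import tempfile
from pathlib import Path


class TwoTierStore:
    """Toy tiering layer: two directories stand in for the two storage pools."""

    def __init__(self, slow_dir, fast_dir):
        self.slow = Path(slow_dir)
        self.fast = Path(fast_dir)

    def open(self, name, mode="rb"):
        """Open a file by name, promoting it to the fast tier first if needed."""
        fast_path = self.fast / name
        slow_path = self.slow / name
        if not fast_path.exists() and slow_path.exists():
            fast_path.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(slow_path, fast_path)  # promotion happens on open
        return open(fast_path, mode)

    def demote(self, name):
        """Copy a file back to the slow tier and free the fast-tier space."""
        fast_path = self.fast / name
        slow_path = self.slow / name
        slow_path.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(fast_path, slow_path)
        fast_path.unlink()


with tempfile.TemporaryDirectory() as slow, tempfile.TemporaryDirectory() as fast:
    (Path(slow) / "results.dat").write_bytes(b"simulation input")
    store = TwoTierStore(slow, fast)
    with store.open("results.dat") as f:  # first open triggers the promotion
        data = f.read()
    promoted = (Path(fast) / "results.dat").exists()
    store.demote("results.dat")
    still_fast = (Path(fast) / "results.dat").exists()
    print(data, promoted, still_fast)
```

The application only ever calls `store.open()`, so the data movement stays invisible to it, but the first open pays for the entire copy.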
Alternatively, before an application runs, the data could be moved from the slow storage to the fast storage. This is typically referred to as "staging" the data. This approach can be used to great effect in the HPC world because applications are typically run using a job scheduler (also called a resource manager) where the user would specify such things as the application, how it is to be run, how many processors are needed, how much memory, and even which files are to be used for input and which files are output files. The user then submits the job to the job scheduler, which decides when and where the application is to be run based on the requirements specified by the user. The job scheduler could then copy the input files from the slow storage to the faster storage in preparation for the application to be run. When the application is finished, the job scheduler can move the files from the faster storage to the slower storage.
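The stage-in, run, stage-out sequence might look like the following sketch. It does not use any real job scheduler's API. The `run_staged` function, the flat file names, and the use of a Python callable as the "application" are all assumptions chosen to keep the example self-contained.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Sequence


def run_staged(app: Callable[[Path], None], inputs: Sequence[str],
               outputs: Sequence[str], slow_dir: Path, fast_dir: Path) -> None:
    """Copy inputs to fast storage, run the job there, then copy outputs back."""
    for name in inputs:                      # stage in
        shutil.copy2(slow_dir / name, fast_dir / name)
    app(fast_dir)                            # the job only touches fast storage
    for name in outputs:                     # stage out
        shutil.copy2(fast_dir / name, slow_dir / name)
    for name in list(inputs) + list(outputs):
        (fast_dir / name).unlink(missing_ok=True)  # free the fast tier


def uppercase_app(workdir: Path) -> None:
    """Stand-in application: reads input.txt and writes output.txt."""
    text = (workdir / "input.txt").read_text()
    (workdir / "output.txt").write_text(text.upper())


with tempfile.TemporaryDirectory() as slow, tempfile.TemporaryDirectory() as fast:
    slow_dir, fast_dir = Path(slow), Path(fast)
    (slow_dir / "input.txt").write_text("hpc job data")
    run_staged(uppercase_app, ["input.txt"], ["output.txt"], slow_dir, fast_dir)
    result = (slow_dir / "output.txt").read_text()
    leftovers = sorted(p.name for p in fast_dir.iterdir())
    print(result, leftovers)
```

This works only because the user declares the input and output files up front, which is the information a job submission already carries in the HPC world.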
The tiering approach is a bit nicer than the job scheduler approach because it happens in the background unbeknownst to the user (automagically, if you will). However, existing tiering software is not really capable of quickly recognizing when data is needed on faster storage and quickly moving it there, so it is of limited value today. Much of the tiering software requires that the storage study the usage pattern of the blocks of data within a file. After a period of time, it might move them to faster storage based on what it has measured (and perhaps predicted). So, what is really needed is much better tiering software than today's pretty bad options.
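A toy model shows why measurement-based tiering reacts slowly. It is an illustration only: the `HeatTracker` class and its promote-after-three-accesses rule are assumptions, not the policy of any actual product, and real systems typically measure over much longer periods.

```python
from collections import Counter


class HeatTracker:
    """Promote a block to the fast tier only after it has been accessed enough."""

    def __init__(self, promote_after: int = 3):
        self.promote_after = promote_after
        self.hits = Counter()
        self.fast_blocks = set()

    def access(self, block_id: int) -> str:
        """Record one access and return the tier that served it."""
        tier = "fast" if block_id in self.fast_blocks else "slow"
        self.hits[block_id] += 1
        if self.hits[block_id] >= self.promote_after:
            self.fast_blocks.add(block_id)  # takes effect on the next access
        return tier


tracker = HeatTracker(promote_after=3)
tiers = [tracker.access(7) for _ in range(5)]
print(tiers)  # ['slow', 'slow', 'slow', 'fast', 'fast']
```

Every access made while the software is still measuring is served from slow storage, so an application that reads each block only once or twice never benefits.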
This is just a prognostication, and I could easily be as wrong as I'm right. But I do think it bears some thought about the direction storage is headed. We're seeing workloads that are much more IOPS-based than we previously thought, but at the same time we need lots of capacity. The key point is that we really need high IOPS performance only when an application is running. Consequently, we can make a portion of the storage a fairly small SSD-based system that has tremendous IOPS capability but a fairly small capacity at a reasonable price point. Then, we also create a very large but lower performance storage pool using the huge capacity and great price point offered by 7.2K drives. As a result, I think we'll see 15K drives disappearing from the market during the next few years.
The key to being successful is to marry the two storage pools with middleware that enables the data to be easily moved between the two. Tiering software is probably the best solution, but the current state of tiering is pretty bad. What is needed is tiering software that quickly determines when data must be moved from slower storage to really fast storage and back again. This has to happen in a way that the application performance does not suffer too much because of the data movement.
Current tiering software does not even come close to this, but I have hopes that vendors understand this and are working to fix the problem.
Jeff Layton is the Enterprise Technologist for HPC at Dell, Inc., and a regular writer of all things HPC and storage.