Much of the industry seems to have gotten the picture that, given the cost of flash, all-flash storage is not very practical except for small, contained data sets such as file system metadata or database index tables. Nor is all-disk storage practical except for streaming I/O environments such as video playback or capture.
Of course, I am thinking of larger environments with hundreds of TB (or more) of storage, where storage is a major cost driver. Almost every week, if not more often, you hear of another hybrid storage product that will cache your data and seamlessly make it seem as if you are running on 100 percent flash storage. Of course, reality and marketing materials often differ.
On the other hand, I do believe that the world is going to move to hybrid storage because:
1. Flash costs will not drop below disk costs.
2. Flash write performance is not improving as fast as disk performance.
3. Flash volume cannot replace disk storage without hundreds of billions of dollars in investment.
So if you accept my assumption that regularly accessed data (hot data) will be migrated from spinning disk to flash, and that rarely used data (cold data) will be migrated from flash to spinning disk, what challenges will the different implementations of flash-cached storage face?
All of this is especially interesting given the latest report of a 6.6% drop in flash storage revenue, which suggests that the hundreds of billions of dollars of investment needed to replace disk drives are not likely to materialize.
Here is what I think the challenges are:
1. How do you get data to the right tier of storage at the right time?
2. What is the right tier of storage and how do you know it?
3. What are the advantages and disadvantages of the competing design methodologies?
First, I think it is important to cover the two competing design approaches:
1. Caching on the disk drive
2. Caching at the storage controller
Disk Drive Caching Advantages and Disadvantages
Hybrid disk drives have been on the market for less than a year, and the initial design provided only read caching, not write caching. Over time, I suspect that will change, as will which disk drives include flash cache and how large those caches are.
Having cache at the disk drive places the caching system's management and controls at the lowest layer. The storage controller does not have to understand the underlying block structure of each disk drive and each tier; the disk drive itself maintains that in its firmware.
Those are great advantages, but the other side of the coin is that features like data deduplication and compression are likely to be less effective than on a storage controller that breaks data into blocks and spreads them across the set of disk drives. The drive is farther removed from the file, and the topology of the information being accessed is more difficult to understand.
Storage Controller Caching Advantages and Disadvantages
The storage controller has more knowledge of data layout for files than a disk drive does. (It is too bad we never got ANSI T10 OSD working so we could understand the topology of files, but that is another story.) In the POSIX world, the storage controller does not have perfect knowledge of data layout; only the file system and volume manager have that. Still, it has far better knowledge than a disk drive.
The storage controller can most likely do a better job at deduplication and compression. For some data types that makes sense, but for others that are pre-compressed, such as video, it does not. The problem is that the storage controller must understand the performance and topology of all of the tiers of storage and know what is where and why. You might have idle disks, 5400 RPM disks, 10K RPM disks and a variety of tiers of flash. In addition, to do this well the storage controller needs to be very large so that it can control many different storage tiers, and large has generally meant (at least in the past) expensive.
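To illustrate why compression pays off for some data types but not for pre-compressed ones like video, here is a minimal Python sketch using the standard zlib module. The function names and the roughly 10% savings threshold are hypothetical, and random bytes merely stand in for already-compressed video; real controllers use their own algorithms and heuristics.

```python
import os
import zlib


def compression_ratio(data: bytes, level: int = 6) -> float:
    """Original size divided by zlib-compressed size (higher is better)."""
    return len(data) / len(zlib.compress(data, level))


def worth_compressing(data: bytes, min_ratio: float = 1.1) -> bool:
    """Hypothetical controller policy: store compressed only if it saves about 10%."""
    return compression_ratio(data) >= min_ratio


# Repetitive records (text, many database pages) compress well.
text_like = b"customer_id,order_id,status\n" * 4096
# Random bytes stand in for already-compressed content such as encoded video.
video_like = os.urandom(len(text_like))

print(worth_compressing(text_like))   # True
print(worth_compressing(video_like))  # False
```

Spending controller CPU on the second buffer buys nothing, which is why a blanket compress-everything policy is wasteful for video-heavy workloads.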
The data challenges are likely what you would expect given our POSIX world. Even though there is a movement to object storage, scale-out object storage file systems that do anything other than mirroring are not commonplace. Yes, there are products, but we still mostly live in a POSIX application environment, and though we might want that to change quickly, it will not.
Moving Data to the Right Tier
With a POSIX file system, as stated, only the volume manager and file system know which disk drives the files are allocated on. For the most part, files are allocated at sequential block addresses only if a single write is in progress at a time.
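The effect of concurrent writes on layout can be illustrated with a toy allocator. This Python sketch assumes a naive allocator that hands the next free block to whichever write arrives next; `allocate`, `is_contiguous` and the file names are hypothetical. Real file systems use smarter techniques such as extents and delayed allocation that reduce, but do not always eliminate, this interleaving.

```python
from itertools import zip_longest


def allocate(write_streams):
    """Naive bump allocator. write_streams is a list of (file_name, n_blocks).
    Writes from all streams arrive round-robin, as if issued concurrently.
    Returns {file_name: [block addresses in the order they were allocated]}."""
    layout = {name: [] for name, _ in write_streams}
    next_block = 0
    per_stream = [[name] * nblocks for name, nblocks in write_streams]
    for arrivals in zip_longest(*per_stream):
        for name in arrivals:
            if name is None:          # this stream has already finished
                continue
            layout[name].append(next_block)
            next_block += 1
    return layout


def is_contiguous(blocks):
    """True if every block immediately follows the previous one."""
    return all(b == a + 1 for a, b in zip(blocks, blocks[1:]))


single = allocate([("a.mp4", 4)])
concurrent = allocate([("a.mp4", 4), ("b.mp4", 4)])
print(single)      # {'a.mp4': [0, 1, 2, 3]}
print(concurrent)  # {'a.mp4': [0, 2, 4, 6], 'b.mp4': [1, 3, 5, 7]}
```

A storage controller that sees only block addresses cannot tell that blocks 0, 2, 4 and 6 belong to one file, so it cannot prefetch that file as a unit.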
This means that deciding which data to pre-read into cache depends on a block address being hit at least once, and often multiple times, before it is moved into flash cache. This is true in a POSIX file system world, where an attached storage controller must figure out where a file is and move it. Things get a bit easier with object storage and storage solutions that know the topology of the data, such as the Seagate Kinetic disk drives.
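The hit-count approach can be sketched as a small read cache: a block is promoted to flash only after a threshold number of reads from disk, and flash is managed as an LRU list. The class name, the two-read threshold and per-block granularity are assumptions for illustration; real hybrid drives and controllers use their own policies.

```python
from collections import OrderedDict, defaultdict


class HitCountCache:
    """Promote a block to flash after `threshold` reads from disk.
    Flash holds at most `flash_blocks` blocks and evicts least recently used."""

    def __init__(self, flash_blocks: int, threshold: int = 2):
        self.flash = OrderedDict()          # block address -> None, in LRU order
        self.flash_blocks = flash_blocks
        self.threshold = threshold
        self.disk_hits = defaultdict(int)   # reads served from disk, per block
        self.promotions = 0

    def read(self, lba: int) -> str:
        """Serve one read and report which tier served it."""
        if lba in self.flash:
            self.flash.move_to_end(lba)     # mark as most recently used
            return "flash"
        self.disk_hits[lba] += 1
        if self.disk_hits[lba] >= self.threshold:
            self._promote(lba)              # copy into flash after this read
        return "disk"

    def _promote(self, lba: int) -> None:
        if len(self.flash) >= self.flash_blocks:
            self.flash.popitem(last=False)  # evict (demote) the LRU block
        self.flash[lba] = None
        del self.disk_hits[lba]             # restart counting if it is evicted later
        self.promotions += 1


cache = HitCountCache(flash_blocks=2, threshold=2)
trace = [10, 10, 10, 20, 30, 10]
print([cache.read(lba) for lba in trace])
# ['disk', 'disk', 'flash', 'disk', 'disk', 'flash']
```

Note that the first two reads of block 10 still come from disk; the cache only pays off if the block keeps being read after promotion.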
What Is the Right Tier
Finding the right tier is really hard without a priori knowledge. Take the following example with streaming video.
If the data has not been used in a long time it will likely be on the lowest tier of storage in terms of cost and performance. Let’s say you have a 500 MB file that you want to play. If you are only going to read the file one time, why should it be moved from low performance storage to higher performance storage?
On the other hand, what if the application that you opened the file with was video editing software instead of video playback software? If it was an editor, moving the whole file into high-performance storage would make great sense. Therein lies the problem.
Depending on what applications are being used, the usage of data could be significantly different. And yet you do not know that at the block level or even at the object layer. Figuring out the right tier might be easy for something like database index tables that are constantly hit, or for file system metadata when people are running ls -l. But for data that is used irregularly, how can anyone know that moving it makes sense? Moving a file from slow disk to fast flash cache is very costly if the file is read only once, and it can still be costly under a policy that moves a file after it has been read twice.
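The cost of guessing wrong can be made concrete by counting how much data a "promote after N reads" policy moves into flash that is never read again. In this Python sketch, the trace, file names and 500 MB sizes are hypothetical: one video is played once, one is watched twice, and one is opened repeatedly in an editor.

```python
def promotion_waste(trace, sizes_mb, threshold):
    """Promote a file to flash on its `threshold`-th read.
    Returns (MB moved to flash, MB moved but never read again afterwards)."""
    reads = {}
    promoted_at = {}                          # file -> trace index of promotion
    for i, name in enumerate(trace):
        reads[name] = reads.get(name, 0) + 1
        if reads[name] == threshold:
            promoted_at[name] = i
    moved = sum(sizes_mb[name] for name in promoted_at)
    wasted = sum(
        sizes_mb[name]
        for name, i in promoted_at.items()
        if name not in trace[i + 1:]          # no read after the promotion
    )
    return moved, wasted


sizes = {"playback.mp4": 500, "review.mp4": 500, "project.mov": 500}
trace = ["playback.mp4", "review.mp4", "project.mov", "review.mp4",
         "project.mov", "project.mov", "project.mov"]
for n in (1, 2, 3):
    print(n, promotion_waste(trace, sizes, n))
# 1 (1500, 500)
# 2 (1000, 500)
# 3 (500, 0)
```

With this trace, promoting on the first or second read wastes 500 MB of flash writes and capacity either way. A threshold of three avoids the waste only because the editor file happens to be read a third time, and it delays any flash benefit until then; a block- or object-level policy cannot know the future access count in advance.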
Moving data around, especially large files that might or might not be read and written in patterns similar to databases or file system metadata, is costly in terms of the hardware interconnect design needed for high-speed bandwidth between the tiers. It is also costly because expensive flash cache ends up holding data, and serving accesses, that do not really benefit from flash.
Vendors are making a number of tradeoffs. Disk drive vendors are working on understanding access patterns at the drive level for various application workloads, for use in their hybrid drives. Applications such as email and web servers might have access patterns that can be coded into firmware to improve (or at least likely improve) performance for that class of application.
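A drive-level heuristic for recognizing access patterns might start by measuring how sequential the incoming requests are. This Python sketch is hypothetical: the (LBA, length) request format, the 80% cutoff and the "streaming"/"random" labels are assumptions, not any vendor's firmware logic.

```python
def sequential_fraction(requests):
    """requests: list of (lba, length_in_blocks) in arrival order.
    Returns the fraction of requests that start where the previous one ended."""
    if len(requests) < 2:
        return 0.0
    hits = sum(
        1
        for (prev_lba, prev_len), (lba, _) in zip(requests, requests[1:])
        if lba == prev_lba + prev_len
    )
    return hits / (len(requests) - 1)


def classify(requests, streaming_cutoff=0.8):
    """Mostly sequential I/O is labeled streaming (a poor fit for flash);
    otherwise it is labeled random (a candidate for flash caching)."""
    if sequential_fraction(requests) >= streaming_cutoff:
        return "streaming"
    return "random"


video = [(1000 + 8 * i, 8) for i in range(10)]               # back-to-back reads
email = [(5, 1), (9000, 1), (42, 1), (43, 1), (77000, 1)]    # scattered small reads
print(classify(video))  # streaming
print(classify(email))  # random
```

A real drive would look at a sliding window of recent requests rather than a whole trace, but the principle is the same: the firmware infers intent from addresses because nothing else is passed to it.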
Having flash cache on the disk drive allows data that is reused often to be moved into cache without having to move it off the drive. The challenge is that in tiered storage, besides having slower devices, you often have less bandwidth going to each tier. With the storage controller approach, you move the data from slower storage to storage with higher performance. If the data is not reused enough times, and the application is not of critical importance, you have just moved data to high-cost storage for no reason.
Future Storage Developments
I heard a saying that I really like and use: “Lead, follow or get out of the way.”
The POSIX read/write/open/close interface did not lead, is now following, and over the next decade or so will likely get out of the way for the most part. We might not like it, but it is going to happen. This has an important impact on storage caching, because the biggest problem with caching today is that the topology of a file is not known, and cannot be known, since it cannot easily be passed through all the layers to where the data resides. Object interfaces will allow data topology to be passed down, and perhaps application usage information as well, to provide hints about access patterns.
Let’s assume that this happens. It will allow every part of the stack to make better decisions. Disk drives will know what can and should be cached: whether a file is opened read-only or read/write, whether the application is streaming writes or doing IOPS-heavy random I/O, and so on. I think that, as we move forward, the rich information that can be passed through object interfaces will become available, and vendors up and down the stack will use it. I do not think this will take long, maybe 5 years.
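If usage hints do make it down the stack, the placement decision no longer has to wait for block hit counts. This Python sketch uses a hypothetical `AccessHint` record; it is not the ANSI T10 OSD command set or the Seagate Kinetic API, and the rules in `plan` are illustrative assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Pattern(Enum):
    STREAMING = "streaming"   # large sequential reads or writes
    RANDOM = "random"         # small, IOPS-heavy accesses


@dataclass(frozen=True)
class AccessHint:
    """Hypothetical usage hint passed along with an object open request."""
    read_only: bool
    pattern: Pattern
    expected_reads: int


def plan(hint: AccessHint) -> dict:
    """Pick a tier and write policy from the hint instead of from hit counts."""
    # Random I/O or expected reuse can amortize the cost of moving data to flash.
    reuse = hint.pattern is Pattern.RANDOM or hint.expected_reads > 1
    return {
        "tier": "flash" if reuse else "disk",
        # Only objects opened for writing need flash write caching.
        "write_cache": not hint.read_only,
    }


playback = AccessHint(read_only=True, pattern=Pattern.STREAMING, expected_reads=1)
editing = AccessHint(read_only=False, pattern=Pattern.STREAMING, expected_reads=20)
print(plan(playback))  # {'tier': 'disk', 'write_cache': False}
print(plan(editing))   # {'tier': 'flash', 'write_cache': True}
```

The same 500 MB video lands on different tiers depending on who opened it, which is exactly the distinction the block layer cannot make today.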
This will put another nail in the POSIX coffin, which should not be a surprise to anyone. The POSIX file system people at The Open Group had the chance to lead, but they chose to follow and now will be made obsolete. This kind of thing happens often in our industry, and people never seem to learn from the mistakes of the past.