The concept of putting a cache in front of storage to improve performance is back again. The last time I remember this happening was in the early 2000s, and it failed miserably. Yes, it did work for some application areas, but for the most part the concept did not work very well for general-purpose storage.
This time the idea is back with a few twists. First, the disk drive vendors have gotten into the mix by planning to put flash right on the drive. Second, the appliance vendors are back with larger flash caches, because the cost of NAND flash is much lower than the cost of the DRAM that was used previously.
If you have read my column in the past, you know I have a saying that there are no new engineering problems, just new engineers solving old problems. This is the case for caches, whether they are flash- or DDR-based.
So what are the issues with the new designs? What is different from before? What are the application issues, and what is going to work better this time: disk drives or flash appliances?
With these questions, there is one big consideration. File systems allocate space based on a number of factors, such as the volume manager, the number of allocation threads and so on. File system allocations are not necessarily sequential even for a single file, as allocation is often FIFO (first in, first out), and with multiple threads writing to different files, the files will likely be interleaved in most file systems. Also consider that there might be mismatches between allocation sizes, volume manager stripe sizes, RAID stripe sizes and/or cache stripe sizes, depending on the system. So even if you think a file can be cached because the file fits in the cache, that may not be true; it depends on how everything is allocated along the whole path.
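To make the interleaving point concrete, here is a minimal sketch of a first-come, first-served allocator with several writers. The file names and block counts are made up for illustration; no real file system allocates exactly this way, but the scattering effect is the same.

```python
from itertools import cycle, islice

# Toy FIFO allocator: blocks are handed out in the order requests arrive,
# regardless of which file is asking (hypothetical sizes, not a real FS).
def fifo_allocate(requests):
    """requests: file names in arrival order -> {file: [block numbers]}."""
    layout = {}
    for block_number, file_name in enumerate(requests):
        layout.setdefault(file_name, []).append(block_number)
    return layout

# Three threads writing three files; their allocation requests interleave.
arrival_order = list(islice(cycle(["file_a", "file_b", "file_c"]), 12))
for name, blocks in fifo_allocate(arrival_order).items():
    print(name, blocks)   # e.g. file_a -> [0, 3, 6, 9]: not contiguous
```

Even though each file is written sequentially from its application's point of view, its blocks land every third position on the device, which is exactly the kind of layout that defeats simple assumptions about what will fit, or stream, through a cache.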
Requirements
As always, I look at the requirements first and the technology second. A close friend of mine used to say that cache in front of storage does one of two things: it hides latency, or it lets the system consolidate requests into bigger requests to the storage. If you cannot meet one or both of these criteria, cache will not work. My friend said this in the 1980s about using SSD cache on the Cray Y-MP, and the statement still holds true today.
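Here is a back-of-the-envelope way to see the latency-hiding half of that rule. The latencies and hit rates below are assumptions for illustration only, not measurements of any product.

```python
# Back-of-the-envelope effective latency with a cache in the path.
# All numbers below are assumptions for illustration, not measurements.
def effective_latency_us(hit_rate, cache_latency_us=100.0, disk_latency_us=5000.0):
    """Average access time: hits served from flash, misses from disk."""
    return hit_rate * cache_latency_us + (1.0 - hit_rate) * disk_latency_us

for hit_rate in (0.0, 0.5, 0.9, 0.99):
    print(f"hit rate {hit_rate:4.0%}: {effective_latency_us(hit_rate):7.1f} us")
# With a 50% hit rate the average is still ~2,550 us; the cache only
# "hides" latency when most of the requests are for reused data.
```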
The first questions I always ask people when talking about caching systems include the following:
- Are you reading, writing, both and/or write and then read?
- How big is/are the file(s)?
- How are you accessing the file(s)?
- How many people are working in the file system?
One of the key things to remember is that storage is about blocks and block reuse. Disk drives, caches and the like only know about SCSI commands for blocks of data. They do not know about files at the device layer. Of course, at the upper layers there is some knowledge, but once you are at the device layer, it is about blocks, not files or objects.
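As a rough illustration of that layering, here is a sketch that maps a byte offset in a file to the logical block address the device actually sees. The extent list and the 4 KiB block size are hypothetical; the point is only that the device is asked for an LBA, with no notion of which file it belongs to.

```python
BLOCK_SIZE = 4096  # hypothetical block size in bytes

# A file as the file system sees it: a list of (start_lba, block_count) extents.
# The device only ever sees reads and writes addressed by LBA.
file_extents = [(1_000_000, 8), (2_500_000, 4), (512, 16)]  # made-up layout

def lba_for_offset(extents, byte_offset):
    """Map a byte offset inside the file to the LBA the device is asked for."""
    block_index = byte_offset // BLOCK_SIZE
    for start_lba, count in extents:
        if block_index < count:
            return start_lba + block_index
        block_index -= count
    raise ValueError("offset past end of file")

print(lba_for_offset(file_extents, 0))        # 1000000
print(lba_for_offset(file_extents, 40960))    # 2500002: same file, far-away LBA
```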
Are you reading, writing, both and/or write and then read?
How is the data being accessed? Cache is designed to work well if you are reusing data. If you are not reusing data and there are write streams, it is very difficult for a caching system to know that the write stream is not going to be read. It is even harder if the data is read sometimes and not others. It is easy to read data ahead if it is going to be used, but unless the data is read again, this does not provide much value over the standard caching done with the DDR and/or NAND flash included in a common storage controller or RAID card.
How big is/are the file(s)?
If your files are bigger than the cache, there is a good chance that cache is not going to help much. File size is an important consideration when using caches.
How are you accessing the file(s)?
Is data being reused? Understanding the access patterns of the data is critical. If you have 1 TB of cache and 50 TB of files but only use 800 GB of the file data, clearly that will fit in cache. But if the working set is 5 TB, that is not going to work very well.
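Here is a rough way to state that arithmetic, using the 1 TB cache from the example above. It is a sketch under a crude assumption (re-reads spread evenly over the working set), not a sizing tool.

```python
def cache_fit(cache_tb, working_set_tb):
    """Crude check: does the actively reused data fit in the cache at all?"""
    if working_set_tb <= cache_tb:
        return "working set fits; repeat reads can be served from cache"
    # If it does not fit, at best only a fraction of re-reads can hit,
    # assuming re-reads are spread evenly over the working set.
    best_case_hit_fraction = cache_tb / working_set_tb
    return f"working set does not fit; at best ~{best_case_hit_fraction:.0%} of re-reads hit"

print(cache_fit(1.0, 0.8))   # the 800 GB working set from the example
print(cache_fit(1.0, 5.0))   # the 5 TB working set: at best ~20% of re-reads hit
```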
How many people are working in the file system?
I have seen many places where cache works some of the time and not at other times, based on the number of users. This should not be a surprise.
Tools
Understanding how much cache is needed really comes down to either throwing darts blindfolded or using tools that determine the amount of cache that is needed. Tools are the most difficult part of this process.
I have seen tools from vendors in this area come and go. Since capacity management is not, and never has been, considered important for open systems, many people just buy more hardware, as that is cheaper than figuring out what is wrong.
The problem is that storage has not been scaling with computation, and just buying more storage hardware no longer works in some cases. "Oh, go buy an all-flash system" is the next answer, but the cost of those systems is very high. I think maybe someone should buy some tools rather than throwing hardware at the problem. I guess I am just naive.
Technology Tradeoffs
Given where the disk drive vendors are going, things are about to get very interesting. Current hybrid disk drives have around 8 GB of cache per drive, but as yet none of them are enterprise drives. I think that will change over the next few quarters, based on announcements and market pressures.
Let’s say you have a file system with one hundred 4 TB drives, and each of the drives has 16 GB of flash cache. That would be 1,600 GB of flash cache, or 1.6 TB, for your 400 TB of storage. Let’s assume that you have RAID-6. That drops everything by 20 percent, so you have 320 TB of usable storage and 1.28 TB of flash cache. Now, if we assume that the drive vendors are using similar algorithms to the caching vendors and that each drive can move about 120 MiB/sec to and from its flash, then with 100 disk drives and 1.28 TB of flash cache you have 100*120 MiB/sec of bandwidth to and from flash. This yields roughly 11.7 GiB/sec of bandwidth to/from flash.
Think about this: that is the equivalent of about seven 1,600 MiB/sec Fibre Channel connections, or about ten 12 Gb/sec SAS connections (twenty 6 Gb/sec SAS connections). I think you see the point.
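For what it is worth, here is the back-of-the-envelope arithmetic from the example above in one place. The 120 MiB/sec of flash bandwidth per drive and the 20 percent RAID-6 overhead are assumptions for illustration, not vendor specifications.

```python
# Back-of-the-envelope numbers for the hybrid-drive example above.
# Per-drive flash bandwidth and RAID-6 overhead are assumptions.
drives = 100
drive_capacity_tb = 4
flash_per_drive_gb = 16
flash_bw_per_drive_mib_s = 120
raid6_overhead = 0.20

raw_capacity_tb = drives * drive_capacity_tb                    # 400 TB
usable_capacity_tb = raw_capacity_tb * (1 - raid6_overhead)     # 320 TB
total_flash_tb = drives * flash_per_drive_gb / 1000             # 1.6 TB
usable_flash_tb = total_flash_tb * (1 - raid6_overhead)         # 1.28 TB
flash_bw_gib_s = drives * flash_bw_per_drive_mib_s / 1024       # ~11.7 GiB/s

print(usable_capacity_tb, usable_flash_tb, round(flash_bw_gib_s, 1))
print("1,600 MiB/s FC links needed:", round(flash_bw_gib_s * 1024 / 1600, 1))   # ~7.5
print("12 Gb/s SAS lanes needed:   ", round(flash_bw_gib_s * 1024 / 1200, 1))   # ~10
```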
Challenges
The big problem I see is that all of these external SSD flash caches are going to have nowhere near the aggregate bandwidth that the hybrid disk drives will have, unless you put a lot of money into the bandwidth for connecting these external caching devices.
I think the disk drive makers understand this, and the external caching market will be short-lived once again. This is not to say that you should not buy these technologies if you have caching requirements and your applications and users can benefit from them. But I am suggesting that over the next few years, external caching devices will likely be replaced by caching disk drives, if the market demand for these drives goes the way I expect it to go. But remember, I have been wrong before.