Other Data Access
Accessing local data using a large Flash cache instead of a small DRAM cache might make some sense, assuming that my friend Larry's corollary is followed. What works really depends on the size of the data being accessed. Almost all devices cache data by block address, since in the SAN world they are block storage devices. There are some exceptions in the NAS world, but NAS devices aren't mind readers and do not know which parts of a file are going to be accessed.
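To make the block-address point concrete, here is a toy read cache in Python. It is a sketch, not a description of how any particular array works: the class name BlockCache and the least-recently-used eviction policy are assumptions chosen for illustration. The only key the cache ever sees is a logical block address (LBA), which is why it cannot anticipate which parts of a file will be read next.

```python
from collections import OrderedDict


class BlockCache:
    """Hypothetical read cache keyed only by logical block address (LBA).

    It has no notion of files or of what an application will read next.
    Eviction is least-recently-used, an assumption for illustration.
    """

    def __init__(self, capacity_blocks):
        if capacity_blocks <= 0:
            raise ValueError("capacity_blocks must be positive")
        self.capacity = capacity_blocks
        self.blocks = OrderedDict()  # LBA -> data, least recently used first
        self.hits = 0
        self.misses = 0

    def read(self, lba, fetch_from_disk):
        """Return the block at `lba`, calling fetch_from_disk(lba) on a miss."""
        if lba in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(lba)  # mark as most recently used
            return self.blocks[lba]
        self.misses += 1
        data = fetch_from_disk(lba)
        self.blocks[lba] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        return data


cache = BlockCache(capacity_blocks=2)
for lba in [1, 2, 1, 3, 2]:
    cache.read(lba, lambda block: f"block-{block}")
print(cache.hits, cache.misses)  # 1 4
```

With room for only two blocks, the access pattern above touches three distinct addresses, so evictions push block 2 out just before it is needed again.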
What it really comes down to is whether the range of data being accessed exceeds the cache, for both reads and writes. Also realize that some vendor offerings have separate read and write caches and might use DRAM for the write cache, given Flash's latency and performance issues with writes, and Flash for the read cache. The issues still hold true, however. If the range of data being read exceeds the cache, a Flash cache will not help performance very much. If your range of read data is 5x your cache and reads are spread evenly across it, only about one read in five will be a cache hit, so you can expect at most roughly a 20 percent reduction in average latency.
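The 5x example can be checked with a simple expected-value model. This is a minimal sketch assuming a warm cache and reads spread uniformly over the data range, so the hit rate is just the fraction of the range that fits in cache. The function names and the latency figures are hypothetical placeholders, not measurements.

```python
def expected_read_latency(range_to_cache_ratio, cache_latency_us, disk_latency_us):
    """Average read latency under a uniform-random, warm-cache model.

    range_to_cache_ratio is (size of data range read) / (cache size).
    """
    if range_to_cache_ratio <= 0:
        raise ValueError("range_to_cache_ratio must be positive")
    # Fraction of the range resident in cache; capped at 1 if it all fits.
    hit_rate = min(1.0, 1.0 / range_to_cache_ratio)
    return hit_rate * cache_latency_us + (1.0 - hit_rate) * disk_latency_us


def latency_reduction(range_to_cache_ratio, cache_latency_us, disk_latency_us):
    """Fractional latency reduction compared with reading everything from disk."""
    with_cache = expected_read_latency(range_to_cache_ratio, cache_latency_us, disk_latency_us)
    return 1.0 - with_cache / disk_latency_us


# Hypothetical latencies: 100 us for a Flash cache hit, 5,000 us for a disk read.
print(round(latency_reduction(5, 100, 5000), 3))  # 0.196
```

Even with a fast Flash cache, the reduction never exceeds the hit rate (20 percent here), because four reads in five still go to disk.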
The cache hit rate statistics reported by the device will likely be much higher than this, because the device counts hits by comparing the physical reads from the disks with the blocks the application requests. For example, if data is allocated sequentially, the controller reads a full 1MB stripe, and the application reads in 256KB requests, the controller will report three hits and one miss, even though those hits come from read-ahead rather than from data reuse.
Using the same example, if the full stripe read is 2MB for sequentially allocated data, the result would be seven hits and one miss.
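The stripe arithmetic above can be reproduced with a short simulation. This is a sketch of one sequential pass, assuming the controller reads a whole stripe on every miss and that request size divides stripe size evenly; the function name is hypothetical.

```python
KIB = 1024
MIB = 1024 * KIB


def reported_hits_and_misses(stripe_bytes, request_bytes, total_bytes):
    """Hits and misses a stripe-prefetching controller would report for
    one sequential pass over total_bytes in request_bytes-sized reads."""
    if stripe_bytes % request_bytes != 0:
        raise ValueError("sketch assumes requests never straddle a stripe boundary")
    hits = misses = 0
    cached_stripe = None  # index of the stripe most recently read from disk
    for offset in range(0, total_bytes, request_bytes):
        stripe = offset // stripe_bytes
        if stripe == cached_stripe:
            hits += 1  # served from the stripe read-ahead, not from reuse
        else:
            misses += 1  # controller reads the full stripe from disk
            cached_stripe = stripe
    return hits, misses


print(reported_hits_and_misses(1 * MIB, 256 * KIB, 1 * MIB))  # (3, 1)
print(reported_hits_and_misses(2 * MIB, 256 * KIB, 2 * MIB))  # (7, 1)
```

Every byte is read exactly once, yet the reported hit rates are 75 percent and 87.5 percent.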
Many vendors calculate cache hit rates this way, and such figures are of little value when trying to understand data reuse. The benefits of using cache in a local data environment depend on the factors above. My friend Larry Schermer's analysis framework from more than 20 years ago still holds true.
Data that is accessed locally and whose access is latency-sensitive, such as a database index table, is far more sensitive to the latency of local storage devices than the same database accessed 1,000 miles away over a WAN. Since most of the latency in the remote case comes from the WAN, not the storage device, does cache matter that much? If you're reading and re-reading the same data for a hot new item, caching might help with back-end storage contention, but is it worth paying perhaps 5x or more per GB for Flash? I do not know, but it must be a consideration.
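A back-of-the-envelope comparison shows why WAN latency dilutes the benefit of a faster storage device. This is a sketch under stated assumptions: the disk, Flash, and WAN round-trip figures below are hypothetical placeholders, not measurements, and the model simply adds network time to storage time.

```python
def end_to_end_reduction(storage_us, faster_storage_us, network_us):
    """Fraction of total request latency saved by faster storage when the
    network portion of the path (network_us) is unchanged."""
    before = storage_us + network_us
    after = faster_storage_us + network_us
    return 1.0 - after / before


# Hypothetical figures, not measurements.
DISK_US = 5_000       # back-end disk read
FLASH_US = 100        # Flash cache hit
WAN_RTT_US = 16_000   # assumed round trip over roughly 1,000 miles of fiber

local = end_to_end_reduction(DISK_US, FLASH_US, network_us=0)
remote = end_to_end_reduction(DISK_US, FLASH_US, network_us=WAN_RTT_US)
print(f"local:  {local:.0%} less latency")   # local:  98% less latency
print(f"remote: {remote:.0%} less latency")  # remote: 23% less latency
```

The same Flash hit that nearly eliminates local read latency trims less than a quarter of the remote request time under these assumptions, which is the trade-off to weigh against the higher cost per GB.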
Henry Newman, CEO and CTO of Instrumental Inc. and a regular Enterprise Storage Forum contributor, is an industry consultant with 29 years of experience in high-performance computing and storage.