Real-Time Data Compression's Impact on SSD Throughput Capability
SSDs are one of the hottest technologies in storage. They offer great read and write throughput and great IOPS performance, but they come with a limited number of rewrite cycles and a much higher price than spinning media. Making the technology perform well also requires new controller techniques, some of which improve performance, some longevity, and some both. These techniques must be "tuned" so the best possible performance is extracted from the technology. One SSD controller company has a technique that improves both performance and longevity, but the size of the improvement depends entirely on your data.
SandForce is a fairly new SSD controller company. Its products are used in a great number of SSDs, from affordable consumer SSDs to enterprise-class SSDs. The technique SandForce has developed is real-time data compression in the controller: the data is compressed before it is written to the drive, which can increase performance and improve the longevity of the drive.
This article examines the concepts of using real-time data compression in SSDs by taking a consumer SandForce-based SSD for a spin. I test the throughput performance using IOzone, which allows me to vary the compressibility (dedupability) of the data so I can see the impact on the throughput performance of the drive. The results are pretty exciting and interesting as we'll see.
Real-Time Data Compression in SSDs
I talked about real-time data compression in SandForce SSD controllers in a different article, but the concepts definitely bear repeating because they are so fascinating.
The basic approach SandForce has taken is to use some of the capabilities of its SSD controller for real-time data compression. Since the implementation is proprietary, I can only speculate on what is going on inside the controller. Most likely, the uncompressed data comes into the drive (the controller) and is stored in a buffer. The controller then compresses the buffered data, either in individual chunks or perhaps after coalescing several chunks. Compression takes some time and computational resources before the data can be written to the storage media.
Once the data is compressed, it is placed on the blocks within the SSD. However, things are not quite this simple. To ensure the drive is reporting the correct amount of data stored, presumably the uncompressed data size is also stored in some sort of metadata format on the drive itself (maybe within the compressed data?). This means that if a data request comes into the controller with a request for the number of data blocks or size of a data block, the correct size is reported.
But since the data has been compressed, the amount of data actually written is smaller than the uncompressed data. Less data is written to the storage media, which means less time is used, which means faster throughput. The time used to write the data is proportional to the size of the compressed data (i.e., it depends on the compressibility of the data), and this drives the throughput performance. But we also need to remember that the "latency" for a SandForce controller can be higher than for a typical controller because of the time needed to compress the data. I'm sure SandForce has taken this into account so that not too much time is spent compressing the data. In fact, I bet it's a constant-time compression algorithm.
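The proportionality argument above can be put into a small back-of-the-envelope model. This is only a sketch under my own assumptions, not anything SandForce has published: the function name, the raw media speed, and the fixed compression cost per megabyte are all hypothetical values chosen for illustration.

```python
def effective_write_throughput(raw_mb_s, compressed_fraction,
                               compress_s_per_mb=0.0):
    """Host-visible write throughput (MB/s) for a toy compressing controller.

    raw_mb_s            -- hypothetical rate at which the media absorbs
                           physical (already compressed) data, in MB/s
    compressed_fraction -- compressed size / original size
                           (1.0 means the data does not compress at all)
    compress_s_per_mb   -- hypothetical fixed compression cost, in seconds
                           per MB of host data
    """
    if raw_mb_s <= 0:
        raise ValueError("raw_mb_s must be positive")
    if not 0 < compressed_fraction <= 1:
        raise ValueError("compressed_fraction must be in (0, 1]")
    # Time to handle 1 MB of host data: write the compressed bytes,
    # plus whatever the compression step itself costs.
    seconds_per_host_mb = compressed_fraction / raw_mb_s + compress_s_per_mb
    return 1.0 / seconds_per_host_mb


if __name__ == "__main__":
    # Assumed numbers only: 100 MB/s raw media, 1 ms of compression per MB.
    for fraction in (1.0, 0.75, 0.5, 0.25):
        mb_s = effective_write_throughput(100.0, fraction, 0.001)
        print(f"compressed to {fraction:.0%} of original: {mb_s:.1f} MB/s")
```

In this model, data that shrinks to half its size writes in roughly half the time, while incompressible data pays the compression cost and gains nothing, which is the trade-off described above.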
During a read operation, the compressed data is probably read into a cache and then uncompressed. After that, it is sent to the operating system as though it had never been compressed. Presumably, read performance also depends on the ability to uncompress the data quickly, so the decompression algorithm should run in a fixed time as well.
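To make the speculated write and read paths concrete, here is a toy model in Python. It is purely illustrative: the real controller is proprietary, zlib merely stands in for whatever algorithm SandForce uses, and the names `StoredChunk` and `ToyCompressingDrive` are hypothetical. It shows the two ideas from the text: the uncompressed size is kept as metadata so the drive can report the correct logical size, and reads decompress the data so the host never sees the compressed form.

```python
import zlib
from dataclasses import dataclass


@dataclass
class StoredChunk:
    logical_size: int  # uncompressed size, i.e. what the host is told
    payload: bytes     # compressed bytes actually placed on the media


class ToyCompressingDrive:
    """Toy model of a drive that compresses on write (zlib is a stand-in)."""

    def __init__(self):
        self._chunks = {}  # address -> StoredChunk

    def write(self, address, data):
        # Compress the incoming buffer, then record the original size
        # alongside the compressed payload.
        self._chunks[address] = StoredChunk(len(data), zlib.compress(data))

    def read(self, address):
        # Decompress before returning; the caller gets the original bytes.
        chunk = self._chunks[address]
        data = zlib.decompress(chunk.payload)
        assert len(data) == chunk.logical_size
        return data

    def logical_bytes(self):
        return sum(c.logical_size for c in self._chunks.values())

    def physical_bytes(self):
        return sum(len(c.payload) for c in self._chunks.values())


if __name__ == "__main__":
    drive = ToyCompressingDrive()
    drive.write(0, b"hello world " * 400)
    print("logical bytes: ", drive.logical_bytes())
    print("physical bytes:", drive.physical_bytes())
```

The gap between `logical_bytes()` and `physical_bytes()` is where both the throughput gain and the longevity gain would come from: fewer bytes written means less time spent writing and fewer cells rewritten.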
The really interesting and exciting part of this is that the compressibility of your data influences the performance of the storage media. If your data is very compressible, then your performance can be very, very good. If your data is as compressible as a rock, then your performance may not be as good. But before you start thinking, "my data is very incompressible," note that I have seen lots of different data sets (even binary ones) that can be compressed. Also keep in mind that the SandForce controller only needs to compress the data one chunk at a time, not the entire file. So it's difficult to say a priori with any certainty that your data will not perform well on a SandForce controller. I'm sure SandForce has spent a great deal of time running tests on typical data for the target markets and thus has a good idea of what it can and cannot do. Since the controller is quite popular, I'm sure the efforts have been successful.
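If you are curious how compressible your own data is on a chunk-by-chunk basis, a rough estimate is easy to get. The sketch below uses Python's zlib as a stand-in compressor and an assumed 4 KB chunk size; neither is known to match what the SandForce controller actually does, so treat the numbers as a rough indication only. The helper name `chunk_ratios` is my own.

```python
import os
import zlib


def chunk_ratios(data, chunk_size=4096):
    """Return compressed/original size ratios, one per chunk of `data`.

    A ratio near 0 means the chunk compresses very well; a ratio near
    (or slightly above) 1 means it is effectively incompressible.
    The 4096-byte default chunk size is an assumption for illustration.
    """
    ratios = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        ratios.append(len(zlib.compress(chunk)) / len(chunk))
    return ratios


if __name__ == "__main__":
    samples = {
        "zeros": bytes(16384),          # highly compressible
        "random": os.urandom(16384),    # "compressible as a rock"
        "text": b"the quick brown fox jumps over the lazy dog\n" * 400,
    }
    for name, data in samples.items():
        ratios = chunk_ratios(data)
        print(f"{name}: mean ratio {sum(ratios) / len(ratios):.3f}")
```

To try it on a real file, read the file in binary mode and pass its contents to `chunk_ratios`. Looking at the spread of ratios across chunks, rather than a single whole-file ratio, is closer to the per-chunk behavior described above.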
However, the fundamental fact remains that the performance of the SSD depends on the compressibility of the data. Thus, I decided to get a consumer SSD with a SandForce 1222 controller and run some IOzone tests with variable levels of dedupability (compressibility) to see how the SSD performs.