Fixing SSD Performance Degradation, Part 1
Another technique for boosting SSD performance is to keep a certain number of blocks in reserve without exposing them to the OS. For example, if the SSD has 75GB of total space, perhaps only 65GB of it will be exposed to the OS. The controller can add these reserved blocks to the general block pool without the OS knowing. Because they enlarge the pool, they make it much less likely that the pool will ever run out of empty blocks. With an empty block available, the write cycle can simply write instead of going through read-modify-erase-write. At the very least it becomes read-modify-write, and the "old" blocks are flagged for erasure outside of the write cycle. In contrast, if no empty blocks were available, the controller would have to use the full read-modify-erase-write process.
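The arithmetic behind the 75GB/65GB example, and the choice the controller faces on each update, can be sketched in a few lines of Python. This is a toy model under stated assumptions: the function names are hypothetical, the reserve is expressed two ways because the percentage depends on which capacity you divide by, and a real controller's decision logic is far more involved.

```python
# Toy over-provisioning model. The numbers come from the 75GB/65GB example;
# the function names are hypothetical.


def over_provisioning(raw_gb: float, exposed_gb: float) -> tuple[float, float]:
    """Return the hidden reserve as a fraction of exposed and of raw capacity."""
    reserved_gb = raw_gb - exposed_gb
    return reserved_gb / exposed_gb, reserved_gb / raw_gb


def write_path(free_blocks: int) -> str:
    """Path a simplified controller takes when a page in a used block changes.

    With an empty block in the pool, the merged block goes there and the old
    block is only flagged for later erasure; with no empty block, the erase
    has to happen inside the write itself.
    """
    if free_blocks > 0:
        return "read-modify-write"
    return "read-modify-erase-write"


if __name__ == "__main__":
    of_exposed, of_raw = over_provisioning(75, 65)
    print(f"reserve = {of_exposed:.1%} of exposed, {of_raw:.1%} of raw")
    print(write_path(free_blocks=10), "/", write_path(free_blocks=0))
```

Run as a script, this reports a reserve of about 15.4% of the exposed capacity, or 13.3% of the raw capacity.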
This simple concept is called over-provisioning, and it benefits both SSD performance and longevity. For longevity, if a particular block within the SSD has been used more than other blocks (i.e., it has a higher number of write/erase cycles), the controller can swap it with a much less used block from the reserved pool. This helps with overall wear leveling of the SSD. On the downside, over-provisioning means that you don't get to use all of the space on your SSD.
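The wear-leveling swap described here can be illustrated with a small Python sketch. It assumes each block tracks an erase count and uses a hypothetical `threshold` to decide when the gap between the most-worn active block and the least-worn reserve block is large enough to justify a swap. It ignores the data copy a real controller would have to perform, and actual wear-leveling policies are vendor-specific.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Block:
    block_id: int
    erase_count: int  # number of write/erase cycles seen so far


def swap_worn_block(
    active: list[Block], reserve: list[Block], threshold: int
) -> Optional[tuple[Block, Block]]:
    """Swap the most-worn active block with the least-worn reserve block.

    Swaps only when their erase counts differ by more than `threshold`
    (a hypothetical tuning knob). Returns (retired, promoted) or None.
    """
    if not active or not reserve:
        return None
    worn = max(active, key=lambda b: b.erase_count)
    fresh = min(reserve, key=lambda b: b.erase_count)
    if worn.erase_count - fresh.erase_count <= threshold:
        return None
    # The worn block rests in the reserve; the fresh one takes its place.
    active.remove(worn)
    reserve.remove(fresh)
    active.append(fresh)
    reserve.append(worn)
    return worn, fresh
```

For example, with an active block at 900 cycles and a reserve block at 5, a threshold of 100 triggers the swap, while two blocks only 40 cycles apart are left alone.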
Another long-awaited technique is the TRIM command. Recall that one of the big performance problems comes when a write is performed to a page that has not been erased. The entire block that contains that page has to be read into cache (read), the new data is merged with the existing data in the block (modify), the original block on the SSD is erased (erase), and finally the updated block in cache is written back to flash (write). This read-modify-erase-write process takes much more time than a simple write. The TRIM command tells the SSD controller when a page is no longer needed so that it can be flagged for erasing. The controller can then write new data to a "clean" page on a block, avoiding the entire read-modify-erase-write cycle (the cycle becomes just "write"), which improves write performance. Without TRIM, the SSD controller does not know when a page can be erased. Its only indication comes when it writes a modified block to a clean block from the block pool; at that point it knows that the “old” block has pages that can be erased. In essence, TRIM gives the controller “hints” about the status of the data that it can use to improve performance and longevity.
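To make the difference concrete, here is a toy Python model of how a controller might track page states. Everything in it is hypothetical: `ToyFTL`, the three page states, and the assumption that every write goes to a clean page are simplifications, not how any particular SSD's firmware works. It shows the two ways a page can become flagged: implicitly, when new data for the same logical page lands elsewhere, or explicitly, when TRIM reports that the data is no longer needed.

```python
from enum import Enum


class PageState(Enum):
    FREE = "free"        # erased; can be written directly
    VALID = "valid"      # holds data the controller must preserve
    INVALID = "invalid"  # flagged: no longer needed, awaiting erase


class ToyFTL:
    """Hypothetical page-mapped controller: logical page -> physical page."""

    def __init__(self, num_pages: int) -> None:
        self.state = [PageState.FREE] * num_pages
        self.mapping: dict[int, int] = {}

    def write(self, lpn: int) -> int:
        # Program a clean page; list.index raises ValueError if none is left.
        new = self.state.index(PageState.FREE)
        old = self.mapping.get(lpn)
        if old is not None:
            # Without TRIM, an overwrite is the only way the controller
            # learns that the old copy is stale.
            self.state[old] = PageState.INVALID
        self.state[new] = PageState.VALID
        self.mapping[lpn] = new
        return new

    def trim(self, lpn: int) -> None:
        # TRIM: the OS reports that this logical page's data was deleted.
        phys = self.mapping.pop(lpn, None)
        if phys is not None:
            self.state[phys] = PageState.INVALID
```

If a deleted file occupied logical pages 1 and 2, calling `trim` on both flags their physical pages right away. Without those calls, the controller keeps treating the deleted data as valid and must preserve it.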
However, the TRIM command has issues of its own. The first is that the SSD controller still has to reclaim the flagged pages (i.e., garbage collection), and because flash is erased a block at a time, that means erasing whole blocks. Hopefully there is enough time, capability, and cache for the controller to go through the read-modify-erase-write cycle on the blocks containing flagged pages. The controller will typically start with the blocks that have the largest number of flagged pages. It might even move the still-used pages from these blocks onto other blocks so that the entire block can simply be erased (i.e., no modify-write steps). This relocation can consume a number of blocks from the block pool. The second issue is that as more of the SSD's capacity is used, there are fewer free blocks in the block pool. This puts more pressure on the controller, since TRIM will be flagging pages within a smaller set of blocks, resulting in fewer totally clean blocks. Consequently, as free space shrinks you are more likely to encounter a read-modify-erase-write cycle while writing data, which is exactly what TRIM is designed to help alleviate.
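The greedy policy described here (pick the block with the most flagged pages, move its still-used pages elsewhere, then erase the whole block) can be sketched in Python as follows. The list-of-strings block layout and the function name are assumptions made for illustration. The returned count of relocated pages is the extra writing that garbage collection adds on top of the host's own writes, and the relocation visibly consumes a block from the free pool.

```python
VALID, INVALID, FREE = "valid", "invalid", "free"


def garbage_collect(blocks: list[list[str]], free_ids: list[int]) -> int:
    """One greedy garbage-collection pass over a toy SSD.

    `blocks` holds per-page states; `free_ids` lists fully erased blocks.
    Returns the number of valid pages copied to reclaim the victim.
    """
    # Victim: the in-use block with the most flagged (invalid) pages.
    candidates = [i for i in range(len(blocks)) if i not in free_ids]
    victim = max(candidates, key=lambda i: blocks[i].count(INVALID))
    live = blocks[victim].count(VALID)
    if live:
        # Relocating live data consumes a block from the pool
        # (IndexError if the pool is already empty).
        dest = free_ids.pop()
        blocks[dest] = [VALID] * live + [FREE] * (len(blocks[dest]) - live)
    # Flash erases whole blocks, so the victim is reset in one step.
    blocks[victim] = [FREE] * len(blocks[victim])
    free_ids.append(victim)
    return live
```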
The third issue, which is really related to the first two, is that the SSD controller needs enough horsepower, time, and cache to carry out garbage collection. The more heavily the SSD is used, the more likely it is that blocks will contain pages flagged by TRIM. If the controller doesn't have time to perform garbage collection, the probability of encountering a read-modify-erase-write cycle rises, along with the associated drop in write performance. Alternatively, the designers of the SSD controller can add logic that forces garbage collection at certain times, but the result is the same: a reduction in write performance.
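Why idle time matters can be shown with a deliberately crude Python simulation. It assumes (hypothetically) that each host write consumes one clean block, that background garbage collection can reclaim one block per idle tick, and that a write arriving when no clean block is left must wait for an erase in the write path. The event strings and the function name are made up for illustration.

```python
def count_stalls(events: str, pool: int, reclaim_per_idle: int = 1) -> int:
    """Count writes that found no clean block and had to wait for an erase.

    events: 'W' is a host write needing one clean block, 'I' is an idle tick
    the controller can spend on background garbage collection.
    """
    free = pool
    stalls = 0
    for event in events:
        if event == "I":
            # Background GC refills the pool, up to its original size.
            free = min(pool, free + reclaim_per_idle)
        elif event == "W":
            if free == 0:
                # Foreground GC: the erase happens inside this write, and
                # the block it frees is consumed immediately.
                stalls += 1
            else:
                free -= 1
    return stalls


if __name__ == "__main__":
    print(count_stalls("WWWWWW", pool=4))    # busy, no idle time: prints 2
    print(count_stalls("WWIWWIWW", pool=4))  # same six writes with gaps: prints 0
```

The same six writes either stall or not depending only on whether the controller gets idle time to clean up, which is the trade-off described above.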
To make TRIM work effectively, the file system has to understand when pages are deleted and when to send an appropriate TRIM command. The OS must be able to pass the TRIM command to the drive controller, and the drive controller has to understand the command and act accordingly. TRIM appeared in Windows before Linux, but more recent kernel versions understand the TRIM command and can pass it to the drive controller; specifically, any kernel from 2.6.33 onward does. Many file systems also understand TRIM. For example, both ext4 and btrfs (the latter since 2.6.32) support it, and other file systems are gaining TRIM capability as well.
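On a Linux system you can check both halves of this chain yourself. The Python sketch below reads each device's `queue/discard_max_bytes` attribute under `/sys/block` (a value of 0 means the device does not accept discard, the block layer's generic name for TRIM) and scans `/proc/mounts` for file systems mounted with the `discard` option. It assumes a reasonably recent kernel that exposes these files, and the function names are hypothetical. Periodic batch trimming (for example with the `fstrim` utility) is another option the sketch does not detect.

```python
from pathlib import Path


def supports_discard(discard_max_bytes: str) -> bool:
    """A device advertises discard (TRIM) support if this value is non-zero."""
    return int(discard_max_bytes.strip()) > 0


def devices_with_discard(sys_block: Path = Path("/sys/block")) -> dict[str, bool]:
    """Map each block device name to whether the kernel reports discard support."""
    result: dict[str, bool] = {}
    if not sys_block.is_dir():
        return result  # not Linux, or sysfs not mounted
    for dev in sorted(sys_block.iterdir()):
        attr = dev / "queue" / "discard_max_bytes"
        if attr.is_file():
            result[dev.name] = supports_discard(attr.read_text())
    return result


def mounts_with_discard(mounts_text: str) -> list[str]:
    """Return mount points whose options enable online discard."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()  # device, mount point, fs type, options, ...
        if len(fields) < 4:
            continue
        options = fields[3].split(",")
        if any(opt == "discard" or opt.startswith("discard=") for opt in options):
            points.append(fields[1])
    return points


if __name__ == "__main__":
    print(devices_with_discard())
    mounts = Path("/proc/mounts")
    if mounts.is_file():
        print(mounts_with_discard(mounts.read_text()))
```

Matching `discard=` as well as plain `discard` covers option forms such as btrfs's `discard=async`, while `nodiscard` is correctly ignored.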
Like write combining and over-provisioning, TRIM is not a cure-all for write performance issues. If you push enough data to the SSD, the controller may have difficulty keeping up with TRIM commands. And if you modify an existing file in place rather than deleting it first, no TRIM is issued for the old data, so you are still likely to run into the read-modify-erase-write performance problem, and TRIM can't help.
SSD write performance degradation over time is rooted in the read-modify-erase-write cycle, which exists because, fundamentally, writes happen at the page level while erases happen at the block level. The result is write amplification, measured by the write amplification factor: the ratio of the data the SSD actually writes to flash to the data the application asked it to write. A factor of 1 means that the SSD writes exactly the amount of data the application requests. A factor greater than 1 means the SSD is doing extra “housekeeping” writes in order to store the application's data. This extra housekeeping can slow write performance and reduce the longevity of an SSD.
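A worked example shows how large the factor can get. The Python below computes the write amplification factor as the ratio of bytes written to flash to bytes the host requested. The geometry is an assumption for illustration only (4 KiB pages and 128 pages per erase block, i.e. 512 KiB blocks); real drives vary.

```python
def write_amplification(flash_bytes: int, host_bytes: int) -> float:
    """Write amplification factor: data written to flash / data the host asked for."""
    if host_bytes <= 0:
        raise ValueError("host_bytes must be positive")
    return flash_bytes / host_bytes


PAGE_BYTES = 4 * 1024    # assumed page size
PAGES_PER_BLOCK = 128    # assumed pages per erase block (512 KiB blocks)

# Best case: a 4 KiB update lands directly on a clean page.
best_case = write_amplification(PAGE_BYTES, PAGE_BYTES)

# Worst case: the same 4 KiB update triggers read-modify-erase-write,
# so the whole block is rewritten.
worst_case = write_amplification(PAGE_BYTES * PAGES_PER_BLOCK, PAGE_BYTES)

print(best_case, worst_case)  # 1.0 128.0
```

Real workloads land somewhere between these two extremes.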
On the bright side, SSD engineers and designers have been working on techniques to fix these problems for some time. Write combining, over-provisioning, and the TRIM command are all techniques that can help reduce the impact of the read-modify-erase-write cycle on performance. However, there are conditions under which these techniques may not help as much as we would like.
In addition, a file system can become fragmented as it ages, and that fragmentation can trickle down to the underlying SSD and reduce performance. Unfortunately, there’s not much SSD engineers or designers can do to fix that problem.
Part two of this article series will present benchmark results that examine the impact of age on SSD performance. We'll take a brand-spanking-new Intel X25-E SSD (enterprise class) and run some benchmarks against it, then we'll torture the poor SSD with more tests and rerun the benchmarks to see how well the performance holds up. The results are really interesting.
Jeff Layton is the Enterprise Technologist for HPC at Dell, Inc., and a regular writer of all things HPC and storage.