Enterprise SSDs and Storage Design
Solid State Drives (SSDs) have been around a lot longer than you think. Their current popularity started a few years ago with small-capacity, expensive drives. They had much better performance than spinning drives, and this proved important in a number of areas. They also boasted low power draw and insensitivity to shock (i.e., you could shake them while running and they would not crash or lose data).
However, they were far from perfect. The initial capacities were small, the price per GB was far higher than for spinning drives, and they had some peculiarities relative to spinning disks that people had to learn to work around. Some of these included the following:
- Endurance limitations (limited number of writes to cells)
- Updating even a single bit can require erasing and rewriting an entire block
- Write amplification: more than one physical write is needed to store a given piece of data
- Repeated reads of the data could lead to data corruption (read disturb)
In addition, all of the drives were SATA based. Remember that the SATA protocol has a much higher data error rate than SAS (whether on consumer SATA drives or on nearline enterprise devices). If your file system could not detect data corruption, you faced some difficulties with these SSDs.
Earlier SSDs also had some general difficulties. For example, it was difficult for them to sustain a consistent level of read or write performance. Because of the write endurance limitation, manufacturers offered fairly short warranties.
Over time, the manufacturers developed new techniques and technologies to address these limitations, but there was still some apprehension around the drives, particularly in the enterprise world.
Addressing these concerns led manufacturers to produce a better SSD that they call the "enterprise SSD." At a high level, the typical benefits of an enterprise SSD include the following:
- Higher performance
- More consistent performance
- Protection of DRAM-stored data in the event of a power loss
- Stronger error correction code (ECC)
- Consistent and persistent quality of service
- Longer warranty
- Greater level of endurance
- Greater level of over-provisioning
Of course, enterprise SSDs cost more than consumer-grade SSDs, but the higher price is a trade-off for the features you get. Moreover, enterprise SSDs can come in a variety of interfaces, including SAS. Note that the definition of an enterprise SSD is not based on the drive interface.
One of the key features of the enterprise SSDs is endurance.
Enterprise SSD Endurance
Two industry standards bodies have defined what distinguishes an enterprise SSD from a consumer SSD: (1) the Joint Electron Device Engineering Council (JEDEC), and (2) the Storage Networking Industry Association (SNIA). JEDEC has published specifications for endurance, and SNIA has published specifications for performance.
An important consideration in these standards is the data usage model for consumer and enterprise SSDs. Consumer SSDs are tested with consumer applications. This also means they are not tested in a 24/7 scenario, since that is very uncommon in the consumer world. In contrast, enterprise SSDs are tested with enterprise applications and in the 24/7 environment that one would encounter in a data center.
JESD218 and JESD219 define the consumer and enterprise data usage models. For enterprise SSDs, JESD218A defines the data usage model as 24 hours per day at 55°C with three months of retention at 40°C. In contrast, JESD218A defines consumer data usage as eight hours per day at 40°C with one year of retention at 30°C. These data usage models are very different from one another.
Manufacturers have improved the endurance of enterprise SSDs over time using various techniques, including wear-leveling algorithms, over-provisioning, and self-healing. Over-provisioning is a very common technique that reserves part of the NAND flash as spare area, which helps wear leveling and improves endurance. Enterprise SSDs typically reserve a greater percentage of the NAND flash than consumer drives. In turn, this allows enterprise SSDs to use lower-endurance NAND flash options, including multilevel cell (MLC), 3D NAND, and triple-level cell (TLC). These are lower-cost options that help keep the price down despite the increase in over-provisioning.
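To put a number on over-provisioning, one commonly used convention expresses it as the spare capacity relative to the user-visible capacity. The sketch below assumes that convention; the function name and the capacities are illustrative and do not describe any particular drive.

```python
def over_provisioning_pct(physical_gb: float, user_gb: float) -> float:
    """Spare NAND as a percentage of user-visible capacity.

    Assumes the common convention OP = (physical - user) / user.
    """
    if user_gb <= 0 or physical_gb < user_gb:
        raise ValueError("need 0 < user_gb <= physical_gb")
    return (physical_gb - user_gb) / user_gb * 100.0


# Hypothetical drives built from the same 128 GB of raw NAND.
print(round(over_provisioning_pct(128, 120), 1))  # 6.7  (less spare area)
print(round(over_provisioning_pct(128, 100), 1))  # 28.0 (more spare area)
```

Exposing less of the same raw flash to the user leaves more spare blocks for wear leveling and garbage collection.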
Typically you will see SSD endurance described with one of two terms. The first is full drive writes per day (DWPD) for a certain warranty period. If you have a 100GB SSD with a DWPD of one (one full drive write per day), then it can handle 100GB of data being written to it every day for the warranty period. If the DWPD were 10, then it could handle 1TB of data being written to the drive every day for the warranty period.
The second term that describes SSD endurance is terabytes written (TBW). This describes how much data can be written to the drive over the life of the drive. A larger TBW number indicates better endurance.
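The two measures can be converted into one another once the capacity and warranty period are known. The following is a minimal sketch, assuming decimal units (1TB = 1000GB) and a 365-day year. The function names and the five-year warranty are illustrative assumptions, not values from any specification.

```python
def tbw_from_dwpd(dwpd: float, capacity_gb: float, warranty_years: float) -> float:
    """Total terabytes written implied by a DWPD rating (decimal TB)."""
    days = warranty_years * 365
    return dwpd * capacity_gb * days / 1000.0


def dwpd_from_tbw(tbw: float, capacity_gb: float, warranty_years: float) -> float:
    """Drive writes per day implied by a TBW rating."""
    days = warranty_years * 365
    return tbw * 1000.0 / (capacity_gb * days)


# The 100GB drive from the text, with an assumed five-year warranty.
print(tbw_from_dwpd(1, 100, 5))       # 182.5
print(tbw_from_dwpd(10, 100, 5))      # 1825.0
print(dwpd_from_tbw(182.5, 100, 5))   # 1.0
```

Converting both ratings to the same unit makes it easier to compare drives whose data sheets quote different measures or different warranty periods.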
Importantly, neither measure specifies how the data was written to the drive. For example, the testing could be done using streaming writes, which are a bit easier for the drive to handle than random 4K writes. With random writes, the drive is likely to perform additional internal reads, wear leveling, and garbage collection during the test. With streaming writes, these additional internal operations are less likely. These differences impact the measured endurance.
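A rough way to see why the write pattern matters is a back-of-the-envelope estimate in which the flash can absorb a fixed number of program/erase (P/E) cycles and the internal overhead is summarized by a write amplification factor (WAF). This is a simplified sketch, not how any manufacturer rates a drive. The function name, the P/E cycle count, and both WAF values are hypothetical.

```python
def estimated_host_tbw(capacity_gb: float, pe_cycles: int, waf: float) -> float:
    """Approximate host-visible TBW: capacity * P/E cycles / WAF (decimal TB).

    waf is the ratio of data written to flash to data written by the host.
    """
    if waf < 1.0:
        raise ValueError("write amplification factor is assumed to be >= 1")
    return capacity_gb * pe_cycles / waf / 1000.0


# Hypothetical 100GB drive rated for 3000 P/E cycles.
streaming = estimated_host_tbw(100, 3000, waf=1.1)   # mostly sequential writes
random_4k = estimated_host_tbw(100, 3000, waf=4.0)   # heavy garbage collection

print(round(streaming, 1))  # 272.7
print(random_4k)            # 75.0
```

Under these assumed numbers the same flash sustains several times more host writes with a streaming workload, which is why an endurance rating is hard to interpret without knowing the workload behind it.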