SSD vs. HDD: Performance and Reliability
The SSD vs. HDD debate is critical for the enterprise because of the sheer weight of today’s data. Rapidly growing data volumes are straining traditional computing infrastructure based on HDD, or hard disk drive, storage.
The problem isn’t simply growth. If that were all there was to it, data center administrators would simply slap in more spindles, install a tape library, and send secondary data to the cloud where it becomes the provider’s problem. The problem is also the speed at which applications operate. Processor and networking speeds have kept up with application velocity and growth, but production storage has not.
Granted, computing bottlenecks exist in places other than the HDD. Switches fail, bandwidth overloads, VM hosts go down: nothing in the computing path is 100% reliable. But disk drives are the major slowdown culprit in high IO environments, and the mechanical nature of the device is the offending party.
Very fast SSD performance is the increasingly popular fix for the problem. However, SSDs are not the automatic choice over HDDs. First, compared one-to-one, SSDs are a good deal more expensive than HDDs. There are certainly factors that narrow the purchasing gap between SSDs and HDDs, and in practice the cost for SSDs can be less. (For a detailed look at HDD and SSD cost comparisons, see Henry Newman’s article SSD vs. HDD Pricing: Seven Myths That Need Correcting.) A second factor is what to replace: SSDs will be faster than disk, but this does not necessarily mean that IT needs this performance level for secondary disk tiers.
A third factor that militates against universal replacement is reliability: are SSDs reliable enough to replace HDDs in the data center? In fact, that is a trick question. SSD/HDD reliability depends on many factors: usage, physical environment, application IO, vendor, mean time between failures (MTBF), and more. This is a big discussion topic, so to keep this performance/reliability discussion usefully focused, let’s set some base assumptions:
1. We’ll discuss SSDs in data centers, not in consumer products like desktops or laptops. SSDs have a big place there especially for devices carried into hostile environments. However, the enterprise has a distinct set of requirements for storage based on big application and data growth, and the to-use-or-not-to-use question is critical in these data centers.
2. We’ll limit our discussion to NAND flash memory-based SSDs with the occasional foray into DRAM. Note that DRAM is not a flash technology at all, so it sits outside the flash universe that is our main discussion point. And in the case of NAND SSDs, remember that while NAND is always flash, flash is not always NAND.
3. We’re not covering other storage flash technologies, which leaves out all-flash arrays with ultra-performance flash module components, as well as server-side flash-based acceleration. These are big stories in and of themselves but do not represent the majority of the SSD market today, particularly in mid-sized business and SMB.
Performance: SSD Wins
Hands down, SSDs are faster. HDDs have the inescapable overhead of physically scanning disk for reads/writes. Even the fastest 15K RPM HDDs may bottleneck a high-traffic environment. Parallel disk, caching, and lots of extra RAM will certainly help. But eventually the high rate of growth will pull well ahead of the finite ability of HDDs to go faster.
DRAM-based SSD is the faster of the two but NAND is faster than hard drives by a range of 80-87% -- a very narrow range between low-end consumer SSDs and high-end enterprise SSDs. The root of the faster performance lies in how quickly SSDs and HDDs can access and move data: SSDs have no physical tracks or sectors and thus no physical seek limits. The SSD can reach memory addresses much faster than the HDD can move its heads.
The distinction is unavoidable given the nature of IO. In a hard disk array, the storage operating system directs the IO read or write requests to physical disk locations. In response, the platter spins and disk drive heads seek the location to write or read the IO request. Non-contiguous writes multiply the problem and latency is the result.
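To put rough numbers on the mechanical overhead described above, the following Python sketch estimates the random IOPS of a single hard drive from its spindle speed and an average seek time. The function names and the 3.5 ms seek figure are illustrative assumptions, not measurements or vendor specifications. The only fixed relationship used is that average rotational latency is the time for half of one revolution.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency in ms: the time for half a revolution."""
    if rpm <= 0:
        raise ValueError("rpm must be positive")
    ms_per_revolution = 60_000.0 / rpm
    return ms_per_revolution / 2.0


def estimated_random_iops(rpm: float, avg_seek_ms: float) -> float:
    """Rough ceiling on random IOPS for one drive.

    Simplified model: service time = average seek + average rotational
    latency. Transfer time, caching, and queuing are ignored.
    """
    if avg_seek_ms < 0:
        raise ValueError("avg_seek_ms must not be negative")
    service_time_ms = avg_seek_ms + avg_rotational_latency_ms(rpm)
    return 1000.0 / service_time_ms


if __name__ == "__main__":
    # ASSUMPTION: 3.5 ms average seek is a hypothetical value for illustration.
    assumed_seek_ms = 3.5
    for rpm in (7_200, 10_000, 15_000):
        latency = avg_rotational_latency_ms(rpm)
        iops = estimated_random_iops(rpm, assumed_seek_ms)
        print(f"{rpm:>6} RPM: rotational latency {latency:.2f} ms, about {iops:.0f} random IOPS")
```

Under this simplified model, even a 15K RPM drive is limited to a few hundred random IOs per second. Adding spindles raises the aggregate figure but does nothing for the latency of any single request.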
In contrast, SSDs are the fix for HDD bottlenecks in high IO environments, particularly in Tier 0 storage, high IO Tier 1 databases, and caching technologies. Since SSDs have no mechanical movement, they service IO requests far faster than even the fastest HDD.
Reliability: HDD Scores Points
Performance may be a slam dunk, but reliability is not. Granted, SSDs’ physical reliability in hostile environments is clearly better than that of HDDs, given their lack of mechanical parts. SSDs will survive extreme cold and heat, drops, and multiple G’s. HDDs… not so much.
However, few data centers will experience rocket liftoffs or sub-freezing temperatures, and SSDs have their own unique stress points and failures. Solid state architecture avoids the types of hardware failure that afflict the hard drive: there are no heads to misalign or spindles to wear out. But SSDs still have physical components that fail, such as transistors and capacitors. Firmware fails too, and wayward electrons can cause real problems. And in the case of a DRAM SSD, the memory cells’ capacitors quickly lose their charge in a power loss. Unless IT has taken steps to protect stored data, that data is gone.
Wear and tear over time also enters the picture. As an SSD ages, its performance slows. The processor must read, modify, erase and write increasing amounts of data. Eventually memory cells wear out. Cheaper TLC is generally relegated to consumer devices and may wear out more quickly because it stores more data in a reduced area. (Thus goes the theory; studies do not always bear it out.)
For example, since MLC stores multiple bits (electronic charges) per cell instead of SLC’s one bit, you would expect MLC SSDs to have a higher failure rate. (MLC NAND is usually two bits per cell, but Samsung has introduced a three-bit MLC.) However, as yet there is no clear result showing that one-bit-per-cell SLC is more reliable than MLC. Part of the reason may be that newer and denser SSDs, often termed enterprise MLC (eMLC), have more mature controllers and better error checking processes.
So are SSDs more or less reliable than HDDs? It’s hard to say with certainty since HDD and SSD manufacturers may overstate reliability. (There’s a newsflash.) Take HDD vendors and reported disk failure rates. Understandably, HDD vendors are sensitive to disk failure numbers. When they share failure rates at all, they report the lowest possible numbers as the AFR, or annualized (verifiable) failure rate. This number is based on the vendor’s verification of failures, i.e., failures attributable to the disk itself. Not environmental factors, not application interface problems, not controller errors: only the disk drive. Fair enough in a limited sort of way, although IT is only going to care that their drive isn’t working, verified or not. General AFRs for disk-only failures run between 0.55% and 0.90%.
However, what the HDD manufacturers do not report is the number of under-warranty disk replacements each year, or ARR (annualized rate of return). If you substitute these numbers for reported drive failures, you get a different story. We don’t need to know why these warrantied drives failed, only that they did. These rates range much, much higher: from about 0.5% to as high as 13.5%.
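The gap between the two metrics is easier to see as drive counts. The Python sketch below applies the AFR and ARR ranges quoted above to a hypothetical fleet. The fleet size of 1,000 drives and the function name are assumptions chosen purely for illustration.

```python
def expected_annual_replacements(drive_count: int, annual_rate_pct: float) -> float:
    """Expected number of drives affected per year at a given annual rate (in percent)."""
    if drive_count < 0 or annual_rate_pct < 0:
        raise ValueError("drive_count and annual_rate_pct must not be negative")
    return drive_count * annual_rate_pct / 100.0


# Ranges quoted in the article, in percent per year.
AFR_RANGE_PCT = (0.55, 0.90)  # vendor-verified, disk-only failures
ARR_RANGE_PCT = (0.5, 13.5)   # under-warranty replacements, any cause

if __name__ == "__main__":
    fleet_size = 1_000  # ASSUMPTION: hypothetical fleet, not a figure from the article
    for label, (low, high) in (("AFR", AFR_RANGE_PCT), ("ARR", ARR_RANGE_PCT)):
        low_count = expected_annual_replacements(fleet_size, low)
        high_count = expected_annual_replacements(fleet_size, high)
        print(f"{label}: {low_count:.1f} to {high_count:.1f} drives per year out of {fleet_size}")
```

For this assumed fleet, the vendor-reported AFR implies fewer than ten failed drives a year, while the top of the ARR range implies well over a hundred replacements.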