SAS vs. SATA - Page 3
In the following sections I'm going to focus on Silent Data Corruption (SDC) in the SAS and SATA data channels and how it affects the data integrity of each type of connection. Then I'll talk about how T10 DIF/PI and T10-DIX help reduce silent data corruption and how they integrate (or don't) with SAS and SATA. Finally, I'll mention the ZFS file system as a way to (possibly) help the situation.
Silent Data Corruption in the Channel
In the SAS vs. SATA argument, a key area that is often overlooked is the data channel itself. A channel is the connection from the HBA (Host Bus Adapter) to the drive. Data travels through these channels from the controller to the drive and back. As with most things electrical, channels have an error rate due to various influences. The interesting aspect of SAS and SATA channels is that these errors result in what is termed Silent Data Corruption (SDC). This means that you don't know when they happen. A bit is flipped and you have no way of detecting it, hence the word "silent."
In general, the standard specification for most channels is 1 bit error in 10E12 bits (written here to mean 10^12 bits). That is, for every 10E12 bits transmitted through the channel, you will get one data bit corrupted silently, with no knowledge that it happened. This number is referred to as the SDC rate: the number of bits transmitted before you encounter a silent error. The larger the SDC rate, the more data must pass through the channel before you encounter an error.
The table below lists the number of SDC's likely to be encountered for a given SDC rate and a given data transfer rate over a year (table is courtesy of Henry Newman from a presentation given at the IEEE MSST 2013 conference).
Table 2: Numbers of Errors as a function of SDC rate and throughput
For example, if the SDC rate is 10E19 and the transfer data rate is 100 GiB/s you will encounter about 2.7 SDC's in a year. The key thing to remember is that these errors are silent - you cannot detect them.
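The arithmetic behind that example can be sketched in a few lines of Python. This is only an illustration, not the method used to build the original table: it assumes a 365-day year of continuous transfer, 1 GiB = 2^30 bytes, and 8 bits per byte, and the function name `expected_sdc_per_year` is made up for this sketch.

```python
# Assumptions (not from the original table): 365-day year, continuous transfer,
# 1 GiB = 2**30 bytes, 8 bits per byte.
SECONDS_PER_YEAR = 365 * 24 * 3600
BITS_PER_GIB = 8 * 2**30


def expected_sdc_per_year(sdc_rate_bits: float, throughput_gib_s: float) -> float:
    """Expected silent errors per year.

    sdc_rate_bits    -- bits transferred per silent error (e.g. 1e19 for "10E19")
    throughput_gib_s -- sustained transfer rate in GiB/s
    """
    bits_per_year = throughput_gib_s * BITS_PER_GIB * SECONDS_PER_YEAR
    return bits_per_year / sdc_rate_bits


if __name__ == "__main__":
    # SDC rate of 10E19 at 100 GiB/s, as in the example above
    print(round(expected_sdc_per_year(1e19, 100), 1))  # prints 2.7
```

Under these assumptions the result matches the 2.7 SDC's per year quoted above.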
The SATA channel (and the IB channel) has an SDC rate of about 10E17. If you transfer data at 0.5 GiB/s you will likely encounter 1.4 SDC's in a year. In the case of faster storage with a transfer rate of 10 GiB/s you are likely to encounter 27.1 SDC's in a year (about one every 2 weeks). For very high-speed storage systems that use a SATA channel with a data transfer rate of about 1 TiB/s, you could encounter 2,708 SDC's in a year (about one every 3.2 hours).
On the other hand, SAS channels have an SDC rate of about 10E21. Running at a speed of 10 GiB/s you probably won't hit any SDC's in a year. Running at 1 TiB/s you are likely to have 0.3 SDC's in a year.
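To compare the two channels side by side, here is a small Python sketch that computes the expected number of SDC's per year and the average time between them. It is an illustration under stated assumptions, not a reproduction of the original table: a 365-day year, 1 GiB = 2^30 bytes, and "1 TiB/s" treated as 1000 GiB/s, since that appears to be what the quoted 2,708 figure corresponds to. The names `CHANNEL_SDC_RATE`, `errors_per_year`, and `hours_between_errors` are hypothetical.

```python
# Assumptions: 365-day year, 1 GiB = 2**30 bytes, and "1 TiB/s" taken as
# 1000 GiB/s (this appears to match the figures quoted in the text).
SECONDS_PER_YEAR = 365 * 24 * 3600
HOURS_PER_YEAR = 365 * 24
BITS_PER_GIB = 8 * 2**30

# SDC rates quoted in the article, in bits per silent error
CHANNEL_SDC_RATE = {"SATA": 1e17, "SAS": 1e21}


def errors_per_year(channel: str, throughput_gib_s: float) -> float:
    """Expected silent errors per year for a channel at a sustained rate."""
    bits_per_year = throughput_gib_s * BITS_PER_GIB * SECONDS_PER_YEAR
    return bits_per_year / CHANNEL_SDC_RATE[channel]


def hours_between_errors(channel: str, throughput_gib_s: float) -> float:
    """Average number of hours between silent errors."""
    return HOURS_PER_YEAR / errors_per_year(channel, throughput_gib_s)


if __name__ == "__main__":
    for channel in CHANNEL_SDC_RATE:
        for gib_s in (0.5, 10, 1000):
            n = errors_per_year(channel, gib_s)
            gap = hours_between_errors(channel, gib_s)
            print(f"{channel:4s} {gib_s:7.1f} GiB/s: {n:10.4f} SDC/year, "
                  f"one every {gap:,.1f} hours")
```

The only difference between the two channels in this sketch is the SDC rate, so at any given throughput the SATA figure is 10,000 times the SAS figure.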
The importance of this table should not be underestimated. A SATA channel encounters many more SDC's compared to a SAS channel. The key word in the abbreviation SDC is "silent." This means you cannot tell when or if the data is corrupted.
Sometimes even an SDC rate of 10E21 is not enough. We have systems with transfer rates hitting the 1 TiB/s mark pretty regularly, and new systems are being planned and procured with transfer rates of 10 TiB/s or higher (100 TiB/s is not out of the realm of possibility).
Even with SAS channels, at 10 TiB/s you would likely encounter 2.7 SDC's a year. This may seem like a fairly small number, but if data integrity is important to you, then this number is too big. What if the data corruption occurs in someone's genome sequence? All of a sudden they may appear to have a gene mutation that they don't actually have, and the course of cancer treatment follows a direction that may not actually help the person.
Moreover, what happens if the channel is failing and the bit error rate of the base channel gets worse before it fails? For example, what happens if the channel rate decreases from 10E12 to something smaller, perhaps 10E11.5 or 10E11? At the very least, systems with very high data rates could see a few more SDC's than expected.
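To get a rough feel for the effect of a degrading channel, the sketch below scales the SDC rate by the same factor as the base channel rate. That proportionality is purely an assumption made for illustration; the article does not say how the two are related. The other assumptions are a 365-day year, 1 GiB = 2^30 bytes, and 10 TiB/s treated as 10,000 GiB/s. The function names are hypothetical.

```python
# ASSUMPTION (for illustration only): the SDC rate worsens by the same factor
# as the base channel rate. The real relationship is not given in the text.
SECONDS_PER_YEAR = 365 * 24 * 3600
BITS_PER_GIB = 8 * 2**30


def errors_per_year(sdc_rate_bits: float, throughput_gib_s: float) -> float:
    """Expected silent errors per year at a sustained transfer rate."""
    return throughput_gib_s * BITS_PER_GIB * SECONDS_PER_YEAR / sdc_rate_bits


def degraded_sdc_rate(nominal_sdc_rate: float,
                      nominal_exp: float = 12.0,
                      degraded_exp: float = 11.0) -> float:
    """Scale the SDC rate by 10**(degraded_exp - nominal_exp).

    nominal_exp and degraded_exp are the exponents of the base channel rate
    (12 for "10E12", 11.5 for "10E11.5", and so on).
    """
    return nominal_sdc_rate * 10 ** (degraded_exp - nominal_exp)


if __name__ == "__main__":
    sas_rate = 1e21        # SAS SDC rate quoted in the article
    throughput = 10_000    # 10 TiB/s taken as 10,000 GiB/s (assumption)
    for exp in (12.0, 11.5, 11.0):
        rate = degraded_sdc_rate(sas_rate, degraded_exp=exp)
        print(f"base channel 10E{exp:g}: "
              f"{errors_per_year(rate, throughput):.1f} SDC/year")
```

Under this assumption, a drop from 10E12 to 10E11 multiplies the expected number of SDC's by ten, and a drop to 10E11.5 multiplies it by roughly 3.2.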
There is a committee that is part of the InterNational Committee for Information Technology Standards (INCITS), which in turn reports to the American National Standards Institute (ANSI). This committee, called the "T10 Technical Committee on SCSI Storage Interfaces" or T10 for short, is responsible for the SCSI architecture standards, and most of the SCSI command set standards (used by almost all modern I/O interfaces).