SAS vs. SATA - Page 4
The T10 organization has created a new standard called T10 PI/DIF (PI = Protection of Information Disk, DIF = Data Integrity Field disk). They are really two standards but I tend to combine them since one without the other doesn't achieve either goal (in the rest of the article I'll just use T10-DIF to basically refer to both). This standard attempts to address data integrity via hardware and software. The T10-DIF standard adds three fields (CRC's) to a standard disk sector. A diagram for a 512-byte disk sector with the additional fields is shown below:
Figure 1: T10-DIF data layout
In a 512-byte sector (shown as 0 to 511 byte count in the previous figure), there are three additional fields, resulting in a 520-byte chunk (not a power of two). The first additional field is a 2-byte data guard that is a CRC of the 512-byte sector. This field is abbreviated as GRD. The second field is a 2-byte application tag, abbreviated as APP. The third field is a 4-byte reference field, abbreviated as REF. With T10-DIF the HBA computes a 2-byte guard CRC (GRD) that is added to the 512-byte data sector before the entire 520-byte sector is passed to the drive. Then the drive can check the GRD CRC against the CRC of the 512-byte sector to check for an error. This greatly reduces the possibility of silent data corruption from the HBA to the disk drive.
T10-DIF means that drives have to be able to handle 520 byte sectors (not just 512 bytes). Moreover, if the drives have 4096-byte sectors then theT10-DIF standard changes to 4,104 byte sectors. There are HBA and disk manufacturers that support T10-DIF drives today. For example, LSI has HBA's that are T10-DIF compliant and there T10-DIF compliant drives from several manufacturers.
A visual way of thinking about the protection it provides is shown in the figure below:
Figure 2: T10 protection in the data path
At the bottom of the figure is a representation of the data path from the application down to the disk drive including the possibility of a storage network (SAN) being in the path (this can be a SAS network as well). The first row above the data components, labeled as "Normal IO," illustrates the data integrity checks that vendors have implemented. They are not coordinated and are not tied to each other across components, leaving the possibility of silent data corruption between steps in the data path.
The next level above, labeled on the left as "T10-DIF," illustrates where T10-DIF enters the picture. The HBA computes the checksum and puts it in the GRD field of the 520-byte sector. Then this is passed all the way down to the drive, which can check that the data is correct by checking the GRD checksum against the computed checksum of the data. T10-DIF introduces some data integrity protection from the HBA to the disk drive.
If you go back to the table of SDC's you will find that when using T10-DIF with a SAS channel, the SDC increases to 10E28. For a storage system running at 100 TiB/s for an entire year it is not likely that a single SDC will be encountered in the SAS channel. Just in case you didn't notice, a storage system running at 100 TiB/s for a year is roughly 3,153,600,00 TiB of total throughput (roughly 31,536,000 Exabytes).
Using T10-DIF added 7 orders of magnitude to the SDC protection of a SAS channel (10E21 to 10E28). SAS + T10-DIF is now 11 orders of magnitude better than the SATA channel (10E28 versus 10E17).
There is a second standard named T10-DIX that computes a checksum from the application and is inserted into the 520-byte chunk (the APP field). The T10-DIX protection is show in the previous diagram on the line with the left-hand label of "T10-DIX." It can be used to check data integrity all the way down to the HBA where T10-DIF kicks in.
If you look at the previous diagram, the application, or something close to it, creates a 2-byte CRC (tag) and puts it in the 520-byte data chunk. This allows the data to be checked all the way to the HBA. Then the HBA can do a checksum and add it to the correct place in the T10-DIF field and send it down to the drive.
If you look at the very top line of the previous diagram, labeled on the left as "T10-DIF+DIX," you can see how the T10 additions ensure data integrity from the application to the drive. This is precisely what is needed if data integrity means anything to you.
There are a few things of note with T10-DIF (PI) and T10-DIX. The first is that the application field cannot be passed through the VFS layer without changes to POSIX. Secondly, in the case of NFS the ability to pass 520-bute sectors is not likely happen either, unless the underlying protocol is changed (and POSIX). That means that NFS is not a good protocol at this time if you want data integrity from the application down to the HBA (T10-DIX) from the host.
If you have carefully read the T10 discussion you will notice that it is all around SAS. T10-DIF/PI and T10-DIX cannot be implemented with SATA. Therefore the SAS channel with T10-DIF just widens the gap with the SATA channel in terms of SDC.
Notice that the data integrity discussion is about channels and not drives. If any storage device uses a SATA channel then it suffers from the SDC rate previous discussed. If the storage device uses a SAS channel then it has the same SDC and can use T10-DIF/PI and T10-DIX to improve the SDC to a very high level. To be crystal clear, this includes SSD devices. SATA SSD devices that use the SATA Channel have a very poor SDC relative to SSD's that use the SAS Channels (SAS attached SSD devices).
There are many storage systems that use a caching tier of SSD's. The concept is to use very fast SSD's in front of a large capacity of drives. The classic approach is to use SAS drives for the "backing store" in the caching system and use the SSD's as the write and read cache layer. Many of these solutions use SAS channels on the backing storage, resulting in a reasonable SDC rate, but use SSD's with a SATA channel in the caching layer.
Some of these caching layers can run at 10 GiB/s. With a SATA channel you are likely to get 27.1 SDC's per year (about one every two weeks) while the SAS channel devices have virtually no SDC's in a year. The resulting storage solution only has as much data integrity as the weakest link - in this case, the SATA Channel. With this concept you have taken a SAS Channel based storage layer with a very good SDC rate and couple it to a SATA channel based caching layer with an extremely poor SDC rate.
Can't ZFS cure SATA's ills?
One question you may have at this point is if a file system like ZFS could allow you to use drives attached via a SATA channel despite their limitations? ZFS focuses on data integrity and checksums the data so shouldn't it "fix" some of SATA channel problems? Let's take a look at this since the devil is always in the details.