SAS vs. SATA - Page 2
If there is a second hot-spare drive, the RAID group starts rebuilding from the remaining eight drives onto the two hot-spare drives. To finish the rebuild, 32TB of data needs to be read (8 x 4TB). But again, there is nearly a 100% probability of hitting a hard error after reading only about 10TB of that data. Therefore the second rebuild is also very likely to fail.
Now three drives in the RAID group have been lost and no data can be recovered from it, so the data must be restored from a backup. Hopefully that backup is not based on Consumer SATA drives, because restoring it means reading 40TB of data, and the restore itself is then likely to hit a hard error for exactly the same reason. In fact, any backup or copy made with Consumer SATA drives is likely to be unrecoverable because of the exact same scenario I laid out.
In the case of SATA/SAS Nearline Enterprise drives, the roughly 100% probability of hitting a hard drive error is reached at about 100TB of data read rather than about 10TB (a factor of 10 improvement over Consumer SATA drives). In the case of Enterprise SAS drives, the probability of a hard drive error approaches 100% only after about 1.11PB of data has been read. RAID groups of 1.11PB are not that common at this point, but don't blink; drive capacities keep growing.
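The comparison above can be sketched with a few lines of arithmetic. This is a minimal model that assumes bit errors are independent at the vendor-quoted rate (Consumer SATA is typically quoted at 1 error in 10^14 bits, Nearline at 1 in 10^15, Enterprise SAS at 1 in 10^16); the figures in the text treat the expected-one-error point as near-certainty, so this model's probabilities come out somewhat lower, but the 10x-per-class gap is the same.

```python
import math

def p_ure(bytes_read, ure_rate_bits):
    """Probability of at least one unrecoverable read error (URE)
    while reading bytes_read, assuming independent bit errors at a
    rate of 1 error per ure_rate_bits bits."""
    bits = bytes_read * 8
    # P(no error in one bit) = 1 - 1/rate; use log1p/exp to avoid underflow.
    return 1.0 - math.exp(bits * math.log1p(-1.0 / ure_rate_bits))

TB = 1e12  # drive vendors quote decimal terabytes
for label, rate in [("Consumer SATA    (1e14)", 1e14),
                    ("Nearline SATA/SAS (1e15)", 1e15),
                    ("Enterprise SAS   (1e16)", 1e16)]:
    print(f"{label}: P(URE) over a 32TB rebuild read = "
          f"{p_ure(32 * TB, rate):.3f}")
```

Under this model a 32TB rebuild read has better than a 90% chance of hitting a URE on Consumer SATA drives, but only a few percent on Enterprise SAS drives.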
It can be difficult to ascertain the hard error rate for the drive you have, so read the specifications carefully looking for it. If the rate is not given, chances are that it is not very good (just my opinion). Based on the hard error rate and the drive capacity, you can then decide how large a RAID group you can construct without getting too close to the 100% probability point.
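The sizing decision can be turned around algebraically: pick a failure probability you are willing to tolerate for a rebuild, and solve for the largest read the rebuild may require. This sketch uses the same independent-bit-error assumption as before; the 5% threshold is an arbitrary example, not a recommendation.

```python
import math

def max_read_bytes(ure_rate_bits, p_target):
    """Largest read (in bytes) that keeps the probability of at least
    one URE at or below p_target, assuming independent bit errors at
    1 error per ure_rate_bits bits."""
    # Solve 1 - (1 - 1/rate)^bits = p_target for bits.
    bits = math.log(1.0 - p_target) / math.log1p(-1.0 / ure_rate_bits)
    return bits / 8

# Example: keep the chance of a failed rebuild under 5%.
for label, rate in [("Consumer SATA", 1e14), ("Enterprise SAS", 1e16)]:
    tb = max_read_bytes(rate, 0.05) / 1e12
    print(f"{label}: rebuild reads should stay under ~{tb:.1f} TB")
```

At a 5% risk tolerance, a Consumer SATA rebuild should read well under 1TB, which rules out large RAID groups entirely, while Enterprise SAS allows rebuild reads of tens of terabytes.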
I have heard many stories of people running Consumer SATA storage arrays that are in a near-constant state of rebuild, with a corresponding decrease in performance. These designs did not take into account the hard error rate of the drives.
One additional thing to note is that as drives have grown larger, their rotational speed has remained fairly constant. Rebuild times have therefore greatly increased, as have the CPU resources required (computing the p and q parity for RAID-6, for example).
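To illustrate the parity computation mentioned above, here is a byte-at-a-time sketch of RAID-6-style p and q parity: p is a plain XOR across the data drives, while q is a Reed-Solomon-style weighted sum in GF(2^8) with generator 2 (the scheme used by the Linux kernel's raid6 code, among others). Real implementations work on whole blocks with heavily optimized arithmetic; this is purely illustrative.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) using the 0x11d reduction polynomial."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11d
        b >>= 1
    return r

def pq_parity(blocks):
    """Compute the (p, q) parity bytes for one stripe position.
    p = XOR of the data bytes; q = sum of g^i * D_i in GF(2^8), g = 2."""
    p, q = 0, 0
    g = 1  # g^i for drive i
    for d in blocks:
        p ^= d
        q ^= gf_mul(g, d)
        g = gf_mul(g, 2)
    return p, q

data = [0x11, 0xa5, 0x3c, 0xf0]
p, q = pq_parity(data)
# Any single lost data byte is recoverable from p alone:
assert data[2] == p ^ data[0] ^ data[1] ^ data[3]
```

The two independent parities are what let RAID-6 survive two simultaneous drive failures, at the cost of the extra GF(2^8) arithmetic on every write and rebuild.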
With the current capacity of drives, the time for a single-drive rebuild can be measured in days and is driven primarily by the speed of a single drive. During the rebuild you are vulnerable to another failure of some type, which could result in the loss of the RAID group (i.e., shorter rebuild times are better). Moreover, performance is degraded during the rebuild because the drives are busy being read to rebuild the RAID group, so back-end performance will suffer to some noticeable degree.
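A quick lower bound shows why rebuild time scales with capacity: the rebuild can go no faster than the sustained transfer rate of the replacement drive. The 150 MB/s figure below is an assumed typical sustained rate, not a measured one; real arrays throttle rebuilds to protect foreground I/O, which is why days, rather than this theoretical minimum of hours, is what is observed in practice.

```python
def min_rebuild_hours(capacity_tb, mb_per_s=150):
    """Theoretical minimum rebuild time, limited only by the sustained
    transfer rate of a single replacement drive."""
    return capacity_tb * 1e12 / (mb_per_s * 1e6) / 3600

for cap in (4, 8, 16):
    print(f"{cap}TB drive: at least {min_rebuild_hours(cap):.1f} hours")
```

Even this best case roughly doubles with every doubling of drive capacity, while rotational speed, and hence transfer rate, stays nearly flat.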
Here is a second point that might have escaped notice. The discussion to this point has been about SAS drives vs. SATA drives, and while I used hard drives in the discussion, everything I have said applies to SSDs as well. This is extremely important because there are storage solutions where SATA SSDs act as a cache for hard drives (or as a tier of storage). These SSDs have the same hard error rates as their spinning cousins: after reading about 10TB of data from Consumer SATA SSDs, you approach a 100% probability of hitting a hard error and triggering a RAID rebuild. Please look for the hard error rate on the SSD specification sheet.
From this section we learned that Consumer SATA drives can have a hard error rate that is 10x to 100x greater than that of "Nearline Enterprise SATA/SAS" or "Enterprise SAS" drives.
The next topic in the SAS vs. SATA debate I want to discuss is data integrity. How important is your data to you? Do you have photos of your kids when they were younger that you want to keep around for a very long time so they look like you just took them? Perhaps a better way to state the question is: "Can you tolerate losing part of the picture or having the picture become corrupt?"
Now ask a similar question about your "enterprise data." How important is that data to your business? Can you tolerate any data corruption? When I worked for a major aerospace company, it was required to keep the engineering data for a type of aircraft as long as any of those aircraft were still flying. When I left, the company still had data stored from the 1940's.
Now think of aircraft being designed today. It is likely that some of them will be around for 100 years or more. For example, the B-52, which entered service in 1955, is still in active service and will likely continue to fly for many more years into the 2040's, at which time it will be almost 100 years old!
The records from the 1940's and 1950's are all paper documents, but today's aircraft are designed with very little paper and are almost 100% digital. Making sure that digital data is still the same when it is accessed 80 years from now is very important. That is, data integrity matters, and it may be the most important aspect of data management a company can address.