Revving Up RAID
What's up with all these new RAID types such as RAID 5EE and RAID 6? What's next, RAID 666, a scheme that guarantees your data will survive Armageddon? And with all this talk about additional hot spares and protection strategies, does that mean that SATA drives are unreliable?
"RAID 6 addresses RAID 5's inability to recover from a second disk failure," says Mike Karp, an analyst at Enterprise Management Associates. "The downside is slower write operations than RAID 5, while the upside is faster recovery after a disk failure."
In a RAID 5 array, data is striped across all drives in the array. Parity information, necessary for data protection purposes, is distributed and stored across all the disks. If one drive fails, the surviving array operates in degraded mode. Once the failed drive is replaced, its data is then slowly rebuilt from the parity information retained on the surviving disks.
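The rebuild step described above can be sketched in a few lines of Python. This is a minimal illustration rather than any controller's actual code: it assumes the usual XOR parity scheme and looks at a single stripe on a hypothetical four-drive array. The block contents and the `xor_blocks` helper are made up for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a hypothetical four-drive array: three data blocks plus parity.
data_blocks = [b"RAID", b"five", b"demo"]
parity = xor_blocks(data_blocks)

# Simulate losing the drive that holds the second data block.
survivors = [data_blocks[0], data_blocks[2], parity]

# XOR of everything that survived yields the missing block.
rebuilt = xor_blocks(survivors)
print(rebuilt)
```

A real array repeats this for every stripe on the replaced drive and rotates the parity block from drive to drive, which is why a rebuild has to read the full contents of every surviving disk.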
But if a second drive fails before the rebuild completes, it's game over. Media errors can also result in lost data. The problems with RAID 5 are further compounded by the size of modern drives. Rebuilds now take much longer, and that opens the door to a badly timed second drive crash.
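To see why bigger drives widen that window, here is a rough back-of-the-envelope sketch. Every number in it is an assumption chosen for illustration, not a measurement: the rebuild rate, the annual failure rate, and the simplification that drives fail independently at a constant rate. The function names are hypothetical.

```python
import math


def rebuild_hours(drive_gb, rebuild_mb_per_s):
    """Hours needed to rewrite one full drive at a sustained rebuild rate."""
    return drive_gb * 1000 / rebuild_mb_per_s / 3600


def second_failure_probability(surviving_drives, hours, annual_failure_rate):
    """Chance that at least one surviving drive fails within the window.

    Assumes independent failures at a constant rate, which is optimistic
    for drives that share a manufacturing batch.
    """
    rate_per_hour = -math.log(1 - annual_failure_rate) / 8760
    return 1 - math.exp(-rate_per_hour * surviving_drives * hours)


# Assumed figures: 7 surviving drives, 50 MB/s rebuild rate, 3% annual failure rate.
for drive_gb in (200, 1000):
    hours = rebuild_hours(drive_gb, rebuild_mb_per_s=50)
    risk = second_failure_probability(7, hours, annual_failure_rate=0.03)
    print(f"{drive_gb} GB drive: {hours:.1f} h rebuild, {risk:.4%} chance of a second failure")
```

Under this model the exposure grows in step with drive capacity, because the rebuild window lengthens while the rebuild rate stays the same.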
RAID 5, however, takes up fewer drives and less space. It can be implemented with a minimum of three drives, compared to four for RAID 6. For an easy comparison, let's take a look at a four-drive array at 200 gigabytes per drive. In RAID 5, the total available storage capacity is 600 gigabytes out of 800 gigabytes. RAID 6, on the other hand, only provides 400 gigabytes. Bump it up to an eight-drive SATA RAID array, and 25 percent of the space is consumed for RAID 6 parity, compared to 12.5 percent in RAID 5.
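The arithmetic above is easy to check. The sketch below assumes the standard accounting of one drive's worth of parity for RAID 5 and two for RAID 6. The `usable_capacity` helper and its parameter names are illustrative only.

```python
RAID5_PARITY_DRIVES = 1  # one drive's worth of capacity holds parity
RAID6_PARITY_DRIVES = 2  # two drives' worth of capacity hold parity


def usable_capacity(drives, drive_gb, parity_drives):
    """Return (usable gigabytes, percent of raw capacity spent on parity)."""
    if drives <= parity_drives:
        raise ValueError("array needs more drives than parity drives")
    usable_gb = (drives - parity_drives) * drive_gb
    overhead_pct = 100 * parity_drives / drives
    return usable_gb, overhead_pct


for drives in (4, 8):
    for name, parity in (("RAID 5", RAID5_PARITY_DRIVES), ("RAID 6", RAID6_PARITY_DRIVES)):
        usable_gb, overhead_pct = usable_capacity(drives, 200, parity)
        print(f"{drives} drives, {name}: {usable_gb} GB usable, {overhead_pct}% parity overhead")
```

The overhead percentage shrinks as drives are added, which is why the capacity cost of RAID 6 is easier to absorb in larger arrays.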
RAID 6 also exerts a severe performance drag on the system, around 30 percent in the RAID controller. Companies like AMCC and Overland have therefore recently released products to mitigate the RAID 6 performance hit. AMCC's 3ware 9650SE SATA II RAID controller, for example, uses improved algorithms and stripe handling technology that is said to reduce the RAID 6 write penalty to less than 10 percent.
This follows the pattern established by earlier RAID arrangements. People forget that RAID 1 was the standard about five years back. At that time, RAID 5 had a bad reputation due to performance issues. Fast forward to today and the improvement in technology has minimized those concerns to such a degree that RAID 5 is now broadly deployed.
"Controller performance is one area that we have seen a large improvement in the past five years," says Chip Nickolett, a consultant for Comprehensive Solutions of Brookfield, Wisc.
Today's RAID 5 storage systems have caching and optimization features that eliminate a lot of the performance problems. RAID 6, however, is new, so most storage systems don't handle the performance issues well.
"Earlier RAID 5 systems addressed and handled the initial issues," says Stephen Foskett, principal consultant at GlassHouse Technologies. "RAID 6 will also be optimized, so we will eventually see little or no performance hit."
How about RAID 5E? To keep it simple, RAID 5E is really a RAID 5 scheme with the addition of a hot spare that is actively used. While this solves some of the RAID 5 issues, it is probably not going to halt a gradual defection of business from RAID 5 to RAID 6 over the next few years.
"It can be difficult to justify having a hot spare in RAID 5E that consumes 10 to 20 percent of the available capacity," says Nickolett.
With so much attention being paid to RAID 6, hot spares and disk failures, does that mean that SATA disks are undependable?
"SATA drives do fail a lot," says Foskett. "It is more common to lose more than one drive in a RAID set than math suggests."
He thinks there are mitigating factors, however. Drive size, for example, is contributing to higher failure rates, and the crashes that do happen tend to have far greater consequences. In addition, OEMs tend to buy disks in batches. They might buy a few thousand of a particular type of drive to fit into a specific model of array, all of which were manufactured at the same time on the same machine, shipped in the same truck and so on. As a result, problems tend to be repeated among the drives within the same array.
"A data center with several storage arrays will have a drive failing every few weeks," says Foskett. "RAID 6 gives more reliability."
As a result, Foskett believes it is only a matter of time before everyone who is currently using RAID 5 switches to RAID 6. But that won't happen until the technology matures and the performance remedies offered by AMCC and others have proven themselves in the real world. His estimate is five years for RAID 6 to become dominant. Ten years from now, he believes, RAID 5 will be looked upon as something quaint or antiquated.
"There have often been issues with the stability of new RAID implementations," says Foskett. "Once the bugs in the code of early versions are fixed, the adoption curve tends to get steeper."
Of course, there is the possibility that some other technology might come along and displace or minimize the need for any kind of RAID. All that currently exists, though, is a couple of schemes from the likes of EqualLogic and EMC that implement RAID at the data level, not at the disk level. Although they handle data protection in a way that is not pure RAID, they are essentially a virtual form of the same technology.
"RAID has become so established that it is unlikely to ever disappear in computer systems as we know them today," says Foskett. "But it is becoming less of a system highlight, so it will probably fade from the memory of all but the designers of disk-based systems."