One of the challenges facing RAID storage technology is the growing time needed to rebuild failed disks, which increases the risk of data loss and threatens the long-running data storage technology's viability (see RAID's Days May Be Numbered).
With the growth in disk density far outstripping performance and reliability improvements, some in the storage networking industry wonder if RAID will be able to recover failed disks fast enough to remain relevant in the enterprise data storage world.
But a few data storage vendors have turned to an old idea to keep RAID relevant: Declustered RAID, or parity declustering, a concept that was devised by RAID pioneer Garth Gibson and Mark Holland in a 1992 paper.
"What I find interesting is that some of the issues facing RAID today are similar to those of a decade ago," said Greg Schulz, founder and senior analyst at the StorageIO Group.https://o1.qnsr.com/log/p.gif?;n=203;c=204650394;s=9477;x=7936;f=201801171506010;u=j;z=TIMESTAMP;a=20392931;e=i
Schulz said those issues include large-capacity disk-drive rebuild, distributed data protection, availability, performance and ease of use.
Declustered RAID Defined
Declustering was first proposed for mirrored disks and later developed in Gibson's lab at Carnegie Mellon University for RAID level 5 arrays.
"Declustering replaces one RAID controller for every little cluster of disks with a collection of RAID controllers, each able to access all disks, probably through a FC or iSCSI SAN," said Gibson, CTO and co-founder of high-performance storage firm Panasas.
The declustering technique varies the RAID equations so that only a handful of the disks are involved in each parity calculation and the set of disks in each handful changes across the bits of each disk. The result is that every pair of disks in declustered RAID level 5 is involved in the same number of RAID equations.
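The balance property is easy to check on a toy layout. The sketch below is an illustration only, not the layout from the 1992 paper or from any vendor's product: it assumes the simplest possible balanced arrangement, one parity stripe for every possible subset of disks, and the function names are hypothetical.

```python
from itertools import combinations
from collections import Counter

def declustered_stripes(num_disks, stripe_width):
    """Build one parity stripe for every stripe_width-sized subset of disks.

    Assumption: a 'complete' layout chosen for simplicity; practical
    layouts would use far fewer stripes while keeping the same balance.
    """
    return list(combinations(range(num_disks), stripe_width))

def pair_counts(stripes):
    """Count how many stripes (RAID equations) each pair of disks shares."""
    counts = Counter()
    for stripe in stripes:
        for pair in combinations(stripe, 2):
            counts[pair] += 1
    return counts

stripes = declustered_stripes(num_disks=7, stripe_width=3)
counts = pair_counts(stripes)
print(len(stripes), set(counts.values()))  # prints: 35 {5}
```

With seven disks and three-disk stripes, every one of the 21 disk pairs shares exactly five stripes, which is the even spread the technique aims for.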
"When a disk fails, every disk does a little recovery work, spread over all the RAID controllers, rather than one RAID controller and a handful of disks doing all the recovery work while the other disks and RAID controllers do no recovery work," says Gibson.
Because declustering also spreads the space of online spares over all the disks, instead of having a few online spares empty and idle, even the writing in the recovery of a failed disk is spread over the entire array, he said.
"The net result is that RAID changes from a serial application on one little computer to a parallel and distributed computation of all disks and all controllers," said Gibson. "And the computation can finish much, much faster. Or it can degrade user performance during recovery much, much less. Or both."
Solving the RAID Rebuild Problem
The problem with drive rebuild time is that the time it takes to rebuild a 1TB or larger disk drive is measured in hours if not days, depending on how busy the storage system and RAID group are, said Ray Lucchesi, president and founder of Silverton Consulting, a storage, strategy and systems consulting services company. And as 1.5TB and 2TB drives come online, the time just keeps getting longer.
However, the rebuild time can be sped up by having larger single-parity RAID groups (with more disk spindles in the RAID stripe), cross-coupling RAID groups, or by using RAID 6, which often has more spindles in the RAID group.
But a large RAID group has a drawback of its own: data overwrites could create a performance bottleneck on the parity disks. The challenge facing RAID users is how to combine the faster drive rebuild times of a large RAID group with the smaller write penalty of a small RAID group.
Lucchesi thinks parity declustering could be the solution, as it distributes the parity across a number of disk drives as well as the data, so no one disk holds the parity for all drives.
Doing this would eliminate the hot drive phenomenon, normally dealt with by using smaller RAID group sizes, Lucchesi said.
"The core of the problem is that the time it takes to read a whole disk is getting longer by about 20 percent each year," said Gibson. "Disk data rates are growing much slower than disk capacities, so it just takes longer to read each bigger disk than the last."
Due to bigger disks, RAID systems take longer to recover from a failed disk. Traditional RAID systems recover a disk by reading all the remaining disks beginning to end and by writing the missing data beginning to end to an online spare disk.
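For single-parity RAID, the recovery described here comes down to XORing the surviving blocks of each stripe. The sketch below assumes byte-wise XOR parity as used in RAID 5; the block contents and function name are made up for illustration.

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe: three data blocks plus their parity block (illustrative data).
data = [b"AAAAAAAA", b"BBBBBBBB", b"CCCCCCCC"]
parity = xor_blocks(data)

# The disk holding data[1] fails: read every surviving block, XOR them,
# and write the result to the spare.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt == data[1])  # prints: True
```

A traditional rebuild repeats this for every stripe on the failed disk, which is why every surviving disk in the group has to be read from beginning to end.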
Taking longer to recover a failed and replaced disk is a bad thing in two ways, said Gibson.
First, it means that the period of vulnerability — the time it takes to replace a failed disk and recover its contents — is getting longer. During this time, more failures can lead to data loss. A longer period of vulnerability means lower data reliability.
Second, recovery is a lot of work. It reduces valuable disk accesses for users, lowering the disk performance for user workloads. The longer the recovery, the longer the periods in which users get less done.
The results are that the RAID system takes longer to get back to full protection; the chances of incurring other failures increase; and the probability of data loss goes up.
Gibson says Panasas' parity declustering turns RAID from a local operation of one controller and a few disks into a parallel algorithm using all the controllers and disks in the storage pool.
With pools of tens to hundreds of individual disk arrays, parity declustering enables recovery to be tens to hundreds of times faster. And, it spreads the work so thinly per disk that concurrent user work sees far less interference from recovery, said Gibson.
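A back-of-the-envelope model shows where a speedup of that order can come from. Everything in this sketch is an assumption for illustration, not Panasas' algorithm: the rebuild is limited only by per-disk bandwidth, there is no user load, and spare space is spread evenly so each surviving disk handles an equal share of the reads and writes.

```python
def traditional_rebuild_hours(capacity_tb, rate_mb_per_s):
    """Survivors are read in parallel; one spare is written end to end."""
    return capacity_tb * 1_000_000 / rate_mb_per_s / 3600

def declustered_rebuild_hours(capacity_tb, rate_mb_per_s,
                              pool_disks, stripe_width):
    """Idealized rebuild with work spread over all surviving pool disks.

    Each survivor reads (stripe_width - 1) / (pool_disks - 1) of a disk
    and writes 1 / (pool_disks - 1) of a disk to distributed spare space.
    """
    per_disk_fraction = stripe_width / (pool_disks - 1)
    return traditional_rebuild_hours(capacity_tb, rate_mb_per_s) * per_disk_fraction

slow = traditional_rebuild_hours(1, 100)
fast = declustered_rebuild_hours(1, 100, pool_disks=1001, stripe_width=10)
print(f"{slow / fast:.0f}x")  # prints: 100x
```

Real systems will fall short of this ideal because of controller, network and user-workload limits, but the model shows why the gain scales with the size of the pool rather than the size of the RAID group.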
Spreading all the data across many drives is not unique to Panasas. Other companies that do this include EMC, Google, Hitachi, HP, IBM and Isilon.
IBM XIV and RAID-X
IBM XIV is a grid-based storage system that does not use traditional RAID.
"Instead, data is distributed across loosely connected data modules, which act as independent building blocks," said Tony Pearson, senior storage consultant for IBM System Storage.
"XIV divides each LUNinto 1MB 'chunks,' and stores two copies of each chunk on separate drives in separate modules," said Pearson. "We call this RAID-X."
Interestingly, the terms used to describe such declustering range from RAID-X to wide-striping, metavolumes, extent pools, stripes across stripes, plaid stripes and RAID 500, each proving in its own unique way that RAID innovation continues, even if it's not always called RAID.