A 5-Minute Crash Course on RAID


What Does RAID Stand For and How Did It Come About?

In 1987, three researchers at the University of California, Berkeley, published a paper called "A Case for Redundant Arrays of Inexpensive Disks (RAID)." (In common usage, "Independent" has since replaced "Inexpensive.") It described various types of disk arrays, referred to by the acronym RAID. The basic idea of RAID is to combine multiple small, inexpensive disk drives into an array that yields performance exceeding that of a single large, expensive drive. Additionally, this array of drives appears to the computer as a single logical storage unit or drive.

RAID implementations build on two concepts:

-- disk striping enhances I/O performance by balancing I/O load across the disks comprising an array.

-- disk fault tolerance provides a form of check data, which enhances user data availability by enabling the recovery of user data if the disk containing it should fail.

What Is the Concept of Data Striping?

A RAID system appears to the operating system to be a single logical hard disk. RAID employs the technique of striping, which involves partitioning each drive's storage space into units ranging from a sector (512 bytes) up to several megabytes. The stripes of all the disks are interleaved and addressed in order.
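As a rough illustration of how interleaved stripes can be addressed in order, the sketch below maps a logical block number to a drive and a position on that drive. The function name `locate_block` and its parameters are invented for this example, and the round-robin layout is an assumption; a real controller may arrange its stripes differently.

```python
def locate_block(logical_block, num_disks, blocks_per_stripe_unit):
    """Map a logical block number to (disk index, block offset on that disk).

    Assumes stripe units are dealt out round-robin: unit 0 on disk 0,
    unit 1 on disk 1, and so on, wrapping back to disk 0.
    """
    stripe_unit = logical_block // blocks_per_stripe_unit   # which unit overall
    offset_in_unit = logical_block % blocks_per_stripe_unit
    disk = stripe_unit % num_disks                          # round-robin across disks
    row = stripe_unit // num_disks                          # units already on that disk
    return disk, row * blocks_per_stripe_unit + offset_in_unit


if __name__ == "__main__":
    # With 4 disks and one block per stripe unit, consecutive logical
    # blocks land on consecutive disks.
    for block in range(8):
        print(block, locate_block(block, num_disks=4, blocks_per_stripe_unit=1))
```

The operating system only ever sees the logical block number; the mapping to a physical drive stays hidden inside the array.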

In a single-user system where large records, such as medical or other scientific images, are stored, the stripes are typically set up to be small (perhaps 512 bytes) so that a single record spans all disks and can be accessed quickly by reading all disks at the same time.

In a multi-user system, better performance requires establishing a stripe wide enough to hold the typical or maximum size record. This allows overlapped disk I/O across drives.
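To make the stripe-size trade-off concrete, the following sketch counts which disks a single record touches for a given stripe size. The helper `disks_touched`, the 4-disk array, and the 8 KB record size are all assumptions chosen for illustration.

```python
def disks_touched(offset, length, num_disks, stripe_unit):
    """Return the sorted disk indexes holding bytes [offset, offset + length).

    Assumes length > 0 and a round-robin layout of stripe units.
    """
    first_unit = offset // stripe_unit
    last_unit = (offset + length - 1) // stripe_unit
    return sorted({unit % num_disks for unit in range(first_unit, last_unit + 1)})


if __name__ == "__main__":
    record = 8192  # hypothetical 8 KB record

    # Small stripes: one record is spread over every disk, so all
    # disks can work on it at the same time.
    print(disks_touched(0, record, num_disks=4, stripe_unit=512))

    # Wide stripes: each record sits on one disk, so two users reading
    # different records can be served by different disks concurrently.
    print(disks_touched(0, record, num_disks=4, stripe_unit=65536))
    print(disks_touched(65536, record, num_disks=4, stripe_unit=65536))
```

With 512-byte stripes the record occupies all four disks; with 64 KB stripes the two records fall on disk 0 and disk 1 respectively.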

What Are RAID Levels for Disk Fault Tolerance?

The University of California, Berkeley, paper defined levels of array architectures, RAID-1 through RAID-5. Each level provides disk fault tolerance with different trade-offs in features and performance. In addition to these five redundant array architectures, a non-redundant array of disk drives is called a RAID-0 array.

RAID Level 0

RAID Level 0 is not redundant, hence does not truly fit the RAID acronym. In level 0, data is split across drives, resulting in higher data throughput. Since no redundant information is stored, performance is very good, but the failure of any disk in the array results in data loss. This level is commonly referred to as striping.

Uses: Applications that require very high-speed storage but do not need redundancy. Photoshop temporary files are a good example.

RAID Level 1

RAID Level 1 provides redundancy by duplicating all data from one drive on another drive. The performance of a level 1 array is only slightly better than a single drive, but if either drive fails, no data is lost. This is a good entry-level redundant system, since only two drives are required; however, since one drive is used to store a duplicate of the data, the cost per megabyte is high. This level is commonly referred to as mirroring.

Uses: Applications which require redundancy with fast random writes; entry-level systems where only two drives are available. Small file servers are an example.
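Mirroring can be pictured with a toy model in which each drive is just a dictionary of blocks. The `Mirror` class below is a hypothetical sketch, not how any particular controller works; it only shows that every write goes to both drives and that a read succeeds as long as one copy survives.

```python
class Mirror:
    """Toy two-drive mirror; each 'drive' maps block number -> bytes."""

    def __init__(self):
        self.drives = [{}, {}]
        self.failed = [False, False]

    def write(self, block, data):
        # Every write is duplicated on each surviving drive.
        for index, drive in enumerate(self.drives):
            if not self.failed[index]:
                drive[block] = data

    def read(self, block):
        # Any surviving copy will do.
        for index, drive in enumerate(self.drives):
            if not self.failed[index]:
                return drive[block]
        raise OSError("both drives have failed")

    def fail(self, index):
        # Simulate losing a drive and everything stored on it.
        self.failed[index] = True
        self.drives[index] = {}


if __name__ == "__main__":
    array = Mirror()
    array.write(0, b"payroll")
    array.fail(0)
    print(array.read(0))  # still readable from the second drive
```

The model also makes the cost visible: two drives are needed to store one drive's worth of data.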

RAID Level 2

RAID Level 2, which uses Hamming error correction codes, is intended for use with drives which do not have built-in error detection. All SCSI drives support built-in error detection, so this level is of little use when using SCSI drives.
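For readers unfamiliar with Hamming codes, here is a minimal sketch of the textbook Hamming(7,4) code, which protects 4 data bits with 3 check bits and can locate and repair a single flipped bit. It is offered only as an illustration of the idea; the function names are invented, and an actual level 2 array would not necessarily use this particular code or width.

```python
from functools import reduce
from operator import xor


def hamming74_encode(data_bits):
    """Encode 4 data bits into 7 bits (positions 1..7, parity at 1, 2, 4)."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4      # covers positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4      # covers positions 3, 6, 7
    p4 = d2 ^ d3 ^ d4      # covers positions 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_correct(codeword):
    """Return (corrected codeword, position of the flipped bit, or 0 if none)."""
    # XOR-ing the 1-based positions of all set bits gives 0 for a valid
    # codeword, or the position of a single flipped bit.
    syndrome = reduce(
        xor, (pos for pos, bit in enumerate(codeword, start=1) if bit), 0
    )
    fixed = list(codeword)
    if syndrome:
        fixed[syndrome - 1] ^= 1
    return fixed, syndrome


if __name__ == "__main__":
    word = hamming74_encode([1, 0, 1, 1])
    damaged = list(word)
    damaged[4] ^= 1                      # flip the bit at position 5
    repaired, position = hamming74_correct(damaged)
    print(repaired == word, position)
```

In an array built this way, each bit position would live on its own drive, so the code can pinpoint which drive returned a bad bit. Drives with built-in error detection already report that themselves, which is why this level adds little for them.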

RAID Level 3

RAID Level 3 stripes data at a byte level across several drives, with parity stored on one drive. It is otherwise similar to level 4. Byte-level striping requires hardware support for efficient use.

RAID Level 4

RAID Level 4 stripes data at a block level across several drives, with parity stored on one drive. The parity information allows recovery from the failure of any single drive. The performance of a level 4 array is very good for reads (the same as level 0). Writes, however, require that parity data be updated each time. This slows small random writes, in particular, though large writes or sequential writes are fairly fast. Because only one drive in the array stores redundant data, the cost per megabyte of a level 4 array can be fairly low.

Uses: Applications which require redundancy at low cost, or with high-speed reads. This is good for archival storage. Larger file servers are an example.
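Parity of this kind is commonly computed as a byte-wise XOR of the data blocks in a stripe. The sketch below, with invented helper names and tiny 4-byte blocks, shows how a lost block can be rebuilt from the survivors and why a small write is costly: the old data and old parity both have to be read before the new parity can be written.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Recover the one missing data block from the survivors plus parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


def update_parity(old_parity, old_data, new_data):
    """New parity after a small write; requires reading old data and old parity."""
    return xor_blocks([old_parity, old_data, new_data])


if __name__ == "__main__":
    data = [b"RAID", b"disk", b"test"]       # one block per data drive
    parity = xor_blocks(data)                # stored on the parity drive

    # The drive holding data[1] fails; rebuild its block from the rest.
    print(rebuild_missing([data[0], data[2]], parity))

    # Overwrite one block without touching the other data drives.
    new_parity = update_parity(parity, data[1], b"DISK")
    print(new_parity == xor_blocks([data[0], b"DISK", data[2]]))
```

A small write therefore turns into two reads and two writes, while a write covering the whole stripe can compute parity directly from the new data.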

RAID Level 5

RAID Level 5 is similar to level 4, but distributes parity among the drives. This can speed small writes in multiprocessing systems, since no single parity disk becomes a bottleneck. Because parity data must be skipped on each drive during reads, however, read performance can be somewhat lower than that of a level 4 array. The cost per megabyte is the same as for level 4.
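One way to picture distributed parity is to rotate the parity position by one drive on each stripe row. The rotation rule and the names `parity_disk` and `layout` below are assumptions made for this sketch; real implementations use several different rotation schemes.

```python
def parity_disk(stripe_row, num_disks):
    """Disk holding parity for a stripe row (simple rotation, an assumption)."""
    return (num_disks - 1 - stripe_row) % num_disks


def layout(num_rows, num_disks):
    """Rows of labels: 'P' for parity, otherwise the data stripe unit number."""
    rows, next_unit = [], 0
    for row in range(num_rows):
        parity = parity_disk(row, num_disks)
        labels = []
        for disk in range(num_disks):
            if disk == parity:
                labels.append("P")
            else:
                labels.append(str(next_unit))
                next_unit += 1
        rows.append(labels)
    return rows


if __name__ == "__main__":
    for row in layout(num_rows=4, num_disks=4):
        print("  ".join(label.rjust(2) for label in row))
```

Over four rows on four drives, each drive holds exactly one parity unit, so parity updates from concurrent small writes are spread across the whole array.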

Uses: Similar to level 4, but may provide higher performance if most I/O is random and in small chunks. Database servers are an example.

Where on the Web Can I Find a Good Reference Guide to RAID?

One of the best Web resources about RAID is Veritas's 45-page technology white paper called RAID for Enterprise Computing.


Elizabeth M. Ferrarini is a freelance writer from Boston, Massachusetts.

