A 5-Minute Crash Course on RAID

What Does RAID Stand For and How Did It Come About?

In 1987, three researchers at the University of California, Berkeley
published a paper called “A Case for Redundant Arrays of Inexpensive Disks
(RAID).” (“Independent” has since replaced “Inexpensive” in the acronym.) It
described various types of disk arrays, referred to by the acronym RAID. The
basic idea of RAID is to combine multiple small, inexpensive disk drives into
an array that yields performance exceeding that of a single large, expensive
drive. Additionally, this array of drives appears to the computer as a single
logical storage unit or drive.

RAID implementations build on two concepts:

— disk striping enhances I/O performance by balancing I/O load across the
disks comprising an array.

— disk fault tolerance provides a form of check data, which enhances user
data availability by enabling the recovery of user data if the disk
containing it should fail.

What Is the Concept of Data Striping?

A RAID system appears to the operating system to be a single logical hard
disk. RAID employs the technique of striping, which involves partitioning
each drive’s storage space into units ranging from a sector (512 bytes) up to
several megabytes. The stripes of all the disks are interleaved and addressed
in order.

In a single-user system where large records, such as medical or other
scientific images, are stored, the stripes are typically set up to be small
(perhaps 512 bytes) so that a single record spans all disks and can be
accessed quickly by reading all disks at the same time.

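In a multi-user system, better performance requires establishing a stripe
wide enough to hold the typical or maximum-size record. Each record then sits
on a single disk, so requests for different records can be serviced by
different drives at the same time (overlapped disk I/O).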
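
The interleaving described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions (a plain striped layout with no parity, addressed in fixed-size blocks); the function names locate_block and disks_touched and their parameters are hypothetical and do not come from any particular RAID product.

```python
from typing import Set, Tuple


def locate_block(logical_block: int, num_disks: int,
                 blocks_per_stripe_unit: int) -> Tuple[int, int]:
    """Map a logical block number to (disk index, block offset on that disk).

    Assumes a plain striped layout: stripe units are dealt out to the
    disks in round-robin order.
    """
    if num_disks < 1 or blocks_per_stripe_unit < 1:
        raise ValueError("num_disks and blocks_per_stripe_unit must be >= 1")
    if logical_block < 0:
        raise ValueError("logical_block must be >= 0")
    stripe_unit = logical_block // blocks_per_stripe_unit  # unit number overall
    within_unit = logical_block % blocks_per_stripe_unit   # position inside it
    disk = stripe_unit % num_disks                         # round-robin disk
    row = stripe_unit // num_disks                         # stripe row on disk
    return disk, row * blocks_per_stripe_unit + within_unit


def disks_touched(first_block: int, block_count: int, num_disks: int,
                  blocks_per_stripe_unit: int) -> Set[int]:
    """Return the set of disks that hold any part of one record."""
    return {
        locate_block(block, num_disks, blocks_per_stripe_unit)[0]
        for block in range(first_block, first_block + block_count)
    }


# An 8-block record on a 4-disk array:
# small stripe units spread the record over every disk ...
print(sorted(disks_touched(0, 8, num_disks=4, blocks_per_stripe_unit=1)))  # [0, 1, 2, 3]
# ... while a stripe unit as large as the record keeps it on one disk.
print(sorted(disks_touched(0, 8, num_disks=4, blocks_per_stripe_unit=8)))  # [0]
```

The first call matches the single-user case, where one record is read from all disks at once. The second matches the multi-user case, where each record stays on one disk and the other disks remain free for other requests.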

What Are RAID Levels for Disk Fault Tolerance?

The University of California, Berkeley paper defined levels of array
architectures, RAID-1 through RAID-5. Each level provides disk
fault-tolerance with different trade-offs in features and performance. In
addition to these five redundant array architectures, a non-redundant array
of disk drives is called a RAID-0 array.

RAID Level 0

RAID Level 0 is not redundant, hence does not truly fit the RAID acronym. In
level 0, data is split across drives, resulting in higher data throughput.
Since no redundant information is stored, performance is very good, but the
failure of any disk in the array results in data loss. This level is commonly
referred to as striping.
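
To see why adding drives to a level 0 array raises the risk of data loss, here is a rough sketch in Python. It assumes that drives fail independently and with the same probability over some period, which is a simplification; the function name array_loss_probability and the 5% figure are made up for illustration.

```python
def array_loss_probability(per_disk_failure_probability: float,
                           num_disks: int) -> float:
    """Chance that a non-redundant striped array loses data.

    The array survives only if every disk survives, so under the
    independence assumption the loss probability is 1 - (1 - p) ** n.
    """
    if not 0.0 <= per_disk_failure_probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if num_disks < 1:
        raise ValueError("num_disks must be >= 1")
    return 1.0 - (1.0 - per_disk_failure_probability) ** num_disks


# Hypothetical 5% chance of failure per disk over the period of interest.
for disks in (1, 2, 4, 8):
    print(disks, round(array_loss_probability(0.05, disks), 4))
```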

Uses: Applications that require very high-speed storage but do not need
redundancy. Photoshop temporary files are a good example.

RAID Level 1

RAID Level 1 provides redundancy by duplicating all data from one drive on
another drive. The performance of a level 1 array is only slightly better
than that of a single drive, but if either drive fails, no data is lost. This is a
good entry-level redundant system, since only two drives are required;
however, since one drive is used to store a duplicate of the data, the cost
per megabyte is high. This level is commonly referred to as mirroring.

Uses: Applications which require redundancy with fast random writes;
entry-level systems where only two drives are available. Small file servers
are an example.

RAID Level 2

RAID Level 2, which uses Hamming error correction codes, is intended for use
with drives which do not have built-in error detection. All SCSI drives
support built-in error detection, so this level is of little use when using
SCSI drives.

RAID Level 3

RAID Level 3 stripes data at a byte level across several drives, with parity
stored on one drive. It is otherwise similar to level 4. Byte-level striping
requires hardware support for efficient use.

RAID Level 4

RAID Level 4 stripes data at a block level across several drives, with parity
stored on one drive. The parity information allows recovery from the failure
of any single drive. The performance of a level 4 array is very good for
reads (the same as level 0). Writes, however, require that parity data be
updated each time. This slows small random writes, in particular, though
large writes or sequential writes are fairly fast. Because only one drive in
the array stores redundant data, the cost per megabyte of a level 4 array can
be fairly low.
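
The parity mechanism can be sketched in Python, assuming the usual bytewise XOR parity. The helper names (xor_blocks, rebuild_missing, update_parity) and the sample data are hypothetical. A real array works on disk blocks rather than short byte strings.

```python
from functools import reduce
from typing import Sequence


def xor_blocks(*blocks: bytes) -> bytes:
    """Bytewise XOR of equally sized blocks."""
    if not blocks:
        raise ValueError("at least one block is required")
    if any(len(block) != len(blocks[0]) for block in blocks):
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks: Sequence[bytes], parity: bytes) -> bytes:
    """Recover the one missing data block from the survivors and the parity."""
    return xor_blocks(parity, *surviving_blocks)


def update_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """Small-write update: the old data and old parity are read first."""
    return xor_blocks(old_parity, old_data, new_data)


# One stripe across three data drives, plus its parity block.
data = [b"RAID", b"disk", b"data"]
parity = xor_blocks(*data)

# The drive holding data[1] fails; its contents come back from the rest.
print(rebuild_missing([data[0], data[2]], parity))  # b'disk'

# Overwriting one block changes the parity block as well.
new_parity = update_parity(parity, data[1], b"DISK")
print(new_parity == xor_blocks(data[0], b"DISK", data[2]))  # True
```

The update_parity step shows why small random writes are slow: changing one data block means reading the old data and old parity, then writing both the new data and the new parity.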

Uses: Applications which require redundancy at low cost, or with high-speed
reads. This is good for archival storage. Larger file servers are an example.

RAID Level 5

RAID Level 5 is similar to level 4, but distributes parity among the drives.
This can speed small writes in multiprocessing systems, since the parity disk
does not become a bottleneck. Because parity data must be skipped on each
drive during reads, however, the performance for reads tends to be
considerably lower than that of a level 4 array. The cost per megabyte is the
same as for level 4.
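
One way to picture distributed parity is to rotate the parity block to a different drive on each stripe row. The Python sketch below uses one simple rotation chosen for illustration. Actual implementations use several different layouts, and the names parity_disk and layout are hypothetical.

```python
from typing import List


def parity_disk(stripe_row: int, num_disks: int) -> int:
    """Drive that holds parity for a given stripe row (illustrative rotation).

    Row 0 puts parity on the last drive, row 1 on the one before it, and
    so on, wrapping around. A level 4 array would instead return the same
    drive for every row.
    """
    if num_disks < 2:
        raise ValueError("num_disks must be >= 2")
    if stripe_row < 0:
        raise ValueError("stripe_row must be >= 0")
    return (num_disks - 1) - (stripe_row % num_disks)


def layout(num_rows: int, num_disks: int) -> List[List[str]]:
    """Mark each position in the array as data ('D') or parity ('P')."""
    return [
        ["P" if disk == parity_disk(row, num_disks) else "D"
         for disk in range(num_disks)]
        for row in range(num_rows)
    ]


for row in layout(4, 4):
    print(" ".join(row))
# D D D P
# D D P D
# D P D D
# P D D D
```

Because each row's parity is on a different drive, small writes to different rows update parity on different drives instead of queuing behind a single parity disk.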

Uses: Similar to level 4, but may provide higher performance if most I/O is
random and in small chunks. Database servers are an example.
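
The cost-per-megabyte remarks for levels 1, 4, and 5 come down to what fraction of the raw capacity holds user data. The Python sketch below makes that arithmetic explicit, assuming equally sized drives and covering only levels 0, 1, 4, and 5. The function names and the price figure are hypothetical.

```python
def usable_fraction(level: int, num_disks: int) -> float:
    """Fraction of raw capacity available for user data (equal-size drives)."""
    if level == 0:
        if num_disks < 1:
            raise ValueError("level 0 needs at least one drive")
        return 1.0                          # no redundant data stored
    if level == 1:
        if num_disks < 2:
            raise ValueError("level 1 needs at least two drives")
        return 1.0 / num_disks              # every drive holds the same data
    if level in (4, 5):
        if num_disks < 2:
            raise ValueError("levels 4 and 5 need at least two drives")
        return (num_disks - 1) / num_disks  # one drive's worth of parity
    raise ValueError("only levels 0, 1, 4 and 5 are covered in this sketch")


def cost_per_usable_mb(price_per_raw_mb: float, level: int,
                       num_disks: int) -> float:
    """Effective price of each megabyte of user data."""
    return price_per_raw_mb / usable_fraction(level, num_disks)


# Hypothetical raw price of 1.00 per megabyte.
print(cost_per_usable_mb(1.00, level=1, num_disks=2))  # 2.0
print(cost_per_usable_mb(1.00, level=4, num_disks=5))  # 1.25
print(cost_per_usable_mb(1.00, level=5, num_disks=5))  # 1.25
```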

Where on the Web Can I Find a Good Reference Guide to RAID?

One of the best Web resources about RAID is Veritas’s 45-page technology
white paper called RAID for Enterprise Computing.

http://eval.veritas.com/webfiles/docs/RAIDirectorWP.pdf

Elizabeth M. Ferrarini is a freelance writer from Boston, Massachusetts.
