RAID: Faster and Cheaper with Linux

Welcome to our howto on implementing Linux software RAID with no expense other than however many hard disks you wish to use, whether they be inexpensive ordinary PATA (IDE) drives, expensive SCSI drives, or newfangled serial ATA (SATA) drives.

RAID is no longer the exclusive province of expensive systems with SCSI drives and controllers. In fact, it hasn’t been since the 2.0 Linux kernel, released in 1996, which was the first kernel release to support software RAID.

What RAID Is For

A RAID array provides various functions, depending on how it is configured: high speed, high reliability, or both. RAID 0, 1, and 5 are probably the most commonly used.



RAID 0, or “striping,” writes data across two or more drives. RAID 0 is very fast: data are split up into blocks and written across all the drives in the array. It will noticeably speed up everyday work, and is great for applications that generate large files, like image editing. It is not fault-tolerant; a failure on one disk means all data in the array are lost. That is the same outcome as when a single standalone drive fails, although every drive you add raises the odds that one of them will fail. If it’s speed and more capacity you want, go for it.
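To make the striping idea concrete, here is a minimal sketch in Python of dealing fixed-size chunks round-robin across drives. The function names stripe and unstripe and the chunk size are illustrative assumptions, not how the kernel's RAID driver is actually written; drives are modeled as plain lists of byte strings.

```python
from typing import List


def stripe(data: bytes, n_drives: int, chunk_size: int) -> List[List[bytes]]:
    """Split data into fixed-size chunks and deal them round-robin across drives."""
    if n_drives < 2:
        raise ValueError("RAID 0 needs at least two drives")
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    drives: List[List[bytes]] = [[] for _ in range(n_drives)]
    for offset in range(0, len(data), chunk_size):
        chunk_index = offset // chunk_size
        # Chunk 0 goes to drive 0, chunk 1 to drive 1, and so on, wrapping around.
        drives[chunk_index % n_drives].append(data[offset:offset + chunk_size])
    return drives


def unstripe(drives: List[List[bytes]]) -> bytes:
    """Reassemble the data by reading one chunk from each drive in turn."""
    out = []
    longest = max(len(d) for d in drives)
    for row in range(longest):
        for d in drives:
            if row < len(d):
                out.append(d[row])
    return b"".join(out)


if __name__ == "__main__":
    drives = stripe(b"ABCDEFGH", n_drives=2, chunk_size=2)
    print(drives)
    print(unstripe(drives))
```

Note that each simulated drive holds only every other chunk, which is why losing any one drive leaves nothing usable behind.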

RAID 1, or “mirroring,” keeps identical copies of your data on two disks. If the two drives are not the same size, your storage space is limited to the size of the smaller one. If one drive fails, the other carries on, allowing you to continue working until it is convenient to replace the failed disk. RAID 1 writes are slower than striping, because every write is done twice, once to each disk.

RAID 5 combines striping with parity, so you get speed and data redundancy. You need a minimum of three disks. If a single disk is lost your data are still intact, because the missing blocks can be recomputed from the surviving data and parity. Losing two disks means losing everything. Reads are very fast, while writes are a bit slower because the parity must be calculated.
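The parity idea can be sketched with XOR, which is the usual way single-disk parity is computed. This is a simplified illustration under stated assumptions: the helper name xor_blocks is hypothetical, blocks are short byte strings, and parity sits in one place, whereas a real RAID 5 array spreads parity blocks across all the member disks.

```python
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    if not blocks:
        raise ValueError("need at least one block")
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks must be the same length")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


if __name__ == "__main__":
    disk0 = b"\x01\x02\x03\x04"
    disk1 = b"\x10\x20\x30\x40"
    parity = xor_blocks([disk0, disk1])  # stored on a third disk

    # Pretend disk1 has died: rebuild it from the survivor plus parity.
    rebuilt = xor_blocks([disk0, parity])
    print(rebuilt == disk1)
```

With only one parity block per stripe, a second lost disk leaves more unknowns than the XOR can solve for, which is why losing two disks loses everything.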

You may use disks of different sizes in all of these, though you’ll get better performance with disks of the same capacity and geometry. Some admins like to use different brands of hard disks on the theory that different brands will have different flaws.
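For a rough idea of what mixed disk sizes cost you in space, here is a back-of-the-envelope sketch. The function name usable_capacity is hypothetical, and the figures ignore metadata and rounding. The RAID 0 rule is an assumption based on the Linux md driver's ability to keep striping across the larger disks once the smallest is full; other implementations may limit every member to the size of the smallest disk.

```python
from typing import List


def usable_capacity(level: str, sizes_gb: List[float]) -> float:
    """Approximate usable space for an array built from disks of the given sizes."""
    n = len(sizes_gb)
    if level == "raid0":
        if n < 2:
            raise ValueError("RAID 0 needs at least two disks")
        # Assumption: Linux md uses the leftover space on the larger disks too.
        return sum(sizes_gb)
    if level == "raid1":
        if n < 2:
            raise ValueError("RAID 1 needs at least two disks")
        # Every disk holds a full copy, so the smallest disk sets the limit.
        return min(sizes_gb)
    if level == "raid5":
        if n < 3:
            raise ValueError("RAID 5 needs at least three disks")
        # One disk's worth of space goes to parity; each member is cut to the smallest.
        return (n - 1) * min(sizes_gb)
    raise ValueError("unsupported RAID level: " + level)


if __name__ == "__main__":
    print(usable_capacity("raid1", [80, 120]))
    print(usable_capacity("raid5", [80, 120, 120]))
```

In the RAID 5 example, 40 GB on each of the two larger disks goes unused, which is one practical argument for matching disk sizes.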

What RAID Is Not

RAID is not a substitute for a good backup regimen, backup power supplies, surge protectors, and other sensible protections. Linux software RAID is not a substitute for true hardware SCSI RAID in high-demand mission-critical systems. But it is a dandy tool for workstations and low- to medium-duty servers.

PATA (or IDE) drives are not hot-swappable, but you can set up an array with standby drives that automatically take over in the event of a disk failure. If you don’t want to use standby drives, your downtime is limited to the time it takes to replace the drive, because the system is usable even while the array is rebuilding itself.

Hardware RAID

Hardware RAID controllers come in a rather bewildering variety. Mainboards come with built-in IDE RAID controllers, and PCI IDE RAID controller cards can be had for as little as $25. Most of these are like horrid Winmodems, in that they require Windows drivers to work and have Windows-only management tools. I wouldn’t bother with IDE RAID controllers — Linux software RAID outperforms them in every way, and costs nothing.

A true hardware RAID controller operates independently of the host operating system. You’ll find a lot of choices for SATA and SCSI drives. SATA controllers cost from $150 to the sky’s the limit, depending on how many drives they support, how much onboard memory they have, and other refinements that take the processing load away from the system CPU.

Good SCSI controllers start around $400 and have an even higher sky. Both SATA and SCSI controllers should support hot-swapping, error handling, caching, and fast data-transfer speeds. A good-quality hardware controller is fast and reliable, but finding one is not so easy. Many an experienced admin has lost sleep and hair over flaky RAID hardware.

Something to keep in mind for the future: as SATA support in Linux matures, and the technology itself improves, it should be a capable SCSI replacement for all but the most demanding uses. (For more information see the excellent pages posted by the maintainer of the kernel SATA drivers, Jeff Garzik.)

Software RAID Advantages

Linux software RAID is more versatile than most hardware RAID
controllers. Hardware controllers see each drive as a single member of
the RAID array, and handle only one type of hard disk. Most hardware
controllers are picky about the brand and size of hard disk. You can’t
just slap in any old disks you want, but must carefully choose
compatible disks, and it’s not always documented what these are.

Linux RAID is a layer on top of Linux block devices, so any block
device can be a member of the array: a particular partition or any type
of hard drive, and you can even mix and match.

Endless debates rage over which offers superior performance, hardware
or software RAID. The answer is “it depends.” An old slow RAID
controller won’t match the performance of a modern system with a fast
CPU and fast buses. The number of drives on a cable, the types of
drives and cabling, and the speed of the data bus all affect
performance, in addition to the speed of the CPU and the demands placed
on it.

One disadvantage is that hot-swap support is limited and not entirely
reliable.

Converting An Existing System To RAID

First of all, your power supply must be capable of powering all the
drives you want to run on the system. Adding as many drives as you want
is easy and inexpensive. If you’re going to purchase new hard disks,
you might as well get SATA, because the cost is about the same as PATA.
SATA drives are faster and use less cabling, and will soon supplant
PATA drives. PCI controller cards for additional PATA and SATA disks
cost around $40, and will run two disks each. The built-in IDE channels
on mainboards can handle two disks each, but you should run only one
disk per channel. You’ll get better performance and minimize the risk
of a fault taking out both hard disks.

Next, install the raidtools2 and mdadm packages. If you
want your RAID array to be bootable, you’ll need RAID support built
into the kernel. Alternatively, build RAID support as a loadable module
and boot with an initrd file, which to me is more trouble than
rebuilding a kernel. Tomorrow in Part 2 we’ll cover how to do all of
this. You may get a head start by consulting the links in Resources.
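
In the meantime, one quick way to see which RAID levels your running kernel currently supports is to look at the Personalities line in /proc/mdstat. The sketch below parses that line; the function name parse_personalities and the SAMPLE text are illustrative assumptions, and the check cannot tell you whether support is built in or was loaded as a module.

```python
import re
from typing import List

# Hypothetical sample of what /proc/mdstat might contain with no arrays defined.
SAMPLE = """Personalities : [raid0] [raid1] [raid5]
unused devices: <none>
"""


def parse_personalities(mdstat_text: str) -> List[str]:
    """Return the RAID personalities named on the Personalities line, if any."""
    for line in mdstat_text.splitlines():
        if line.startswith("Personalities"):
            # Each supported level appears in square brackets, e.g. [raid1].
            return re.findall(r"\[([^\]]+)\]", line)
    return []


if __name__ == "__main__":
    try:
        with open("/proc/mdstat") as f:
            print(parse_personalities(f.read()))
    except OSError:
        # The file is absent when the md driver is neither built in nor loaded.
        print("/proc/mdstat is not readable; the md driver may not be loaded")
```

An empty list means the md driver is present but no RAID level is registered yet, which usually points to modules that have not been loaded.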

Resources

Article courtesy of EnterpriseNetworkingPlanet
