Welcome to our howto on implementing Linux software RAID with no expense other than however many hard disks you wish to use, whether they be inexpensive ordinary PATA (IDE) drives, expensive SCSI drives, or newfangled serial ATA (SATA) drives.
RAID is no longer the exclusive province of expensive systems with SCSI drives and controllers. In fact, it hasn't been since the 2.0 Linux kernel, released in 1996, which was the first kernel release to support software RAID.
What RAID Is For
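A RAID array provides various functions, depending on how it is configured: high speed, high reliability, or both. RAID 0, 1, and 5 are probably the most commonly used.

RAID 0, or "striping," spreads data across two or more disks so that reads and writes are shared among them, which makes it fast. It provides no redundancy, though: if any one disk fails, the whole array is lost.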
RAID 1, or "mirroring," keeps an identical copy of your data on each of two disks. If the two drives are not the same size, your storage space is limited to the size of the smaller one. If one drive fails, the other carries on, allowing you to keep working until it is convenient to replace the failed disk. RAID 1 writes are slower than striping, because every write is done twice.
RAID 5 combines striping with parity, so you get both speed and data redundancy. You need a minimum of three disks. If a single disk is lost, your data are still intact; losing two disks means losing everything. Reads are very fast, while writes are a bit slower because parity must be calculated for every write.
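To see why RAID 5 survives one lost disk but not two, it helps to look at how parity works. The sketch below is a minimal Python illustration, not the md driver's code: it assumes simple XOR parity over a single stripe, and the helper names `xor_blocks` and `reconstruct` are hypothetical. Real RAID 5 also rotates which disk holds the parity block from stripe to stripe, which this sketch ignores.

```python
from __future__ import annotations

from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(stripe: list[bytes | None]) -> bytes:
    """Rebuild the one missing block (None) in a stripe of data blocks plus parity."""
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) != 1:
        raise ValueError(f"can rebuild exactly one missing block, found {len(missing)}")
    survivors = [block for block in stripe if block is not None]
    return xor_blocks(survivors)


if __name__ == "__main__":
    # Hypothetical three-disk stripe: two data blocks and one parity block.
    d0, d1 = b"RAID", b"five"
    parity = xor_blocks([d0, d1])

    # Disk 1 fails: XOR of the surviving blocks restores its contents.
    print(reconstruct([d0, None, parity]))  # b'five'

    # Disks 1 and 2 fail: two unknowns, one parity equation, no recovery.
    try:
        reconstruct([d0, None, None])
    except ValueError as err:
        print(err)
```

With one block missing, XOR of the survivors yields exactly the lost block; with two missing there are two unknowns and only one parity equation, which is why a second failure is fatal.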
You may use disks of different sizes in all of these, though you'll get better performance with disks of the same capacity and geometry. Some admins like to use different brands of hard disks on the theory that different brands will have different flaws.
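A quick way to see what mismatched disks cost you is to work out usable space. This minimal sketch covers only RAID 1 and RAID 5, assumes each member contributes only as much space as the smallest one (any extra space on larger disks goes unused by the array), and ignores metadata overhead. `usable_capacity` is a hypothetical helper and the sizes are illustrative.

```python
from __future__ import annotations


def usable_capacity(level: int, disk_sizes_gb: list[float]) -> float:
    """Approximate usable space of a RAID 1 or RAID 5 array, ignoring metadata."""
    n = len(disk_sizes_gb)
    if n == 0:
        raise ValueError("no disks given")
    smallest = min(disk_sizes_gb)  # each member contributes only this much
    if level == 1:
        if n < 2:
            raise ValueError("RAID 1 needs at least two disks")
        return smallest  # every disk holds a full copy of the same data
    if level == 5:
        if n < 3:
            raise ValueError("RAID 5 needs at least three disks")
        return (n - 1) * smallest  # one disk's worth of space holds parity
    raise ValueError(f"this sketch only handles RAID 1 and RAID 5, not {level}")


if __name__ == "__main__":
    # Illustrative sizes in GB.
    print(usable_capacity(1, [80, 120]))       # 80
    print(usable_capacity(5, [80, 120, 160]))  # 160
```

For example, three disks of 80, 120, and 160 GB in a RAID 5 array give roughly 2 × 80 = 160 GB of usable space out of 360 GB of raw capacity.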
What RAID Is Not
RAID is not a substitute for a good backup regimen, backup power supplies, surge protectors, and other sensible protections. Nor is Linux software RAID a substitute for true hardware SCSI RAID in high-demand, mission-critical systems. But it is a dandy tool for workstations and low- to medium-duty servers. PATA (or IDE) drives are not hot-swappable, but you can set up an array with standby drives that automatically take over in the event of a disk failure. If you don't want to use standby drives, your downtime is limited to the time it takes to replace the failed drive, because the system is usable even while the array rebuilds itself.
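Standby drives are declared when the array is created. The sketch below assumes the mdadm tool and only builds the command line rather than running it; the helper name `mdadm_create_args` and the device names are hypothetical, chosen to put each active disk on its own IDE channel. `--create`, `--level`, `--raid-devices`, and `--spare-devices` are real mdadm options; the sketch lists active members before spares, on the assumption that mdadm fills the active slots first.

```python
from __future__ import annotations

import shlex


def mdadm_create_args(md_device: str, level: int,
                      members: list[str], spares: list[str]) -> list[str]:
    """Build (but do not run) an mdadm command that creates an array with standby drives."""
    args = [
        "mdadm", "--create", md_device,
        f"--level={level}",
        f"--raid-devices={len(members)}",
    ]
    if spares:
        args.append(f"--spare-devices={len(spares)}")
    # Active members first, then spares (assumption about how mdadm assigns slots).
    return args + members + spares


if __name__ == "__main__":
    # Hypothetical devices: a RAID 1 mirror on two IDE channels plus one standby disk.
    cmd = mdadm_create_args("/dev/md0", 1, ["/dev/hda1", "/dev/hdc1"], ["/dev/hde1"])
    print(shlex.join(cmd))
    # mdadm --create /dev/md0 --level=1 --raid-devices=2 --spare-devices=1 /dev/hda1 /dev/hdc1 /dev/hde1
```

Running the printed command as root would overwrite any data on those partitions, so treat it as illustration only.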
Hardware RAID controllers come in a rather bewildering variety. Many mainboards come with built-in IDE RAID controllers, and PCI IDE RAID controller cards can be had for as little as $25. Most of these are like horrid Winmodems, in that they require Windows drivers and have Windows-only management tools. I wouldn't bother with IDE RAID controllers -- Linux software RAID outperforms them in every way, and costs nothing.
A true hardware RAID controller operates independently of the host operating system. You'll find a lot of choices for SATA and SCSI drives. SATA controllers start at around $150, and the sky's the limit from there, depending on how many drives they support, how much onboard memory they have, and other refinements that take the processing load off the system CPU.
Good SCSI controllers start around $400 and have an even higher ceiling. Both SATA and SCSI controllers should support hot-swapping, error handling, caching, and fast data transfer. A good-quality hardware controller is fast and reliable, but finding one is not so easy. Many an experienced admin has lost sleep and hair over flaky RAID hardware.
Something to keep in mind for the future: as SATA support in Linux matures and the technology itself improves, SATA should become a capable SCSI replacement for all but the most demanding uses. (For more information, see the excellent pages posted by Jeff Garzik, the maintainer of the kernel SATA drivers.)
Software RAID Advantages
Linux software RAID is more versatile than most hardware RAID controllers. Hardware controllers see each drive as a single member of the RAID array, and handle only one type of hard disk. Most hardware controllers are also picky about the brand and size of hard disk -- you can't just slap in any old disks you want, but must carefully choose compatible ones, and which disks those are is not always documented.
Linux RAID is a separate layer on top of Linux block devices, so any block device can be a member of the array: a particular partition or any type of hard drive, and you can even mix and match them.

Endless debates rage over which offers superior performance, hardware or software RAID. The answer is "it depends." An old, slow RAID controller won't match the performance of a modern system with a fast CPU and fast buses. The number of drives on a cable, the types of drives and cabling, and the speed of the data bus -- all of these affect performance, in addition to the speed of the CPU and the demands placed on it.
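Because Linux RAID sits on top of ordinary block devices, the kernel simply reports each array's members by device name (whole disks or partitions alike) in /proc/mdstat, along with whether the array is healthy or degraded. The sketch below parses that format to flag degraded arrays. It assumes the common layout of /proc/mdstat and handles only the fields it needs; `parse_mdstat` and `MdArray` are hypothetical names, and the sample text is made up for illustration.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

HEADER = re.compile(r"^(md\d+)\s*:\s*(\S+)\s+(.*)$")     # "md0 : active raid1 hda1[0] ..."
MEMBER = re.compile(r"^(\S+?)\[\d+\](?:\(([A-Z])\))?$")  # "hda1[0]" or "sda1[0](F)"
STATUS = re.compile(r"\[(\d+)/(\d+)\]\s+\[[U_]+\]")      # "[2/2] [UU]"
FLAGS = {"F": "failed", "S": "spare"}


@dataclass
class MdArray:
    name: str
    state: str
    level: str | None = None
    members: dict[str, str] = field(default_factory=dict)  # device -> role
    expected: int = 0  # devices the array should have
    working: int = 0   # devices currently in service

    @property
    def degraded(self) -> bool:
        return self.working < self.expected


def parse_mdstat(text: str) -> dict[str, MdArray]:
    arrays: dict[str, MdArray] = {}
    current = None
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            name, state, rest = header.groups()
            current = arrays[name] = MdArray(name=name, state=state)
            for token in rest.split():
                member = MEMBER.match(token)
                if member:
                    device, flag = member.groups()
                    current.members[device] = FLAGS.get(flag, "active")
                elif current.level is None and (token.startswith("raid") or token == "linear"):
                    current.level = token
            continue
        status = STATUS.search(line)
        if status and current is not None:
            current.expected, current.working = int(status.group(1)), int(status.group(2))
    return arrays


# Made-up sample: a healthy mirror and a RAID 5 array with one failed member.
SAMPLE_MDSTAT = """\
Personalities : [raid1] [raid5]
md0 : active raid1 hdc1[1] hda1[0]
      39078016 blocks [2/2] [UU]

md1 : active raid5 sdc1[2] sdb1[1] sda1[0](F)
      156248064 blocks level 5, 64k chunk, algorithm 2 [3/2] [_UU]

unused devices: <none>
"""

if __name__ == "__main__":
    for array in parse_mdstat(SAMPLE_MDSTAT).values():
        health = "DEGRADED" if array.degraded else "ok"
        print(array.name, array.level, health, array.members)
```

On a live system you would pass the contents of /proc/mdstat instead of the sample; an array reported as degraded keeps serving data while it rebuilds.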
One disadvantage is that hot-swap support is limited and not entirely reliable.
Converting An Existing System To RAID
First of all, your power supply must be capable of powering all the drives you want to run on the system. Beyond that, adding as many drives as you want is easy and inexpensive. If you're going to purchase new hard disks, you might as well get SATA, because the cost is about the same as PATA. SATA drives are faster, use less cabling, and will soon supplant PATA drives. PCI controller cards for additional PATA and SATA disks cost around $40 and will run two disks each. The built-in IDE channels on mainboards can handle two disks each, but you should run only one disk per channel: you'll get better performance and minimize the risk of a single fault taking out both hard disks.
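The one-disk-per-channel advice can be checked mechanically. This sketch assumes the classic Linux IDE naming, where /dev/hda and /dev/hdb are the master and slave on the first channel, /dev/hdc and /dev/hdd on the second, and so on; SATA and SCSI disks named /dev/sdX are skipped because they have no master/slave pairing. The helper names `ide_channel` and `shared_channels` are hypothetical.

```python
from __future__ import annotations

import re
from collections import defaultdict

IDE_NAME = re.compile(r"^(?:/dev/)?hd([a-z])\d*$")


def ide_channel(device: str) -> int | None:
    """Return the IDE channel index for a classic hdX name, else None.

    Assumes hda/hdb share channel 0, hdc/hdd channel 1, hde/hdf channel 2, etc.
    """
    match = IDE_NAME.match(device)
    if match is None:
        return None  # e.g. /dev/sdX: no master/slave pairing to worry about
    return (ord(match.group(1)) - ord("a")) // 2


def shared_channels(members: list[str]) -> dict[int, list[str]]:
    """Return the IDE channels that carry more than one array member."""
    by_channel: dict[int, list[str]] = defaultdict(list)
    for device in members:
        channel = ide_channel(device)
        if channel is not None:
            by_channel[channel].append(device)
    return {ch: devs for ch, devs in by_channel.items() if len(devs) > 1}


if __name__ == "__main__":
    print(shared_channels(["/dev/hda1", "/dev/hdc1"]))  # {}
    print(shared_channels(["/dev/hda1", "/dev/hdb1"]))  # {0: ['/dev/hda1', '/dev/hdb1']}
```

Two partitions on the same disk (say /dev/hda1 and /dev/hda2) are flagged too, which is what you want: one drive failure would take out both members.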
Next, install the raidtools2 and mdadm packages. If you want your RAID array to be bootable, you'll need RAID support built into the kernel. Alternatively, you can build RAID support as a loadable module and load it from an initrd file, which to me is more trouble than rebuilding the kernel. Tomorrow in Part 2 we'll cover how to do all of this. You may get a head start by consulting the links in Resources.
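Whether a kernel has RAID support built in (=y) or only as a loadable module (=m) is recorded in its build configuration. This sketch reads a kernel .config-style file and classifies the md options. The symbol names are assumptions that vary by kernel version (for instance, older kernels call the RAID 5 option CONFIG_MD_RAID5 rather than CONFIG_MD_RAID456), as is the /boot/config-<release> location, which many distributions use but not all. `md_support` is a hypothetical helper.

```python
from __future__ import annotations

import platform
import re
from pathlib import Path

CONFIG_LINE = re.compile(r"^(CONFIG_\w+)=([ymn])$")

# Made-up sample in kernel .config format.
SAMPLE_CONFIG = """\
CONFIG_BLK_DEV_MD=y
CONFIG_MD_RAID1=y
CONFIG_MD_RAID456=m
# CONFIG_MD_RAID10 is not set
"""


def md_support(config_text: str, symbols: list[str]) -> dict[str, str]:
    """Classify each symbol as 'built-in' (=y), 'module' (=m) or 'missing'."""
    values: dict[str, str] = {}
    for line in config_text.splitlines():
        match = CONFIG_LINE.match(line.strip())
        if match:
            values[match.group(1)] = match.group(2)
    labels = {"y": "built-in", "m": "module"}
    return {sym: labels.get(values.get(sym, ""), "missing") for sym in symbols}


if __name__ == "__main__":
    # Assumed location; fall back to the sample if it does not exist.
    path = Path(f"/boot/config-{platform.release()}")
    text = path.read_text() if path.exists() else SAMPLE_CONFIG
    wanted = ["CONFIG_BLK_DEV_MD", "CONFIG_MD_RAID1", "CONFIG_MD_RAID456"]
    for symbol, status in md_support(text, wanted).items():
        print(f"{symbol}: {status}")
```

For a bootable array without an initrd, the md core and the option for your RAID level would both need to show as built-in.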
Resources
- Linux-raid mailing list
- The Software-RAID HOWTO
- Chapter 10 of the Linux Cookbook, "Patching, Customizing, and Upgrading Kernels"
Article courtesy of EnterpriseNetworkingPlanet