Fully Automated RAID Level Selection Techniques for Disk Arrays
Disk arrays are an integral part of high-performance storage area networks (SANs), and their importance and scale are growing as continuous access to information becomes critical to the day-to-day operation of modern business. Before a disk array can be used to store data, values for many configuration parameters must be specified. Achieving the right balance between cost, availability, and application performance depends on making many of these decisions correctly. Unfortunately, the tradeoffs between the choices are surprisingly complicated. The focus here is therefore on just one of these choices: which RAID level, or data redundancy scheme, to use.
The two most common redundancy schemes are RAID 1/0 (striped mirroring), where every byte of data is kept on two separate disk drives striped for greater I/O parallelism, and RAID 5, where a single parity block protects the data in a stripe from disk drive failures. RAID 1/0 provides greater read performance and failure tolerance but requires almost twice as many disk drives to do so.
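The cost and redundancy tradeoff described above can be illustrated with a small Python sketch. It is not taken from any particular array's firmware: the function names and the `group_width` parameter (the number of drives in one RAID 5 parity group) are assumptions made for illustration only. The first function counts the drives each scheme needs for the same amount of user data, and the second shows how a single XOR parity block allows one lost block in a stripe to be rebuilt.

```python
import math


def drives_needed(data_drives, level, group_width=5):
    """Drives required to hold `data_drives` drives' worth of user data.

    `group_width` is a hypothetical parameter: the number of drives in one
    RAID 5 parity group (group_width - 1 hold data, 1 holds parity).
    """
    if level == "raid10":
        # Every byte is kept on two separate drives.
        return 2 * data_drives
    if level == "raid5":
        # One drive's worth of parity per parity group.
        groups = math.ceil(data_drives / (group_width - 1))
        return data_drives + groups
    raise ValueError(f"unknown RAID level: {level}")


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks (the RAID 5 parity function)."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


# Drive counts for eight drives' worth of data.
raid10_drives = drives_needed(8, "raid10")
raid5_drives = drives_needed(8, "raid5", group_width=5)

# One stripe of three data blocks protected by a single parity block.
stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_blocks(stripe)

# If the drive holding stripe[1] fails, XOR-ing the surviving data blocks
# with the parity block reproduces the missing block.
rebuilt = xor_blocks([stripe[0], stripe[2], parity])
```

The wider the RAID 5 parity group, the closer the RAID 1/0 drive count comes to twice the RAID 5 count, which is consistent with the "almost twice as many" figure above.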
Disk arrays organize their data storage into Logical Units (LUs), which appear as linear block spaces to their clients. A small disk array with a few disks might support up to 8 LUs, whereas a large one with hundreds of disk drives can support thousands. Each LU typically has a given RAID level -- a redundancy mapping onto one or more underlying physical disk drives. This decision is made at LU-creation time and is typically irrevocable; once the LU has been formatted, changing its RAID level requires copying all the data onto a new LU.
Furthermore, the workloads that run on a SAN can be described as sets of stores and streams. A store is a logically contiguous array of bytes, such as a file system or a database table, with a size typically measured in gigabytes. A stream is a set of access patterns on a store, described by attributes such as request rate, request size, inter-stream phasing information, and sequentiality. A RAID level must be decided for each store in the workload.
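One way to picture this workload description is as a pair of simple record types. The sketch below is an assumption-laden illustration, not a published schema: the class names, field names, units, and the `offered_bandwidth` helper are all hypothetical, chosen only to mirror the attributes listed above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Stream:
    """Access pattern on a store (field names and units are illustrative)."""
    request_rate: float          # requests per second
    request_size: int            # bytes per request
    sequentiality: float         # fraction of sequential requests, 0.0 to 1.0
    # Inter-stream phasing: overlap fraction with other named streams.
    phasing: Dict[str, float] = field(default_factory=dict)


@dataclass
class Store:
    """Logically contiguous array of bytes, e.g. a file system or table."""
    name: str
    size_gb: float
    streams: List[Stream] = field(default_factory=list)
    raid_level: Optional[str] = None   # the per-store decision to be made

    def offered_bandwidth(self) -> float:
        """Total bytes per second requested by all streams on this store."""
        return sum(s.request_rate * s.request_size for s in self.streams)


# A hypothetical database table with a random-read and a sequential-scan stream.
orders = Store(name="orders_table", size_gb=40.0)
orders.streams.append(Stream(request_rate=200.0, request_size=8192, sequentiality=0.1))
orders.streams.append(Stream(request_rate=50.0, request_size=65536, sequentiality=0.9))

bandwidth = orders.offered_bandwidth()
```

Here `raid_level` starts out unset because choosing it for each store is exactly the decision under discussion.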
Host-based logical volume managers (LVMs) complicate matters, however, by allowing multiple stores to be mapped onto a single LU, effectively blending multiple workloads together. A host-based LVM manages disk space at a logical level: it controls fixed-disk resources by mapping data between logical and physical storage and by allowing data to span multiple disks, which in turn allows the data to be discontiguous, replicated, and dynamically expanded.
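The logical-to-physical mapping an LVM maintains can be sketched as an extent table. The Python below is a simplified model under stated assumptions, not the behavior of any specific volume manager: the `LogicalVolume` class, its method names, and the 4 MiB extent size are hypothetical, and replication is left out for brevity.

```python
EXTENT_SIZE = 4 * 1024 * 1024  # bytes per extent; an assumed value


class LogicalVolume:
    """Toy model of a logical volume built from extents on one or more LUs."""

    def __init__(self, name, extent_size=EXTENT_SIZE):
        self.name = name
        self.extent_size = extent_size
        # Index = logical extent number; value = (LU name, physical extent).
        self.extent_map = []

    def extend(self, lu, first_extent, count):
        """Grow the volume by `count` physical extents taken from `lu`."""
        self.extent_map.extend((lu, first_extent + i) for i in range(count))

    def resolve(self, logical_offset):
        """Translate a logical byte offset to (LU name, physical byte offset)."""
        extent_index, within = divmod(logical_offset, self.extent_size)
        lu, physical_extent = self.extent_map[extent_index]
        return lu, physical_extent * self.extent_size + within


# Two volumes share "lu0", so their workloads are blended on that LU.
db = LogicalVolume("db")
db.extend("lu0", first_extent=10, count=2)   # discontiguous start on lu0
db.extend("lu1", first_extent=0, count=1)    # later expansion spans onto lu1

logs = LogicalVolume("logs")
logs.extend("lu0", first_extent=0, count=1)

first = db.resolve(0)
spanned = db.resolve(2 * EXTENT_SIZE + 5)
```

Because `db` and `logs` both draw extents from `lu0`, whatever RAID level that LU was created with applies to both workloads at once, which is the blending effect described above.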