Configuring storage area networks (SANs), even at the enterprise scale, is traditionally undertaken by human experts using a time-consuming process of trial and error, guided by simple rules of thumb. Due to the complexity of the design process and lack of workload information, the resulting systems often cost significantly more than necessary or fail to perform adequately.
This article presents a solution to this problem. It shows you how you can automate the design and configuration process through the use of various tools to explore the design space more thoroughly and automatically implement the design, thereby eliminating many tedious, error-prone operations.
For example, one of these tools, from Hewlett-Packard, is structured as an iterative loop design tool (ILDT) for SANs: it analyzes a workload to determine its requirements, creates a new SAN design to better meet these requirements, and migrates the existing system to the new design. The tool repeats the loop until it finds a SAN design that satisfies the workload's I/O requirements. This article describes the iterative loop and discusses a prototype implementation that converges rapidly to appropriate SAN designs.
Running Circles Around SAN Administration
Enterprise-scale SANs are extremely difficult to manage. The size of these systems, the thousands of configuration choices, and the lack of information about workload behaviors raise numerous management challenges. Users' demand for larger data capacities, more predictable performance, and faster deployment of new applications and services exacerbates the management problems. Worse, administrators skilled in designing, implementing and managing SANs are expensive and in short supply. It is estimated that the cost of managing a SAN is several times the purchase price of the storage hardware. These difficulties are beginning to cause enterprise customers to outsource their storage needs to Internet data centers and storage service providers (such as Exodus), which will lease them a SAN. The growing importance of this storage model implies that the ability to accurately provision SANs to meet workload needs will become even more critical in the future.
Storage management challenges include designing and implementing the storage system, adapting to changes in workloads and device status, designing the SAN, and backing up the data. Of these, this article concentrates on the important problem of SAN configuration: designing and implementing the SAN needed to support a particular workload, before the SAN is put into production use.
Given a pool of storage resources and a workload, you will want to determine how to automatically choose storage devices, determine the appropriate device configurations, and assign the workload to the configured storage. These tasks are challenging, because the large number of design choices may interact with each other in poorly understood ways. To make reasonable design choices, administrators need detailed knowledge of applications' storage behavior, which is difficult to obtain. Once a design has been determined, implementing the chosen design is time-consuming, tedious and error-prone. A mistake in any of the implementation operations is difficult to identify, and can result in a failure to meet the performance requirements of the workload.
SAN configuration is naturally an iterative process, traditionally undertaken by human experts using rules of thumb gained through years of experience. They start with a first design based on an initial understanding of the workload, and then successively refine the design based on the observed behavior of the system.
Unfortunately, the complexities of the systems being designed, coupled with inadequate information about the true workload requirements, mean that the resulting systems are often over-provisioned, so that they are too expensive, or under-provisioned, so that they perform poorly.
The iterative loop design tool (ILDT) automates the iterative approach to SAN configuration. ILDT analyzes a running workload to determine its requirements, calculates a new SAN design, and migrates the existing system to the new design. ILDT makes better design decisions by systematically exploring the large space of possible designs, and it decreases the chance of human error by automating the configuration tasks. As a result, ILDT frees administrators to focus on the applications that use the SAN. In short, ILDT generates SAN configurations that employ near-minimal resources to satisfy workload requirements, and it converges to the final system design in a small number of iterations.
The iterative approach to system management is applicable to many levels of the system, including the block-level array subsystem, the file system and the application itself. This article focuses on block-level storage, because it provides a potential benefit to all applications that store data, including those that use the file system and those that use the raw block interface directly. The three stages of the iterative storage management loop are as follows:
Design New System
You need to design a system to match the current workload requirements. This stage includes choosing which storage devices to use, selecting their configurations, and determining how to map the workload's data onto the configured devices. The requirements may come from observations of the workload behavior in previous iterations. Here, a workload is the set of requests observed by the SAN. A particular workload may be generated by one or more applications using the SAN. A workload can be described in terms of stores and streams. A store is a logically contiguous chunk of storage. A stream captures information about the I/O accesses to a single associated store, such as average request rate and average request size. Expressing a workload in terms of stores and streams decouples the specification of the workload from the application(s) that generate it. As a result, the workload specification and assignment techniques are applicable to a broad range of applications.
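The store and stream vocabulary maps naturally onto a small data model. The following is a minimal sketch only: the class names, field names and units are assumptions chosen for illustration, not the tool's actual data structures.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Store:
    """A logically contiguous chunk of storage (hypothetical fields)."""
    name: str
    capacity_gb: float


@dataclass(frozen=True)
class Stream:
    """Summary of the I/O accesses to a single associated store."""
    store: Store
    request_rate: float     # average requests per second
    request_size_kb: float  # average request size


@dataclass
class Workload:
    """A workload described only by its stores and streams."""
    stores: list = field(default_factory=list)
    streams: list = field(default_factory=list)

    def total_capacity_gb(self) -> float:
        return sum(s.capacity_gb for s in self.stores)

    def bandwidth_kb_per_s(self, store: Store) -> float:
        # Sum rate * size over every stream associated with this store.
        return sum(t.request_rate * t.request_size_kb
                   for t in self.streams if t.store == store)
```

Nothing in this description names the application that issued the requests, which is the decoupling the text describes.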
Implement Design
You need to configure the disk arrays and other SAN components. In addition, you need to enable access to the storage resources from the hosts, and migrate the existing application data (if any) to the new design.
Analyze Workload
Finally, you need to analyze the running system to learn the workload's behavior. This information can then be used as input to the design stage in the next iteration.
Removing the Human Administrator Factor
You will want to remove the human administrators from the loop as much as possible, by automating the iterative loop, to the point where all that is required at the beginning is workload capacity information. The loop will then learn the performance requirements across multiple iterations of the loop.
In order to be considered successful, the automated loop must meet two goals. First, it must converge on a viable design that meets the workload's requirements without over- or under-provisioning. Second, it must converge to a stable final system as quickly as possible, with as little input as possible required from its users.
Now, let's look at the components of the ILDT loop in more detail, and at how the components interact to design a SAN iteratively.
The analyze workload stage summarizes a workload's behavior. This summary is used to predict the workload's requirements in the next iteration. Two components cooperate to implement the design new system stage: performance models for the storage devices, and a design engine, or solver. The performance models predict the utilization of storage devices under a candidate workload. The solver designs a new SAN using the performance models to guarantee that no device in the design is overloaded. The implement design stage migrates any existing system to the new design.
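The way the three stages feed each other can be sketched as a short driver loop. This is an illustration under stated assumptions, not the tool's implementation: the `analyze`, `design` and `implement` callables are hypothetical stand-ins for the real components, and "converged" is assumed to mean that the design stage returns the design that is already running.

```python
from typing import Callable


def run_loop(analyze: Callable, design: Callable, implement: Callable,
             initial_requirements, max_iterations: int = 10):
    """Repeat design -> implement -> analyze until the design stops changing.

    All three callables are hypothetical placeholders:
      design(requirements)        -> a comparable design object
      implement(old_design, new)  -> migrates the system, returns nothing
      analyze(design)             -> requirements observed on the running system
    Returns (final_design, number_of_designs_implemented).
    """
    requirements = initial_requirements
    current = None
    for iteration in range(1, max_iterations + 1):
        new_design = design(requirements)
        if new_design == current:
            # Nothing changed since the last iteration: the loop has converged.
            return current, iteration - 1
        implement(current, new_design)
        current = new_design
        requirements = analyze(current)
    return current, max_iterations
```

The only input needed up front is `initial_requirements`, which in the article's terms would be the workload's capacity information; the performance requirements are learned as the loop runs.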
Accurate resource estimation and minimal designs result in correctly provisioned systems. Balanced designs and short migration times enable the loop to configure SANs quickly. With the preceding in mind, let's briefly look at these components and their inputs and outputs, focusing on how each component contributes to the operation of the ILDT loop.
The analysis component takes as input a detailed block-level trace of the workload's I/O references and a description of the SAN (logical unit (LU) and logical volume layouts). It outputs a summary of the trace in terms of stores and streams.
Disk array storage is divided into logical units (LUs), which are logically contiguous arrays of blocks exported by a disk array. LUs are usually constructed by binding a subset of the array's disks together using RAID techniques. LU sizes are typically fixed by the array configuration, and so are unlikely to correspond to application requirements.
Furthermore, logical volumes add flexibility by providing a level of virtualization that enables the server to split the (large) LUs into multiple pieces or to stripe data across multiple LUs. A logical volume provides the abstraction of a virtual disk for use by a file system or database table.
The analysis component captures enough properties of the I/O trace in the streams to enable the models to make accurate performance predictions. It models an I/O stream as a series of alternating ON/OFF periods, where I/O requests are only generated during ON periods. More specifically, the minimum duration of an ON period, minOnTime, is defined as 0.5 seconds, and the minimum duration of an OFF period, minOffTime, as two seconds of inactivity.
During an ON period, six parameters are measured for each stream: the mean read and write request rates; the mean read and write request sizes; the run count, which is the mean number of sequential requests; and, the queue length, which is the mean number of outstanding I/O requests. Because streams can be ON or OFF at different times, the inter-stream phasing and correlations are captured using the overlap fraction, which is approximately the fraction of time that two streams' ON periods overlap.
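The ON/OFF segmentation and the overlap fraction can be illustrated with a short sketch. Only the two thresholds come from the text; the rest is assumed for illustration. In particular, stretching a burst shorter than minOnTime to exactly minOnTime, and measuring overlap relative to the first stream's ON time, are assumptions rather than the tool's documented behavior.

```python
MIN_ON_TIME = 0.5   # seconds, from the text
MIN_OFF_TIME = 2.0  # seconds of inactivity, from the text


def on_periods(timestamps, min_on=MIN_ON_TIME, min_off=MIN_OFF_TIME):
    """Group sorted request arrival times into (start, end) ON periods.

    A gap of at least `min_off` seconds closes the current ON period.
    Assumption: a burst shorter than `min_on` is stretched to `min_on`.
    """
    periods = []
    start = prev = None
    for t in timestamps:
        if start is None:
            start = prev = t
        elif t - prev >= min_off:
            periods.append((start, max(prev, start + min_on)))
            start = prev = t
        else:
            prev = t
    if start is not None:
        periods.append((start, max(prev, start + min_on)))
    return periods


def overlap_fraction(a, b):
    """Fraction of stream a's ON time during which stream b is also ON."""
    on_a = sum(end - start for start, end in a)
    shared = sum(max(0.0, min(e1, e2) - max(s1, s2))
                 for s1, e1 in a for s2, e2 in b)
    return shared / on_a if on_a else 0.0
```

The per-stream parameters listed above (request rates, request sizes, run count, queue length) would then be averaged over the requests that fall inside these ON periods.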
The I/O activity is traced and analyzed later (or on another machine) to minimize the interference with the workload. Capturing I/O trace data results in a CPU overhead of 1-2% and an increase in I/O load of about 0.5%. Even day-long traces are typically only a few gigabytes long, which is a negligible storage overhead as the trace only has to be kept until the analysis is run. The duration of tracing activity is workload dependent, as it has to cover the full range of workload behavior. For simple workloads, a few minutes may be sufficient. For complex workloads, it may take a few hours.
The performance model takes as input a workload summary from the analysis component, and a candidate SAN design from the solver. The candidate design specifies both the parameters for the SAN and the layout of stores onto the SAN. It outputs the utilization of each component in the SAN.
The model component needs to predict SAN performance quickly and accurately. This component is implemented by using table-based models. The models use the stream information collected during the analysis stage to differentiate between sequential and random behavior, read and write behavior and ON-OFF phasing of disk I/Os. Models are used because simulating an I/O trace would be too slow for the solver to be able to examine a sufficient number of candidate configurations.
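To make the idea of a table-based model concrete, here is a deliberately simplified sketch. The table values are invented for illustration, and the single table keyed by request size is an assumption; models of the kind described would need further dimensions for sequential versus random access, reads versus writes, and ON-OFF phasing.

```python
from bisect import bisect_left

# Hypothetical measurements: request size (KB) -> maximum requests per second
# that one device sustains. Real tables would be measured per device type.
RANDOM_READ_TABLE = [(4, 200.0), (16, 180.0), (64, 120.0), (256, 50.0)]


def max_rate(table, size_kb):
    """Linearly interpolate the sustainable request rate for a request size."""
    sizes = [s for s, _ in table]
    if size_kb <= sizes[0]:
        return table[0][1]
    if size_kb >= sizes[-1]:
        return table[-1][1]
    i = bisect_left(sizes, size_kb)
    (s0, r0), (s1, r1) = table[i - 1], table[i]
    return r0 + (r1 - r0) * (size_kb - s0) / (s1 - s0)


def utilization(streams, table=RANDOM_READ_TABLE):
    """Sum per-stream utilization; each stream is (request_rate, size_kb)."""
    return sum(rate / max_rate(table, size) for rate, size in streams)
```

A lookup plus an interpolation costs microseconds, which is why a solver can afford to evaluate many candidate designs this way, whereas replaying a full I/O trace through a simulator for each candidate would not be practical.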
The solver reads as input the workload description generated by the analysis component, and outputs the design of a system that meets the workload's performance requirements. The output specifies a number of disk arrays, the configuration of those arrays (e.g., number of disks, LU configurations, controller and cache settings) and a mapping of the stores in the workload onto the disk arrays' LUs.
The solver efficiently searches the exponentially large space of SAN designs to find a balanced, valid, minimal design. The problem of efficiently packing a number of stores, with both capacity and performance requirements onto disk arrays, is similar to the problem of multi-dimensional bin packing.
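The bin-packing analogy can be illustrated with a classic first-fit-decreasing heuristic over two dimensions, capacity and utilization. This is a textbook heuristic substituted for illustration, not the solver's actual algorithm, and the function name and input layout are assumptions.

```python
def pack_stores(stores, lu_capacity_gb, max_utilization=1.0):
    """Assign stores to LUs with a first-fit-decreasing heuristic.

    `stores` maps a store name to (capacity_gb, utilization), where
    utilization is the fraction of one LU's performance the store consumes.
    Returns a list of LUs, each a list of store names.
    """
    # Place the most demanding stores first; they are the hardest to fit.
    order = sorted(stores,
                   key=lambda n: max(stores[n][0] / lu_capacity_gb,
                                     stores[n][1] / max_utilization),
                   reverse=True)
    lus = []  # each entry: [used_gb, used_utilization, [store names]]
    for name in order:
        size, util = stores[name]
        if size > lu_capacity_gb or util > max_utilization:
            raise ValueError(f"store {name!r} does not fit on any single LU")
        for lu in lus:
            fits_capacity = lu[0] + size <= lu_capacity_gb
            fits_performance = lu[1] + util <= max_utilization
            if fits_capacity and fits_performance:
                lu[0] += size
                lu[1] += util
                lu[2].append(name)
                break
        else:
            # No existing LU has room in both dimensions: open a new one.
            lus.append([size, util, [name]])
    return [names for _, _, names in lus]
```

In a fuller design engine the utilization figure would come from the performance models rather than being supplied directly, and the search would also vary the device configurations instead of packing onto identical LUs.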
Finally, the migration component takes as input the new design of the SAN, and changes the existing configuration to the new design. It configures storage devices, copies the data between old and new locations, and changes the parameters of the SAN to match the parameters in the new design.
Migration operates in two phases. First a plan is generated for the migration and then the plan is executed. The planning phase tries to minimize the amount of scratch space used and the amount of data that needs to be moved.
If the underlying logical volume manager (LVM) allows individual logical blocks to be moved, as opposed to an entire volume (store), then more advanced algorithms that generate efficient parallel plans can be used.
Second, in the execution phase, the migration component copies the stores to their destinations as specified by the plan. The migration can be executed with the workloads either online or offline. Offline migration creates a new logical volume, copies the data there, and deletes the original volume. Online migration allows the workloads to continue executing. It uses the LVM to mirror the volume to its new location, and then splits the mirror, removing the old half.
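The two phases can be sketched as a plan step followed by an expansion of each move into the offline or online operations just described. This is a minimal illustration: the function names and the dictionary layout are hypothetical, and the planner shown only avoids moving stores that stay in place; it does not attempt to minimize scratch space.

```python
def plan_migration(old, new, sizes_gb):
    """Return (moves, total_gb) for stores whose location changes.

    `old` and `new` map store name -> LU name; `sizes_gb` maps store -> size.
    Stores absent from `old` are new and need no copy.
    """
    moves = [(store, old[store], new[store])
             for store in sorted(new)
             if store in old and old[store] != new[store]]
    return moves, sum(sizes_gb[store] for store, _, _ in moves)


def steps_for(move, online):
    """Expand one move into the operations described in the text."""
    store, src, dst = move
    if online:
        # Workload keeps running: mirror, then drop the old half.
        return [f"mirror {store} from {src} to {dst}",
                f"split mirror of {store}, removing the copy on {src}"]
    # Workload stopped: create, copy, delete.
    return [f"create volume for {store} on {dst}",
            f"copy {store} from {src} to {dst}",
            f"delete {store} on {src}"]
```

The steps are returned as descriptive strings rather than executed, since the actual commands depend on the logical volume manager in use.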
Finally, an alternative method that works during initial system configuration involves configuring the devices and then copying the data from a master copy of the stores to their final destinations. This approach works well if the design is changing substantially between iterations, but requires double the storage capacity to hold the master copy.
Summary and Conclusions
This article has introduced an iterative loop design tool (ILDT) for storage area networks (SANs), which can be used to automate SAN configuration. ILDT uses an iterative loop consisting of three stages: analyze workload, design system, and implement design. The components that implement these stages handle the problems of summarizing a workload; choosing which devices to use and how their parameters should be set; assigning the workload to the devices; implementing the design by setting the device parameters; and migrating the existing system to the new design. Finally, the article discussed how, for the problem of SAN configuration, the ILDT loop satisfies two important properties: rapid convergence, where the loop converges in a small number of iterations to the final system design; and correct resource allocation, where the loop allocates close to the minimal amount of resources necessary to support the workload.
[The preceding article is based on material provided by ongoing research at the Storage and Content Distribution Department, Hewlett-Packard Laboratories, 1501 Page Mill Road, Palo Alto, CA 94304 on the "Hippodrome" iterative loop tool. Additional information for this article was also based on material contained in the following white paper:
"Hippodrome: running circles around storage administration" by Eric Anderson, Michael Hobbs, Kimberly Keeton, Susan Spence, Mustafa Uysal and Alistair Veitch available at: http://www.hpl.hp.com/SSP/papers/#FAST2002-hippodrome]
John Vacca is an information technology consultant and internationally known author based in Pomeroy, Ohio. Since 1982, Vacca has authored 39 books and more than 475 articles in the areas of advanced storage, computer security and aerospace technology. Vacca was also a configuration management specialist, computer specialist, and the computer security official for NASA's space station program (Freedom) and the International Space Station Program, from 1988 until his early retirement from NASA in 1995.