The analyze workload stage summarizes a workload's behavior, and this summary is used to predict the workload's requirements in the next iteration. Two components cooperate to implement the design system stage: performance models for the storage devices, and a design engine, or solver. The performance models predict the utilization of storage devices under a candidate workload. The solver uses the performance models to design a new SAN in which no device is overloaded. The implement design stage migrates the existing system to the new design.
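The following toy Python sketch makes the analyze/design/implement cycle concrete. It is not Hippodrome's code. The single-number workload, the per-device rate DEVICE_IOPS, the utilization cap MAX_UTIL, and the functions analyze, design and run_loop are all hypothetical simplifications. The sketch shows one idea: an under-provisioned system caps the load the analysis can observe, so the design grows over several iterations until it stops changing.

```python
import math

# Hypothetical toy parameters (not from Hippodrome): each device serves up to
# DEVICE_IOPS requests/s, and the solver keeps every device at or below MAX_UTIL.
DEVICE_IOPS = 300.0
MAX_UTIL = 0.75


def analyze(true_demand: float, devices: int) -> float:
    """Analyze stage: the observed request rate is capped by what the current
    design can deliver, so an overloaded system hides part of the real demand."""
    return min(true_demand, devices * DEVICE_IOPS)


def design(observed_rate: float) -> int:
    """Design stage: the fewest devices that keep each one below MAX_UTIL."""
    return max(1, math.ceil(observed_rate / (DEVICE_IOPS * MAX_UTIL)))


def run_loop(true_demand: float, devices: int = 1, max_iters: int = 20) -> list[int]:
    """Repeat analyze -> design -> implement until the design stops changing."""
    history = [devices]
    for _ in range(max_iters):
        observed = analyze(true_demand, devices)
        new_devices = design(observed)
        if new_devices == devices:  # converged: re-analysis changes nothing
            break
        devices = new_devices       # implement stage: migrate to the new design
        history.append(devices)
    return history


print(run_loop(1000.0))  # [1, 2, 3, 4, 5]
```

With a true demand of 1000 requests/s, the toy loop provisions 1, 2, 3, 4 and then 5 devices. It stops once a fresh analysis no longer changes the design.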
Accurate resource estimation and minimal designs yield correctly provisioned systems, while balanced designs and short migration times let the loop configure SANs quickly. With that in mind, let's briefly look at each component and its inputs and outputs, focusing on how each one contributes to the operation of the ILDT loop.
The analysis component takes two inputs: a detailed block-level trace of the workload's I/O references, and a description of the SAN (its logical unit (LU) and logical volume layouts). It outputs a summary of the trace in terms of stores (logically contiguous chunks of storage, such as logical volumes) and streams (the I/O access patterns to each store).
For context, disk array storage is divided into logical units (LUs), which are logically contiguous arrays of blocks exported by a disk array. LUs are usually constructed by binding a subset of the array's disks together using RAID techniques. LU sizes are typically fixed by the array configuration, so they are unlikely to correspond to application requirements.
Furthermore, logical volumes add flexibility by providing a level of virtualization that enables the server to split the (large) LUs into multiple pieces or to stripe data across multiple LUs. A logical volume provides the abstraction of a virtual disk for use by a file system or database table.
The analysis component captures enough properties of the I/O trace in the stream descriptions to let the models make accurate performance predictions. It models each I/O stream as a series of alternating ON and OFF periods, where I/O requests are generated only during ON periods. Specifically, the minimum duration of an ON period, minOnTime, is defined as 0.5 seconds, and the minimum duration of an OFF period, minOffTime, as two seconds of inactivity.
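The ON/OFF segmentation can be sketched in Python as follows. The sketch assumes the trace has already been reduced to the sorted request arrival times of a single stream. Padding very short bursts out to minOnTime is this sketch's own assumption, not a documented rule.

```python
MIN_ON_TIME = 0.5   # seconds (minOnTime, value from the text)
MIN_OFF_TIME = 2.0  # seconds of inactivity (minOffTime, value from the text)


def on_periods(timestamps: list[float]) -> list[tuple[float, float]]:
    """Split sorted request arrival times into (start, end) ON periods.

    A gap of at least MIN_OFF_TIME between requests is an OFF period and ends
    the current ON period. As an assumption of this sketch, a burst shorter
    than MIN_ON_TIME is padded so that it lasts MIN_ON_TIME."""
    periods: list[tuple[float, float]] = []
    if not timestamps:
        return periods
    start = prev = timestamps[0]
    for t in timestamps[1:]:
        if t - prev >= MIN_OFF_TIME:  # long enough silence: close the ON period
            periods.append((start, max(prev, start + MIN_ON_TIME)))
            start = t
        prev = t
    periods.append((start, max(prev, start + MIN_ON_TIME)))
    return periods


print(on_periods([0.0, 0.1, 0.3, 5.0, 5.2, 6.0, 6.9, 10.0]))
# [(0.0, 0.5), (5.0, 6.9), (10.0, 10.5)]
```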
During an ON period, six parameters are measured for each stream:
- the mean read request rate and the mean write request rate;
- the mean read request size and the mean write request size;
- the run count, which is the mean number of consecutive sequential requests;
- the queue length, which is the mean number of outstanding I/O requests.

Because streams can be ON or OFF at different times, inter-stream phasing and correlations are captured using the overlap fraction, which is approximately the fraction of time that two streams' ON periods overlap.
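A minimal sketch of the per-stream summary follows. Several details are assumptions made for illustration:
- the Request record;
- the definition of "sequential" as a request that starts exactly where the previous one ended;
- the use of Little's law to estimate the mean queue length;
- normalizing the overlap fraction by the first stream's ON time.

```python
from dataclasses import dataclass


@dataclass
class Request:
    arrival: float     # seconds
    completion: float  # seconds
    is_read: bool
    offset: int        # bytes from the start of the store
    size: int          # bytes


def _mean(xs: list[float]) -> float:
    return sum(xs) / len(xs) if xs else 0.0


def stream_params(reqs: list[Request], on_time: float) -> dict[str, float]:
    """Summarize one stream's requests (in arrival order) over `on_time`
    seconds of ON periods."""
    reads = [r for r in reqs if r.is_read]
    writes = [r for r in reqs if not r.is_read]
    # Run count: mean length of runs in which each request starts exactly
    # where the previous request ended.
    runs: list[int] = []
    run_len = 1
    for prev, cur in zip(reqs, reqs[1:]):
        if cur.offset == prev.offset + prev.size:
            run_len += 1
        else:
            runs.append(run_len)
            run_len = 1
    if reqs:
        runs.append(run_len)
    # Queue length via Little's law: total outstanding time / elapsed ON time.
    outstanding = sum(r.completion - r.arrival for r in reqs)
    return {
        "read_rate": len(reads) / on_time,
        "write_rate": len(writes) / on_time,
        "read_size": _mean([r.size for r in reads]),
        "write_size": _mean([r.size for r in writes]),
        "run_count": _mean(runs),
        "queue_length": outstanding / on_time,
    }


def overlap_fraction(a: list[tuple[float, float]],
                     b: list[tuple[float, float]]) -> float:
    """Fraction of stream A's ON time during which stream B is also ON.
    Inputs are sorted, non-overlapping (start, end) ON periods."""
    total_a = sum(end - start for start, end in a)
    shared = 0.0
    i = j = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if hi > lo:
            shared += hi - lo
        # Advance whichever period ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return shared / total_a if total_a else 0.0


print(overlap_fraction([(0.0, 10.0)], [(5.0, 15.0)]))  # 0.5
```

The overlap fraction as defined here is asymmetric. A short stream that is fully contained in a long one has an overlap fraction of 1.0 with it, while the reverse fraction is small.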
The I/O activity is traced and analyzed later (or on another machine) to minimize interference with the workload. Capturing I/O trace data adds a CPU overhead of 1-2% and increases the I/O load by about 0.5%. Even day-long traces are typically only a few gigabytes in size. That is a negligible storage overhead, because the trace needs to be kept only until the analysis is run. The required tracing duration depends on the workload, because the trace must cover the workload's full range of behavior. A few minutes may suffice for simple workloads, while complex workloads may take a few hours.
The performance model takes two inputs: a workload summary from the analysis component and a candidate SAN design from the solver. The candidate design specifies both the parameters for the SAN and the layout of stores onto the SAN. The model outputs the utilization of each component in the SAN.
The model component needs to predict SAN performance quickly and accurately, and it is implemented with table-based models. The models use the stream information collected during the analysis stage to distinguish:
- sequential from random behavior;
- read from write behavior;
- the ON/OFF phasing of disk I/Os.

Models are used instead of simulation because simulating an I/O trace would be too slow for the solver to examine a sufficient number of candidate configurations.
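A table-based model can be sketched as a table lookup with linear interpolation. Everything here is hypothetical. The table contents are placeholders rather than device measurements. The way streams are combined, by weighting each other stream's utilization with its overlap fraction, is one plausible way to account for ON/OFF phasing, not necessarily the one Hippodrome uses. A solver using such a model would reject a candidate design whose peak utilization exceeded its limit.

```python
from bisect import bisect_left

# Hypothetical calibration table for one LU: request size (KB) -> maximum
# sustainable requests/s, kept separately for random and sequential access.
# These numbers are placeholders, not measurements of any real device.
TABLE = {
    "random":     [(4, 800.0), (16, 700.0), (64, 400.0), (256, 150.0)],
    "sequential": [(4, 4000.0), (16, 3000.0), (64, 1500.0), (256, 450.0)],
}


def max_rate(pattern: str, size_kb: float) -> float:
    """Look up the table, interpolating linearly between measured sizes and
    clamping outside the measured range."""
    points = TABLE[pattern]
    sizes = [s for s, _ in points]
    if size_kb <= sizes[0]:
        return points[0][1]
    if size_kb >= sizes[-1]:
        return points[-1][1]
    k = bisect_left(sizes, size_kb)
    (s0, r0), (s1, r1) = points[k - 1], points[k]
    return r0 + (r1 - r0) * (size_kb - s0) / (s1 - s0)


def lu_utilization(streams: list[tuple[float, float, str]],
                   overlap: list[list[float]]) -> float:
    """Peak LU utilization during any stream's ON periods.

    streams[i] = (request rate, request size in KB, "random" | "sequential");
    overlap[i][j] = fraction of stream i's ON time during which stream j is ON.
    """
    utils = [rate / max_rate(pattern, size) for rate, size, pattern in streams]
    peak = 0.0
    for i, u_i in enumerate(utils):
        # Stream i's own load plus the load of streams likely to be ON with it.
        load = u_i + sum(overlap[i][j] * u_j
                         for j, u_j in enumerate(utils) if j != i)
        peak = max(peak, load)
    return peak


print(lu_utilization([(200.0, 4, "random"), (750.0, 64, "sequential")],
                     [[1.0, 0.5], [0.5, 1.0]]))  # 0.625
```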
The solver takes as input the workload description generated by the analysis component and outputs the design of a system that meets the workload's performance requirements. The output specifies three things:
- the number of disk arrays;
- the configuration of those arrays (e.g., number of disks, LU configurations, controller and cache settings);
- a mapping of the stores in the workload onto the disk arrays' LUs.
The solver efficiently searches the exponentially large space of SAN designs to find a balanced, valid, minimal design. Packing a number of stores, each with both capacity and performance requirements, onto disk arrays is similar to the problem of multi-dimensional bin packing.
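The text likens store placement to multi-dimensional bin packing, so the textbook first-fit-decreasing heuristic over two dimensions (capacity and utilization) can illustrate the idea. This heuristic is swapped in for illustration and is not the solver described above. It also assumes, unrealistically, that utilizations simply add up, whereas a real solver would ask the performance model to evaluate each candidate placement. The store names and figures are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Array:
    """One disk array (bin) with two dimensions: capacity and utilization."""
    capacity_gb: float
    max_util: float
    used_gb: float = 0.0
    util: float = 0.0
    stores: list[str] = field(default_factory=list)

    def fits(self, size_gb: float, util: float) -> bool:
        return (self.used_gb + size_gb <= self.capacity_gb
                and self.util + util <= self.max_util)


def first_fit_decreasing(stores: dict[str, tuple[float, float]],
                         capacity_gb: float, max_util: float) -> list[Array]:
    """Pack stores, given as name -> (size_gb, utilization), onto identical
    arrays, assuming (unrealistically) that utilizations are additive."""
    def weight(name: str) -> float:
        size, util = stores[name]
        # Rank each store by its most demanding dimension, normalized per bin.
        return max(size / capacity_gb, util / max_util)

    arrays: list[Array] = []
    for name in sorted(stores, key=weight, reverse=True):
        size, util = stores[name]
        if size > capacity_gb or util > max_util:
            raise ValueError(f"store {name!r} cannot fit on any single array")
        # First fit: use the first existing array with room in both dimensions.
        target = next((a for a in arrays if a.fits(size, util)), None)
        if target is None:
            target = Array(capacity_gb, max_util)
            arrays.append(target)
        target.used_gb += size
        target.util += util
        target.stores.append(name)
    return arrays


arrays = first_fit_decreasing(
    {"db": (300.0, 0.5), "logs": (50.0, 0.375),
     "home": (600.0, 0.25), "tmp": (100.0, 0.125)},
    capacity_gb=1000.0, max_util=0.75)
print([a.stores for a in arrays])  # [['db', 'home'], ['logs', 'tmp']]
```

The example places "logs" on a second array because it is small in capacity but would push the first array past its utilization limit. Balancing both dimensions at once is what makes this a multi-dimensional packing problem.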
Finally, the migration component takes as input the new design of the SAN and changes the existing configuration to match it. It configures storage devices, copies data between the old and new locations, and changes the SAN parameters to match those in the new design.
Migration operates in two phases: first a migration plan is generated, and then the plan is executed. The planning phase tries to minimize both the amount of scratch space used and the amount of data that must be moved.
If the underlying logical volume manager (LVM) allows individual logical blocks to be moved, as opposed to an entire volume (store), then more advanced algorithms that generate efficient parallel plans can be used.
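For the basic case in which whole stores are moved one at a time, a simple planner can be sketched as follows. It moves only the stores whose location changes. It uses a single scratch slot only when the remaining moves form a cycle, for example when two stores swap LUs. The one-slot-per-store model and the SCRATCH name are simplifying assumptions, and this is not the planning algorithm from the source.

```python
SCRATCH = "SCRATCH"  # hypothetical spare slot large enough for any one store


def plan_migration(current: dict[str, str],
                   target: dict[str, str]) -> list[tuple[str, str, str]]:
    """Order (store, src, dst) moves so that no move overwrites live data.

    Assumes every store fills exactly one slot, every store in `current`
    appears in `target`, and no two stores share a target slot."""
    loc = dict(current)
    occupant = {slot: s for s, slot in loc.items()}
    pending = {s for s in target if loc[s] != target[s]}  # unchanged stores stay put
    plan: list[tuple[str, str, str]] = []

    def move(s: str, dst: str) -> None:
        plan.append((s, loc[s], dst))
        del occupant[loc[s]]
        occupant[dst] = s
        loc[s] = dst

    while pending:
        ready = sorted(s for s in pending if target[s] not in occupant)
        if ready:
            # Some store's destination is free: move it there directly.
            s = ready[0]
            move(s, target[s])
            pending.remove(s)
        else:
            # Every pending move waits on another one, so the moves form a
            # cycle. Break it by parking one store in scratch space.
            move(sorted(pending)[0], SCRATCH)
    return plan


print(plan_migration({"A": "lu1", "B": "lu2", "C": "lu3"},
                     {"A": "lu2", "B": "lu1", "C": "lu4"}))
```

In this example, C moves to the free lu4 first. A and B swap LUs, so A is parked in scratch space, B moves from lu2 to lu1, and A then moves from scratch space to lu2.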
Second, in the execution phase, the migration component copies the stores to their destinations as specified by the plan. Migration can be executed with the workloads either online (still running) or offline (stopped). Offline migration creates a new logical volume, copies the data to it, and deletes the original volume. Online migration allows the workloads to continue executing: it uses the LVM to mirror the volume to its new location, and then splits the mirror, removing the old half.
An alternative method, which also works during initial system configuration, first configures the devices and then copies the data from a master copy of the stores to their final destinations. This approach works well if the design changes substantially between iterations, but it requires double the storage capacity to hold the master copy.
Summary and Conclusions
This article has introduced an iterative loop design tool (ILDT) for storage area networks (SANs), which can be used to automate SAN configuration. ILDT uses an iterative loop consisting of three stages: analyze workload, design system, and implement design. The components that implement these stages handle the following problems:
- summarizing a workload;
- choosing which devices to use and how their parameters should be set;
- assigning the workload to the devices;
- implementing the design by setting the device parameters;
- migrating the existing system to the new design.

Finally, the article noted that, for the problem of SAN configuration, the ILDT loop satisfies two important properties. The first is rapid convergence: the loop converges to the final system design in a small number of iterations. The second is correct resource allocation: the loop allocates close to the minimal amount of resources necessary to support the workload.
[The preceding article is based on material provided by ongoing research at the Storage and Content Distribution Department, Hewlett-Packard Laboratories, 1501 Page Mill Road, Palo Alto, CA 94304 on the "Hippodrome" iterative loop tool. Additional information for this article was also based on material contained in the following white paper:
"Hippodrome: running circles around storage administration" by Eric Anderson, Michael Hobbs, Kimberly Keeton, Susan Spence, Mustafa Uysal and Alistair Veitch available at: http://www.hpl.hp.com/SSP/papers/#FAST2002-hippodrome]
John Vacca is an information technology consultant and internationally known author based in Pomeroy, Ohio. Since 1982, Vacca has authored 39 books and more than 475 articles in the areas of advanced storage, computer security and aerospace technology. Vacca was also a configuration management specialist, computer specialist, and the computer security official for NASA's space station program (Freedom) and the International Space Station Program, from 1988 until his early retirement from NASA in 1995.