Modern computer systems are expected to be up continuously: even planned downtime to accomplish system reconfiguration is becoming unacceptable, so more and more changes are having to be made to live systems that are running production workloads.
One of those changes is data migration: moving data from one storage area network (SAN) device to another for load balancing, system expansion, failure recovery, or a myriad of other reasons. Traditional methods for achieving this either require application downtime or severely impact the performance of foreground applications -- neither a good outcome when performance predictability is almost as important as raw speed.
This article presents one solution to this problem. It shows how to use a control-theoretic approach to statistically guarantee a bound on the impact that a data migration has on foreground work, while still completing the migration in as short a time as possible. (Here, a "bound" is simply an upper limit on how far foreground performance is allowed to degrade while the migration runs.) The result is better quality of service (QoS) for end users, less stress for system administrators, and systems that can be adapted more readily to meet changing demands.
Online Data Migration Architecture
Current enterprise computing systems store tens of terabytes of active, online data in dozens to hundreds of disk arrays, interconnected by storage area networks (SANs) such as Fibre Channel or Gigabit Ethernet. Keeping such systems operating in the face of changing access patterns (whether gradual, seasonal, or unforeseen), new applications, equipment failures, new resources, and the need to balance loads to achieve acceptable performance requires that data be moved, or migrated, between SAN components -- sometimes on short notice. Creating and restoring online backups can also be viewed as a particular case of data migration, one in which the source copy is not erased.
Existing approaches to data migration either take the data offline while it is moved or allow the I/O resource consumption engendered by the migration process itself to interfere with foreground application accesses and slow them down -- sometimes to unacceptable levels. The former is clearly undesirable in today's global, always-on Internet environment, where people from around the globe are accessing data day and night. The latter is almost as bad, given that the predictability of information-access applications is almost as much a prerequisite for the success of a modern enterprise as is their raw performance.
The Data Migration Problem
The data migration problem is formalized as follows. The data to be migrated is accessed by client applications that continue to execute in the foreground, in parallel with the migration. The migration engine takes two inputs: a migration plan (a sequence of data moves that rearranges the data placement on the system from an initial state to the desired final state) and the client applications' quality-of-service (QoS) demands (I/O performance specifications that must be met while migration takes place).
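The two inputs described above can be pictured as plain data structures. The following is a minimal sketch only: the class names, field names, device names, and the choice of an average-latency bound as the QoS demand are all assumptions made for illustration, not part of any real migration tool.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Move:
    """One step of a migration plan (hypothetical representation)."""
    data_unit: str
    source: str
    destination: str


@dataclass(frozen=True)
class QosDemand:
    """Hypothetical QoS demand: an upper bound on average I/O latency."""
    max_avg_latency_ms: float


def apply_plan(placement: Dict[str, str], plan: List[Move]) -> Dict[str, str]:
    """Return the final placement reached by executing the moves in order."""
    result = dict(placement)  # leave the caller's initial state untouched
    for move in plan:
        if result.get(move.data_unit) != move.source:
            raise ValueError(f"{move.data_unit} is not on {move.source}")
        result[move.data_unit] = move.destination
    return result


# Illustrative inputs: names and numbers are made up.
initial = {"orders_db": "array_A", "mail_fs": "array_A"}
plan = [
    Move("orders_db", "array_A", "array_B"),
    Move("mail_fs", "array_A", "array_C"),
]
demands = {"orders_db": QosDemand(10.0), "mail_fs": QosDemand(25.0)}

final = apply_plan(initial, plan)
print(final)
```

The plan only says where the data must end up; the QoS demands constrain how fast the moves may be carried out, which is the part the rest of this article is concerned with.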
Highly variable service times in SANs (e.g., due to unpredictable positioning delays, caching, and I/O request reordering) and workload fluctuations on arbitrary time scales make it difficult to provide absolute guarantees, so statistical guarantees are preferable unless gross over-provisioning can be tolerated. The data-migration problem is therefore to complete the data migration in the shortest possible time that is compatible with maintaining the QoS goals.
QoS Goals Formulation
One of the keys to the problem is a useful formalization of the QoS goals. A "store" is a logically contiguous array of bytes, such as a database table or a file system; its size is typically measured in gigabytes. Stores are accessed by streams, which represent I/O access patterns. Each store may have one or multiple streams. The granularity of a stream is somewhat at the whim of the definer, but usually corresponds to some recognizable entity such as an application.
Global QoS guarantees bound the aggregate performance of I/Os from all client applications in the system, but do not guarantee the performance of any individual store or application. They are seldom sufficient for realistic application mixes -- especially since access demands on different stores may differ significantly during migration. Stream-level guarantees have the opposite difficulty: they can proliferate without bound, and so run the risk of scaling poorly because of management overhead.
At the intermediate level (the one adopted by an online data migration architecture), the goal is to provide store-level guarantees. In practice, this has similar effects to stream-level guarantees for real-life workloads, because the data-gathering system that is normally used to generate workload characterizations creates one stream for each store by default. Furthermore, such QoS specifications may be derived from application requirements (e.g., based on the timing constraints and buffer size of a media-streaming server), or specified by hand, or empirically derived from workload monitoring and measurements.
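A store-level goal of this kind can be checked against measurements taken once per sampling period. The sketch below assumes, purely for illustration, that each store's goal is a bound on average I/O latency in milliseconds; the function name, store names, and sample values are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def check_store_goals(
    latency_samples_ms: Dict[str, List[float]],
    bounds_ms: Dict[str, float],
) -> Dict[str, dict]:
    """Summarize, per store, how the sampled latencies compare to the goal.

    Each list holds one average-latency measurement per sampling period.
    """
    report = {}
    for store, samples in latency_samples_ms.items():
        bound = bounds_ms[store]
        average = mean(samples)
        violated_periods = sum(1 for s in samples if s > bound)
        report[store] = {
            "average_ms": average,
            # Statistical goal: the average over the whole run stays bounded.
            "meets_average_goal": average <= bound,
            # Fraction of individual sampling periods above the bound.
            "violation_fraction": violated_periods / len(samples),
        }
    return report


# Made-up measurements for two stores over four sampling periods.
samples = {
    "orders_db": [8.0, 9.0, 12.0, 7.0],
    "mail_fs": [20.0, 30.0, 28.0, 26.0],
}
bounds = {"orders_db": 10.0, "mail_fs": 25.0}

report = check_store_goals(samples, bounds)
print(report)
```

In this example "orders_db" exceeds its bound in one period out of four yet still meets the goal on average, which is the difference between a statistical guarantee and an absolute one.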
An old approach to performing backups and data relocations is to do them at night, while the system is idle. As previously discussed, this does not help many current applications, such as e-business, that require continuous operation and adaptation to quickly changing system and workload conditions. Bringing the whole system (or parts of it) offline is often impractical because of the substantial costs it imposes on the enterprise.
Perhaps surprisingly, true online migration and backup are still in their infancy. Existing logical volume managers, such as the HP-UX Logical Volume Manager (LVM) and the Veritas Volume Manager (VxVM), have nevertheless long been able to provide continuing access to data while it is being migrated. They do so by creating a mirror of the data to be moved, with the new replica in the place where the data is to end up. The mirror is then silvered (the replicas are made consistent by bringing the new copy up to date), after which the original copy can be disconnected and discarded. The term is a reference to the silver that is put on the back of a "real" mirror. An online data migration architecture (ODMA) uses this trick, too.
However, no existing solution bounds the impact of migration on client applications, while the migration is occurring, in terms that relate to their performance goals. Although VxVM provides a parameter (vol_default_iodelay) that is used to throttle I/O operations for silvering, it is applied regardless of the state of the client application.
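The mirror-then-detach technique can be illustrated with a toy in-memory model. This is a sketch under simplifying assumptions (a volume is a list of blocks, and copying proceeds strictly front to back); the class and method names are hypothetical and do not correspond to the LVM or VxVM interfaces.

```python
from typing import List, Optional


class MirroredVolume:
    """Toy model of migrating a volume by mirroring it (hypothetical)."""

    def __init__(self, blocks: List[str]) -> None:
        self.source: Optional[List[str]] = list(blocks)
        self.target: List[Optional[str]] = [None] * len(blocks)
        self.next_to_copy = 0  # blocks below this index are already in sync

    def write(self, index: int, data: str) -> None:
        """Foreground write: the data stays accessible during migration."""
        assert self.source is not None, "source already detached"
        self.source[index] = data
        if index < self.next_to_copy:
            # Keep the already-copied region of the new replica consistent.
            self.target[index] = data

    def silver_step(self, count: int = 1) -> None:
        """Copy the next `count` blocks to bring the new replica up to date."""
        assert self.source is not None, "source already detached"
        end = min(len(self.source), self.next_to_copy + count)
        for i in range(self.next_to_copy, end):
            self.target[i] = self.source[i]
        self.next_to_copy = end

    def is_silvered(self) -> bool:
        return self.next_to_copy == len(self.target)

    def detach_source(self) -> List[Optional[str]]:
        """Discard the original copy once both replicas are consistent."""
        if not self.is_silvered():
            raise RuntimeError("mirror is not fully silvered yet")
        self.source = None
        return self.target


volume = MirroredVolume(["a", "b", "c", "d"])
volume.silver_step(2)   # copy blocks 0 and 1
volume.write(0, "A")    # lands on both copies: block 0 is already mirrored
volume.write(3, "D")    # lands on the source only; copied by a later step
volume.silver_step(2)   # copy blocks 2 and 3
migrated = volume.detach_source()
print(migrated)
```

How many blocks each silvering step copies, and how often steps run, determines how much the copy competes with foreground writes; that rate is the quantity a throttle would adjust.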
High-end disk arrays provide restricted support for online data migration: the source and destination devices must be identical Logical Units (LUs) within the same array, and only global, device-level QoS guarantees such as bounds on disk utilization are supported. Some commercial video servers can re-stripe data online when disks fail or are added, and provide guarantees for the specific case of highly sequential, predictable multimedia workloads. An ODMA does not make any assumptions about the nature of the foreground workloads, nor about the devices that comprise the SAN subsystem. It provides device-independent, application-level QoS guarantees. Existing storage management products can detect the presence of performance hot spots in the SAN when things are going wrong and notify system administrators about them, but it is still up to humans to decide how best to solve the problem. In particular, there is no automatic throttling system that might address the root cause once it has been identified.
Although an ODMA eagerly uses excess system resources in order to minimize the length of the migration, it is in principle possible to achieve zero impact on the foreground load by applying idleness-detection techniques to migrate data only when the foreground load has temporarily stopped. An ODMA also provides performance guarantees to applications (i.e., the "important tasks") by directly monitoring and controlling their performance.
There has been substantial work on fair scheduling techniques since their inception. In principle, it would be possible to schedule migration and foreground I/Os at the volume-manager level without relying on an external feedback loop. However, real-world workloads are complicated and have multiple, nontrivial properties such as sequentiality, temporal locality, self-similarity, and burstiness. How to assign relative priorities to migration and foreground I/Os under these conditions is an open problem.
For example, a simple 1-out-of-n scheme may work if the foreground load consists of random I/Os, but may cause much higher interference than expected if the foreground I/Os are highly sequential. Furthermore, any non-adaptive scheme is unlikely to succeed: application behaviors vary greatly over time, and failures and capacity additions occur very frequently in real systems. Fair scheduling based on dynamic priorities has worked reasonably well for CPU cycles, but priority computations remain an ad-hoc craft, and the mechanical properties of disks plus the presence of large caches result in strongly nonlinear behaviors that invalidate all but the most sophisticated latency predictions.
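To make the 1-out-of-n idea concrete, here is a minimal sketch in which every n-th dispatch slot is reserved for a migration I/O. The function name and the request labels are hypothetical, and the scheme is shown only to illustrate why it is static: the ratio never changes, whatever the foreground workload is doing.

```python
from collections import deque
from typing import List


def one_out_of_n(foreground: List[str], migration: List[str], n: int) -> List[str]:
    """Interleave requests so that every n-th slot goes to migration.

    If the queue that owns a slot is empty, the other queue uses the slot.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    fg, mig = deque(foreground), deque(migration)
    order: List[str] = []
    slot = 0
    while fg or mig:
        slot += 1
        migration_turn = slot % n == 0
        if migration_turn and mig:
            order.append(mig.popleft())
        elif fg:
            order.append(fg.popleft())
        else:
            order.append(mig.popleft())
    return order


schedule = one_out_of_n(
    ["f1", "f2", "f3", "f4", "f5", "f6"],
    ["m1", "m2", "m3"],
    n=3,
)
print(schedule)
# ['f1', 'f2', 'm1', 'f3', 'f4', 'm2', 'f5', 'f6', 'm3']
```

Each migration request here lands between two foreground requests. If f1 through f6 were a sequential scan, every interleaved migration I/O could force the disk to reposition, so the real cost would be much higher than the nominal one-in-three share suggests.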
Recently, control theory has been explored in several computer system projects. For example, control theory was used to develop a feedback control loop that guarantees the desired network packet rate in a distributed visual tracking system. Control theory was also applied to analyze a congestion control algorithm on IP routers. While these works apply control theory to computing systems, they focus on managing network bandwidth rather than the performance of end servers.
Feedback control architectures have also been developed for Web servers and e-mail servers. In the area of CPU scheduling, a feedback-based CPU scheduler was developed that synchronizes the progress of the consumer and supplier processes of buffers. Scheduling algorithms based on feedback control were also developed to provide deadline miss ratio guarantees to real-time applications with unpredictable workloads. Although these approaches show clear promise, they do not guarantee I/O latencies to applications, nor do they address the SAN subsystem, which is the focus of an ODMA.
Finally, a feedback-based Web cache manager can achieve differentiated cache hit ratios by adaptively allocating storage space to user classes. However, it also does not address I/O latency or data migration in SANs.
Summary and Conclusions
The focus in this article has been on providing latency guarantees, because bounds on latency are considerably harder to enforce than bounds on throughput: a technique that can bound latency would have little difficulty bounding throughput. The primary beneficiaries of QoS guarantees are customer-facing applications, for which latency is a primary criterion.
The main contribution of this article is a novel, control-theoretic approach to achieving these requirements. An ODMA adaptively tries to consume as much as possible of the system resources left unused by client applications, while statistically avoiding QoS violations. It does so by using periodic measurements of the SAN's performance, as perceived by the client applications, to dynamically adjust the speed of data migration: fast enough to maximize the achieved migration rate, but slow enough to maintain the desired QoS goals. It guarantees that the average I/O latency throughout the execution of a migration will be bounded by a pre-specified QoS goal. If desired, it could be extended straightforwardly to also bound the number of sampling periods during which the QoS goal is violated. In practice it kept such violations reasonably low without explicitly including this requirement, and adding the requirement would probably reduce the achieved migration rate, possibly by more than would be beneficial.
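The feedback idea can be sketched in a few lines. What follows is a minimal integral-style controller written for illustration, not the actual ODMA controller: the gain, the rate units, and especially the toy_san function (a made-up linear model in which latency grows with migration rate) are assumptions. A real SAN is noisy and nonlinear, so a real controller would need careful tuning.

```python
from typing import Callable, List, Tuple


def run_controller(
    measure_latency_ms: Callable[[float], float],
    target_ms: float,
    gain: float,
    max_rate: float,
    periods: int,
) -> List[Tuple[float, float]]:
    """Adjust the migration rate once per sampling period.

    Returns a list of (migration_rate, observed_latency_ms) pairs.
    """
    rate = 0.0  # start cautiously: no migration traffic at all
    history: List[Tuple[float, float]] = []
    for _ in range(periods):
        latency = measure_latency_ms(rate)
        history.append((rate, latency))
        # Positive error means headroom below the goal, so speed up;
        # negative error means the goal is violated, so slow down.
        error = target_ms - latency
        rate = min(max_rate, max(0.0, rate + gain * error))
    return history


def toy_san(rate: float) -> float:
    """Hypothetical stand-in for measuring client latency on a live SAN."""
    return 4.0 + 0.1 * rate


history = run_controller(toy_san, target_ms=10.0, gain=5.0,
                         max_rate=200.0, periods=30)
for rate, latency in history[:5]:
    print(f"rate={rate:6.2f}  latency={latency:5.2f} ms")
```

With these made-up numbers the rate climbs toward the point where the modeled latency just meets the 10 ms goal and then settles there. With several stores, one plausible choice is to feed the controller the smallest headroom across all stores, so that the worst-off store sets the pace.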
Finally, potential future work on online data migration performance includes a more general implementation that interacts with performance monitoring tools, a low-overhead mechanism for finer-grained control of the migration speed, a self-tuning controller that can handle different categories of workloads, and a new control loop that can simultaneously bound latencies and violation fractions.
[The preceding article is based on material provided by ongoing research at the Department of Computer Science, University of Virginia and the Storage and Content Distribution Department, Hewlett-Packard Laboratories, 1501 Page Mill Road, Palo Alto, CA 94304 on the "Aqueduct" online data migration tool. Additional information for this article was also based on material contained in the following white paper:
"Aqueduct: online data migration with performance guarantees" by Chenyang Lu, Guillermo A. Alvarez and John Wilkes available at: http://www.hpl.hp.com/personal/John_Wilkes/papers/FAST2002-aqueduct.pdf]
John Vacca is an information technology consultant and internationally known author based in Pomeroy, Ohio. Since 1982 Vacca has authored 39 books and more than 480 articles in the areas of advanced storage, computer security and aerospace technology. Vacca was also a configuration management specialist, computer specialist, and the computer security official for NASA's space station program (Freedom) and the International Space Station Program, from 1988 until his early retirement from NASA in 1995.