Going the Distance for Disaster Recovery

As a subset of more comprehensive business continuance strategies, disaster recovery (DR) focuses on data integrity and availability when an outage occurs. The potential sources of disruption or outage directly shape our view of what steps must be taken to resume business operations.

Years ago, for example, local tape backup with off-site vaulting was considered adequate protection against outages. If a storage system failed, tapes would simply be reloaded for data restoration to disk. For higher availability, it was also common practice to mirror data between two local disk systems. If the primary failed, operations would be shifted to the secondary array.

These tactics assumed that a disruption would occur from fairly innocuous sources: a primary array taken off-line for maintenance, an unexpected hardware failure, operator error, or other local occurrence. No one was thinking that entire metropolitan areas or regions would suffer disruption due to massive power failures or terrorist attacks. Unfortunately, that age of innocence is now ancient history, and the hard new reality is forcing a fundamental reexamination of disaster recovery strategies.

The sphere of potential disruption has now expanded well beyond the local data center, beyond metropolitan boundaries, and beyond entire regional geographies. It is no longer sufficient to rely on local tape backup, local disk mirroring, site-to-site replication within a city, or data replication within a region.

Although many financial institutions in New York City still rely on recovery sites in New Jersey, the 9/11 terrorist attacks, the anthrax attacks that followed, and the more recent Northeast power blackout demonstrated that putting 20 miles and the Hudson River between sites does not ensure the safety of corporate data. Companies must now think about securing their data assets far from their primary production centers, in safer locations hundreds or thousands of miles away.

The Need for Long-Distance DR Strategies

Previously, the inherent distance limitations of Fibre Channel were a barrier to long-distance data replication for disaster recovery. Fibre Channel fabrics connected by dark fiber and DWDM (dense wavelength division multiplexing) can span only metropolitan distances, typically well under 100 miles. This made it difficult to design more comprehensive DR solutions that cross regional or national boundaries.

New IP SAN technology, however, has abolished those limits, with demonstrated connectivity over thousands of miles. In addition, some IP SAN products provide multi-point routing capability, so that multiple regional data centers can be integrated in a single DR configuration. These new solutions open the door to enterprise-wide storage strategies, including global connectivity for multi-national companies that must secure highly dispersed data assets.

Removing obstacles to infrastructure expansion, however, is only part of the equation. DR applications such as synchronous data replication are sensitive to latency, which is tied directly to distance. The propagation of light through optical fiber adds roughly one millisecond of latency for every 100 miles. A 1,000-mile span between primary and DR sites, for example, would add roughly 10 milliseconds of latency each way, or 20 milliseconds round trip.
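The rule of thumb above can be checked with a short calculation. The sketch below is a propagation-only estimate that assumes a typical fiber refractive index of about 1.47 and a fiber route equal to the straight-line distance (both are assumptions, not figures from this article). Real routes are longer and add switching and protocol delay, which is one reason the rounder figure of about 1 ms per 100 miles is a sensible planning number.

```python
# Propagation-only latency estimate for a fiber span between primary and DR sites.
# Assumptions: refractive index of about 1.47 for single-mode fiber, and a fiber
# route equal to the straight-line distance. Real routes are longer and add
# equipment delay, so treat these results as a lower bound.

SPEED_OF_LIGHT_KM_PER_S = 299_792.458
FIBER_REFRACTIVE_INDEX = 1.47  # assumed typical value
KM_PER_MILE = 1.609344


def one_way_latency_ms(miles: float) -> float:
    """Time for light to travel `miles` of fiber, in milliseconds."""
    km = miles * KM_PER_MILE
    fiber_km_per_ms = SPEED_OF_LIGHT_KM_PER_S / FIBER_REFRACTIVE_INDEX / 1000.0
    return km / fiber_km_per_ms


def rule_of_thumb_latency_ms(miles: float) -> float:
    """Planning estimate used in the text: about 1 ms one way per 100 miles."""
    return miles / 100.0


if __name__ == "__main__":
    for miles in (20, 200, 1000, 3000):
        physics = one_way_latency_ms(miles)
        planning = rule_of_thumb_latency_ms(miles)
        print(f"{miles:>5} miles: ~{physics:5.1f} ms one way (propagation), "
              f"~{planning:5.1f} ms (rule of thumb), "
              f"~{2 * planning:5.1f} ms round trip")
```

For the 1,000-mile example, the propagation-only figure is close to 8 ms one way; the rule of thumb rounds this up to 10 ms, leaving headroom for route length and equipment.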

For synchronous applications, vendors typically recommend a maximum of 150-200 miles between sites. In practice, some customers have pushed synchronous data replication to more than twice that distance, even without support from the supplying vendor.

Asynchronous data replication, in contrast, is highly tolerant of latency and can be driven across thousands of miles. The price of that distance is weaker data integrity: because the production site does not wait for the DR site to confirm each write, transactions that have not yet been replicated may be lost if a failure occurs. For highly mission-critical applications, synchronous replication is therefore the preferred choice, even at the sacrifice of distance. One compromise is to replicate synchronously to a nearby regional facility, and then asynchronously from that regional facility to a more distant DR site.
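The tradeoff can be illustrated with a toy model. The class and method names below are hypothetical and do not describe any vendor's replication product; the model assumes a fixed round-trip time and ignores link bandwidth. Synchronous mode waits for the DR site before acknowledging each write, so every write pays the round trip. Asynchronous mode acknowledges at local speed and queues the write, so anything still queued when the primary site fails is lost.

```python
from collections import deque


class ToyReplicator:
    """Hypothetical model of a primary volume mirroring writes to a DR site."""

    def __init__(self, rtt_ms: float, synchronous: bool) -> None:
        self.rtt_ms = rtt_ms
        self.synchronous = synchronous
        self.primary = []        # writes committed at the production site
        self.remote = []         # writes confirmed at the DR site
        self.pending = deque()   # asynchronous writes not yet sent to the DR site

    def write(self, record: str, local_write_ms: float = 0.5) -> float:
        """Commit a write and return the response time the application sees."""
        self.primary.append(record)
        if self.synchronous:
            # The acknowledgment waits for the DR site, so the full round trip
            # is added to every write.
            self.remote.append(record)
            return local_write_ms + self.rtt_ms
        # Acknowledge at local speed; the record is shipped to the DR site later.
        self.pending.append(record)
        return local_write_ms

    def drain(self, count: int) -> None:
        """Deliver up to `count` queued asynchronous writes to the DR site."""
        for _ in range(min(count, len(self.pending))):
            self.remote.append(self.pending.popleft())

    def lose_primary_site(self) -> list:
        """Simulate a site failure: return the writes the DR copy never received."""
        lost = list(self.pending)
        self.pending.clear()
        return lost


if __name__ == "__main__":
    rtt = 20.0  # ms, the 1,000-mile round trip from the text
    sync_rep = ToyReplicator(rtt, synchronous=True)
    async_rep = ToyReplicator(rtt, synchronous=False)
    for i in range(5):
        sync_rep.write(f"txn-{i}")
        async_rep.write(f"txn-{i}")
    async_rep.drain(3)  # only three writes reached the DR site before the failure
    print("sync lost:", sync_rep.lose_primary_site())
    print("async lost:", async_rep.lose_primary_site())
```

In the cascaded compromise, the first hop would behave like the synchronous instance with a small round-trip time to the nearby regional site, and the second hop like the asynchronous instance toward the distant DR site.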

Complementary New Technologies for DR

Complementary new technologies for disaster recovery include native Fibre Channel over SONET, Fibre Channel over IP (FCIP), and SAN Routing using the Internet Fibre Channel Protocol (iFCP). As a testament to customers' commitment to upgrading their disaster recovery plans, even established communications vendors such as Nortel are providing storage-specific solutions to accommodate Fibre Channel.

Like dark fiber/DWDM solutions, Fibre Channel over SONET and FCIP extend a single logical fabric over distance, but they can span much longer distances than dark fiber/DWDM links. SAN Routing with iFCP adds fault isolation and autonomous areas to DR scenarios: the production facility and DR site remain separate SANs, so neither is exposed to the fabric reconfigurations or state change broadcasts that can affect an extended fabric.

Aside from technical concerns, there is the very substantial issue of cost. IP wide area network services are more affordable than dedicated dark fiber and DWDM, but they still impose a recurring monthly cost for what amounts to corporate data insurance. Customers today can choose from a broad menu of IP service offerings, with link speeds from as low as 1.544 Mbps (T1) up to 2.5 Gbps (OC-48) and various service level agreements.

For storage traffic over IP, the recommended minimum bandwidth is 45 Mbps (T3), or roughly 5.6 MBps in storage vernacular (megabytes rather than megabits per second). At full bandwidth saturation, a T3 link would carry about 20 GB per hour. This may be more than adequate for many synchronous data replication applications, and it also provides a wider backup window for remote tape streaming.
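The bits-to-bytes arithmetic behind these figures is easy to reproduce. This sketch uses nominal line rates (T1 = 1.544 Mbps, T3 = 44.736 Mbps, OC-3 = 155.52 Mbps, OC-48 = 2,488.32 Mbps) and decimal units, and it ignores framing and protocol overhead, so real payload throughput will be somewhat lower. At the rounded 45 Mbps figure it gives about 5.6 MBps and roughly 20 GB per hour for a T3; a T1, by comparison, carries only about 0.7 GB per hour. The 500 GB backup in the usage lines is a hypothetical figure.

```python
# Nominal line rates in megabits per second. Framing and protocol overhead are
# ignored, so these are upper bounds on storage payload throughput.
LINE_RATES_MBPS = {
    "T1": 1.544,
    "T3": 44.736,
    "OC-3": 155.52,
    "OC-48": 2488.32,
}


def megabytes_per_second(mbps: float) -> float:
    """Convert megabits per second to megabytes per second (8 bits per byte)."""
    return mbps / 8


def gigabytes_per_hour(mbps: float) -> float:
    """Data a fully saturated link moves in one hour, in decimal gigabytes."""
    return megabytes_per_second(mbps) * 3600 / 1000


def hours_to_transfer(gigabytes: float, mbps: float) -> float:
    """Hours needed to move `gigabytes` over a saturated link of `mbps`."""
    return gigabytes / gigabytes_per_hour(mbps)


if __name__ == "__main__":
    for name, rate in LINE_RATES_MBPS.items():
        print(f"{name:>6}: {megabytes_per_second(rate):8.2f} MBps, "
              f"{gigabytes_per_hour(rate):8.1f} GB/hour")
    # Hypothetical example: a 500 GB nightly backup streamed to remote tape over a T3.
    print(f"500 GB over T3: {hours_to_transfer(500, LINE_RATES_MBPS['T3']):.1f} hours")
```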

Driving More Storage Traffic over Longer Distances in Less Time

To make lower-cost, lower-bandwidth IP links practical while still meeting the requirements of storage transactions, vendors have created innovative technologies that drive more storage traffic over longer distances in less time. Data compression, for example, can more than double the payload delivered over a given link speed, depending on how compressible the data is, enabling use of more affordable T3 services instead of OC-3 (155 Mbps) or higher links.
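Whether compression actually doubles throughput depends on the data, so a reasonable planning approach is to measure the compression ratio on a representative sample and multiply it by the link rate. The sketch below uses Python's standard `zlib` module as a stand-in for whatever compression a given IP storage product implements (an assumption; vendors' algorithms and ratios differ). The helper names are hypothetical, and the 2:1 ratio and 80 Mbps requirement in the usage lines are illustrative figures, not measurements.

```python
import zlib


def compression_ratio(sample: bytes, level: int = 6) -> float:
    """Original size divided by compressed size for a representative sample."""
    compressed = zlib.compress(sample, level)
    return len(sample) / len(compressed)


def effective_payload_mbps(link_mbps: float, ratio: float) -> float:
    """Uncompressed data rate a link can carry if traffic compresses by `ratio`."""
    return link_mbps * ratio


def fits_on_link(required_mbps: float, link_mbps: float, ratio: float) -> bool:
    """True if the compressed workload fits within the nominal link rate."""
    return effective_payload_mbps(link_mbps, ratio) >= required_mbps


if __name__ == "__main__":
    # Highly repetitive records compress far better than typical production data.
    sample = b"ACCOUNT=000123;STATUS=ACTIVE;BALANCE=0000000.00\n" * 2000
    print(f"sample ratio: {compression_ratio(sample):.1f}:1")
    # Illustrative planning check: an 80 Mbps replication load on a T3 at 2:1.
    print("fits on T3 at 2:1:", fits_on_link(80, 44.736, 2.0))
```

A 2:1 ratio lifts a T3 to roughly 89 Mbps of effective payload: enough for many workloads that would otherwise need an OC-3, though not a full 155 Mbps equivalent.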

Algorithms such as Fast Write (originally developed by Nishan Systems and now provided by McDATA) can deliver up to a tenfold increase in payload delivery for Fibre Channel-originated data by eliminating most of the SCSI transaction overhead that would otherwise have to cross the long-distance link. In addition, some bandwidth management techniques provide quality of service guarantees so that multiple storage applications can run concurrently over the same IP infrastructure. These enhanced transport facilities optimize link utilization while providing more flexibility in choosing IP network services.
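The benefit of removing SCSI round trips can be approximated with a simple model. In a standard Fibre Channel write, the initiator sends the command, waits for the target's transfer-ready response, sends the data, and then waits for the status, so a write across a WAN costs two round trips. Schemes of the Fast Write kind generally work by having a local device answer the transfer-ready step, leaving one round trip. The sketch below is a hypothetical single-outstanding-I/O model that ignores queueing, multiple outstanding writes, and buffer credits; it shows only the round-trip effect and does not attempt to reproduce the tenfold figure cited above. The 64 KB write size is an assumed value.

```python
def write_time_ms(io_kb: float, rtt_ms: float, link_mbps: float,
                  wan_round_trips: int) -> float:
    """Completion time for one SCSI write with a single I/O outstanding.

    wan_round_trips=2 models the standard exchange crossing the WAN twice
    (command -> transfer ready, then data -> status); wan_round_trips=1 models
    a local device answering transfer ready, as Fast Write-style schemes do.
    Sizes are decimal kilobytes; protocol overhead is ignored.
    """
    # 1 Mbps equals 1 kilobit per millisecond, so kilobits / Mbps gives ms.
    serialization_ms = io_kb * 8 / link_mbps
    return wan_round_trips * rtt_ms + serialization_ms


def throughput_mbytes_per_s(io_kb: float, time_ms: float) -> float:
    """Kilobytes per millisecond is numerically equal to megabytes per second."""
    return io_kb / time_ms


if __name__ == "__main__":
    io_kb, rtt, t3 = 64, 20.0, 44.736  # 64 KB writes, 1,000-mile round trip, T3 link
    standard = write_time_ms(io_kb, rtt, t3, wan_round_trips=2)
    fast = write_time_ms(io_kb, rtt, t3, wan_round_trips=1)
    print(f"standard: {standard:.1f} ms/write, "
          f"{throughput_mbytes_per_s(io_kb, standard):.2f} MBps")
    print(f"one round trip: {fast:.1f} ms/write, "
          f"{throughput_mbytes_per_s(io_kb, fast):.2f} MBps")
```

With these assumed figures the round trips dominate, so removing one cuts per-write time by roughly 40 percent; the longer the distance relative to serialization time, the larger the gain.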

Interestingly, although European businesses have been in the vanguard of DR implementation, North American companies are breaking more distance barriers. TELUS, for example, is a Canadian IP services provider that recently implemented a DR offering for its clients spanning Toronto to Calgary, practically across the continent.

Global 1000 companies are now implementing even farther-reaching DR strategies, with links between Europe, North America and Asia. With new IP storage technologies in hand, customers can focus on their enterprise-wide DR requirements and size their data availability strategies accordingly.


Tom Clark
Director, SAN Technology, McDATA Corporation
Author: Designing Storage Area Networks, Second Edition (2003), and IP SANs (2002).
