Farewell to Data Loss: Understanding Data Replication Page 2


One alternative that avoids the round-trip delay associated with synchronous mirroring or two-phase commit is to buffer the changes and then transmit them as fast as available bandwidth allows. Provided the available bandwidth is equal to or greater than the rate of data change, data is transmitted and applied almost immediately, giving "near zero" data loss. With this buffering alternative, if the rate of data change temporarily exceeds available bandwidth, seconds or even minutes of changes could be queued, waiting to be transmitted. Since those changes are still on-site, they could be lost in the event of a disastrous failure. A synchronous system could not lose changes this way: to let the mirror keep pace, every transaction would be slowed to the rate of data transmission, so a transaction that had not yet been mirrored would never have completed in the first place.
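The buffering behavior described above can be sketched as a simple per-second model. This is an illustration only, not any product's implementation: the function name `simulate_backlog`, the one-second time step and the sample numbers are all assumptions.

```python
def simulate_backlog(change_rates, bandwidth):
    """Return the bytes still queued on-site at the end of each second.

    change_rates: bytes changed in each one-second interval (assumed workload)
    bandwidth:    bytes the link can carry per second (assumed constant)
    """
    backlog = 0
    history = []
    for changed in change_rates:
        backlog += changed                  # new changes enter the buffer
        backlog -= min(backlog, bandwidth)  # send as much as the link allows
        history.append(backlog)
    return history


if __name__ == "__main__":
    # Hypothetical workload with a two-second burst above the link speed.
    rates = [50, 50, 300, 300, 50, 50, 50]
    print(simulate_backlog(rates, bandwidth=100))
```

While the change rate stays at or below the bandwidth, the queue stays empty. During the burst it grows, and whatever is in the queue at the moment of a site failure is the data at risk.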

Enter Asynchronous Replication Technology

Past replication technologies worked either within the application (such as SQL transaction replication), where protection was limited to a single application engine and typically added overhead to the production environment, or at the hardware layer, which often added latency to the production disk and/or required significant, cost-prohibitive wide area network (WAN) usage. Today, asynchronous replication technologies capture file system changes within the operating system, which avoids both sets of problems.

Asynchronous replication offers the advantage of significantly reducing the need for network bandwidth, particularly for large files such as databases and activity logs. It is also quite flexible, allowing users to continuously replicate only the files or directories they deem business critical and worthy of immediate recovery. This approach significantly reduces the cost of off-site protection compared with disk mirroring applications that take an "all or nothing" approach.
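Selecting only business-critical directories amounts to a path test applied to every change. The sketch below shows one way such a rule could look; the directory names and the `should_replicate` helper are hypothetical and not taken from any particular product.

```python
from pathlib import PurePosixPath

# Hypothetical set of directories an administrator has marked as protected.
PROTECTED_DIRS = [PurePosixPath("/data/db"), PurePosixPath("/var/log/app")]


def should_replicate(path, protected=PROTECTED_DIRS):
    """Return True if path is a protected directory or lies inside one."""
    p = PurePosixPath(path)
    return any(p == root or root in p.parents for root in protected)


if __name__ == "__main__":
    for candidate in ("/data/db/orders.dbf", "/tmp/scratch.txt"):
        print(candidate, should_replicate(candidate))
```

Comparing whole path components, rather than string prefixes, keeps a sibling such as `/data/dbx` from matching `/data/db` by accident.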

Asynchronous replication captures changes, at the byte level, to any files managed by the server operating system (OS). It does so by installing a file system filter driver, which sees every operation sent to the file system. Applying a few simple rules (e.g., "ignore reads"), the filter driver captures a copy of each operation that changes data and hands it to a system service or daemon. The service or daemon then transmits it via TCP/IP to the target server.
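The capture path can be modeled in user space as a rule plus a queue. A real filter driver runs inside the operating system kernel, so the following is only a conceptual sketch: `FileOp`, `CAPTURED_KINDS`, `capture` and `drain` are hypothetical names, and the `send` callback stands in for the TCP/IP transfer to the target server.

```python
import queue
from dataclasses import dataclass


@dataclass(frozen=True)
class FileOp:
    kind: str          # e.g. "read" or "write"
    path: str
    offset: int
    data: bytes = b""


# Assumption: which operation kinds count as changes worth replicating.
CAPTURED_KINDS = {"write", "truncate", "rename", "delete"}


def capture(op, outbound):
    """Filter rule: ignore reads, queue a copy of every modifying operation."""
    if op.kind not in CAPTURED_KINDS:
        return False
    outbound.put(op)
    return True


def drain(outbound, send):
    """Stand-in for the service or daemon: forward queued operations in order."""
    sent = 0
    while True:
        try:
            op = outbound.get_nowait()
        except queue.Empty:
            return sent
        send(op)  # a real service would write this to a TCP connection
        sent += 1


if __name__ == "__main__":
    q = queue.Queue()
    capture(FileOp("read", "/data/db/orders.dbf", 0), q)
    capture(FileOp("write", "/data/db/orders.dbf", 4096, b"new bytes"), q)
    print(drain(q, print))
```

Only the write reaches the queue, and it carries the offset and the changed bytes rather than the whole file, which is what keeps the bandwidth requirement low.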

Specifically, data flows from the application layer (Layer 7 of the network model) to the file system in virtually "real time." From there, the data moves on to the hardware, while the replication solution stands ready to transmit the captured changes across the network as bandwidth becomes available. Irrespective of which application creates the data change (e.g., Oracle, SQL, Exchange, Web services or file sharing), the file system write looks the same to the OS. This approach makes the data replication completely independent of the application.

Replication solutions are generally hardware independent, so it does not matter whether a Windows 2000 server is storing data on a SAN or on its own storage drives. The asynchronous nature of replication may lead one to think the replicated data is not as current as the production data, but this is usually not the case. In many environments, particularly those with large databases, the I/O load on the production disk consists of far more reads than writes. As a result, the small percentage of I/O that is writes is replicated to the target with insignificant latency. However, in the limited case where writes flow constantly to the production disks and the number of bytes changed per second exceeds the bandwidth of the connecting link, the replicated data may be a few minutes behind the production data.
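As a rough worked example of the "few minutes behind" case, the lag can be estimated from the write rate, the link bandwidth and how long the overload lasts. The sketch assumes constant rates, strict first-in, first-out transmission and no compression; `replica_lag_seconds` is a hypothetical helper, and the figures are invented for illustration.

```python
def replica_lag_seconds(write_rate, bandwidth, duration):
    """Estimate how far the replica trails production after sustained writes.

    write_rate: bytes changed per second on the production disk
    bandwidth:  bytes per second the link can carry
    duration:   seconds the write rate has been sustained

    After `duration` seconds the link has carried bandwidth * duration bytes,
    which covers only the first (bandwidth / write_rate) * duration seconds
    of changes. The remainder is the lag.
    """
    if write_rate <= bandwidth:
        return 0.0  # the link keeps pace, so lag is negligible
    return duration * (1 - bandwidth / write_rate)


if __name__ == "__main__":
    # Hypothetical: 2.0 MB/s of changes over a 1.5 MB/s link for 10 minutes.
    lag = replica_lag_seconds(2.0, 1.5, 600)
    print(f"replica is about {lag / 60:.1f} minutes behind")
```

With these numbers the replica trails production by roughly two and a half minutes, and the gap stops growing as soon as the write rate falls back below the link speed.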
