Book Excerpt: SAN Backup and Recovery Page 6



By W. Curtis Preston

Application impact

You'd think that backing up data to locally attached tape drives would present a minimal impact to the application. It certainly creates much less of a load than typical LAN-based backups. In reality, however, the amount of throughput required to complete the backup within an acceptable window can sometimes create quite a load on the server, robbing precious resources needed by the application the server is running. The degree to which the application is affected depends on certain factors:

  • Are the size and computing power of the server based only on the needs of the "primary" application, or are the needs of the backup and recovery application also taken into account? It's often possible to build a server that is powerful enough that the primary application isn't affected by the demands of the backup and recovery application, but only if both applications are taken into account when building the server. This is, however, often not done.
  • How much data needs to be transferred from online storage (i.e., disk) to offline storage (i.e., tape) each night? This affects the length of the impact.
  • What are the I/O capabilities of the server's backplane? Some server designs do a good job of computing but a poor job of transferring large amounts of data.
  • How much memory does the backup application require?
  • Can the primary application be backed up online, or does it need to be completely shut down during backups?
  • How busy is the application during the backup window? Is this application being accessed 24 × 7, or is it not needed while backups are running?

Please notice that the last question asked about when the application is needed--not when it's being used. The reason the question is worded this way is that too many businesses have gotten used to systems that aren't available during the backup window. They have grown to expect this, so they simply don't try to use it at that time. They would use it if they could, but they can't--so they don't. The question is, "Would you like to be able to access your system 24 × 7?" If the answer is yes, you need to design a backup and recovery system that creates minimal impact on the application.

Almost all applications are impacted in some way during LAN-based or LAN-free backups. File servers take longer to process file requests. If you have a database and can perform backups with the database running, your database may take longer to process queries and commits. If your database application requires you to shut down the database to perform a backup, the impact on users is much greater.

Whether you are slowing down file or database services, or you are completely halting all database activity, it will be for some period of time. The duration of this period is determined by four factors:

  • How much data do you have to back up?
  • How many offline storage devices do you have available?
  • How much can your backup software take advantage of these devices?
  • How well can your server handle the load of moving data from point A to point B?
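
As a rough illustration of how these four factors combine, the duration can be approximated as the amount of data divided by the slowest link in the chain. The sketch below is only a back-of-the-envelope model: the function name, its parameters, the `utilization` fraction (standing in for how well the backup software keeps the drives busy), and the sample figures are all assumptions made for illustration, not values from any particular product.

```python
def backup_window_hours(data_gb, drives, drive_mb_per_s, utilization, server_mb_per_s):
    """Estimate the hours needed to move data_gb from disk to tape.

    All parameters are hypothetical inputs for a simple model:
      data_gb         -- how much data must be backed up (decimal GB)
      drives          -- how many offline storage devices are available
      drive_mb_per_s  -- sustained throughput of one device, in MB/s
      utilization     -- fraction (0 to 1] of device throughput the backup
                         software actually achieves
      server_mb_per_s -- how much data the server itself can move, in MB/s
    """
    if data_gb < 0:
        raise ValueError("data_gb must not be negative")
    if drives <= 0 or drive_mb_per_s <= 0 or server_mb_per_s <= 0:
        raise ValueError("drives and throughput values must be positive")
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in the range (0, 1]")

    # Aggregate rate the tape devices can absorb, as driven by the software.
    device_rate = drives * drive_mb_per_s * utilization

    # The backup runs no faster than the slower of the devices and the server.
    effective_rate = min(device_rate, server_mb_per_s)

    seconds = (data_gb * 1000) / effective_rate  # 1 GB = 1000 MB here
    return seconds / 3600


if __name__ == "__main__":
    # Illustrative numbers only: 1,575 GB to four 15 MB/s drives at 90% utilization.
    hours = backup_window_hours(1575, 4, 15, 0.9, 1000)
    print(f"Estimated window: {hours:.1f} hours")
```

In this model, adding tape drives shortens the window only until the server becomes the bottleneck, which is why the last factor in the list matters as much as the first three.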

Recovery speed

This is the only reason you are backing up, right? Many people fail to take recovery speed into consideration when designing a backup and recovery system, when they should be doing almost the opposite. They should design the backup and recovery system so that it can recover the system within an acceptable window. In almost every case, this also results in a system that can back up the system within an acceptable window.

If your backup system is based on moving data from disk to tape, and your recovery system is based on moving data from tape to disk, the recovery time is always a function of the questions in the previous section. They boil down to two basic questions: how much data do you have to move, and what resources are available to move it?

No other way?

Of course applications are affected during backups. Recovery takes as long as, if not longer than, the backup. There's simply no way to get around this, right? That was the correct answer up until just recently; however, client-free backups have changed the rules.

What if there were a way you could back up a given server's data with almost no impact to the application? If there were any impact, it would last for only a few seconds. What if you could recover a multi-terabyte database instantaneously? Wouldn't that be wonderful? That's what client-free backups can do for you.

Client-Free Backups

Performing client-free backup and recovery requires the coordination of several steps across at least two hosts. At one point in time, none of the popular commercial backup and recovery applications had software that could automate all the steps without requiring custom scripting by the administrator. All early implementations of client-free backups involved a significant amount of scripting on the part of the administrator, and almost all early implementations of client-free backups were on Unix servers. This was for several reasons, the first of which was demand. Many people had really large, multi-terabyte Unix servers that qualified as candidates for client-free backups. This led to a lot of cooperation between the storage array vendors and the Unix vendors, which led to commands that could run on a Unix system and accomplish the tasks required to make client-free backups possible. Since many of these tasks required steps to be coordinated on multiple computers, the rsh and ssh capabilities of Unix came in handy.

Since NT systems lacked integrated, advanced scripting support, and communications between NT machines weren't easy for administrators to script, it wasn't simple to design a scripted solution for NT client-free backups. (As you will see later in this section, another key component that was missing was the ability to mount brand new drive letters from the command line.) This, combined with the fact that NT machines tended to use less storage than their monolithic Unix counterparts, meant that there were not a lot of early implementations of client-free backup on NT. However, things have changed in recent years. It isn't uncommon to find very large Windows machines. (I personally have seen one approaching a terabyte.) Scripting and intermachine communication have improved in recent years, but the limitation of not being able to mount drives via the command line existed until just recently.[10] Therefore, it's good that a few commercial backup software companies are beginning to release client-free backup software that includes the Windows platform. Those of us with very large Windows machines can finally take advantage of this technology.

Windows isn't the only platform for which commercial client-free applications are being written. As of this writing, I am aware of several products (that are either in beta or have just been released) that will provide integrated client-free backup and recovery functionality for at least four versions of Unix (AIX, HP-UX, Solaris, and Tru64).

The next section attempts to explain all the steps a client-free backup system must complete. These steps can be scripted, or they can be managed by an integrated commercial client-free backup software package. Hopefully, by reading the steps in detail, you will have a greater appreciation of the functionality client-free backups provide, as well as the complexity of the applications that provide them.


W. Curtis Preston has specialized in designing backup and recovery systems for over eight years, and has designed such systems for many environments, both large and small. The first environment that Curtis was responsible for went from 7 small servers to 250 large servers in just over two years, running Oracle, Informix, and Sybase databases and five versions of Unix. He started managing this environment with homegrown utilities and eventually installed the first of many commercial backup utilities. His passion for backup and recovery began with managing the data growth of this 24x7, mission-critical environment. Having designed backup systems for environments with small budgets, Curtis has developed a number of freely available tools, including ones that perform live backups of Oracle, Informix, and Sybase. He has ported these tools to a number of environments, including Linux, and they are running at companies around the world. Curtis is now the owner of Storage Designs, a consulting company dedicated entirely to selecting, designing, implementing, and auditing storage systems. He is also the webmaster of www.backupcentral.com.


1. This term may be changed in the near future, since iSCSI-based SANs will, of course, use the LAN. But if you create a separate LAN for iSCSI, as many experts are recommending, the backups will not use your production LAN. Therefore, the principle remains the same, and only the implementation changes.

2. As mentioned later in this chapter, SCSI devices can be connected to more than one host, but it can be troublesome.

3. This is actually a high rate of change, but it helps prove the point. Even with a rate of change this high, the drives still go unused the majority of the time.

4. 1.575 TB ÷ 8 hours ÷ 60 minutes ÷ 60 seconds = 54.6 MB/s

5. There are several tape drives capable of these backup speeds, including AIT-3, LTO, Mammoth, Super DLT, 3590, 9840, and DTF.

6. 20 minutes × 24 hosts = 480 minutes, or 8 hours

7. These are Unix prices. Obviously, Windows-based cards cost much less.

8. Although it's possible that some software products have also implemented a third-party queuing system for the robotic arm, I am not aware of any that do this. As long as you have a third-party application controlling access to the tape library and placing tapes into drives that need them, there is no need to share the robot in a SCSI sense.

9. Network Appliance filers appear to act this way, but the WAFL filesystem is quite a bit different. They store a "before" image of every block that is changed every time they sync the data from NVRAM to disk. Each time they perform a sync operation, they leave a pointer to the previous state of the filesystem. A Network Appliance snapshot, then, is simply a reference to that pointer. Please consult your Network Appliance documentation for details.

10. It was Microsoft's partnership with Veritas that finally made this a reality. The volume manager for Windows 2000 is a "lite" version of Veritas Volume Manager.

11. Prior to 9i, this was done with the svrmgrl command, but this command has been removed from 9i.

12. There are vendors that are shipping gigabit network cards that offload the TCP/IP processing from the server. They make LAN-based backups easier, but LAN-free backups are still better because of the design of most backup software packages.

13. It's not quite 100%, since the second stripe doesn't have to be a RAID 5 set. If it were simply a RAID 0 set, you'd need about 90% more disk than you already have.

14. If your backup software supports library sharing.
