Book Excerpt: SAN Backup and Recovery Page 7
By W. Curtis Preston
How Client-Free Backups Work
As you can see in Figure 4-5, there is SAN-connected storage that is available to at least two systems: the data server and a backup server. The storage consists of a primary disk set and a backup mirror, which is simply an additional copy of the primary disk set. (In the figure, the primary disk set is mirrored, as represented by the M1/M2.) Note that the SAN-connected storage is depicted as a single, large multihosted storage array with an internal, prewired SAN. The reason for this is that all early implementations of client-free backups have been with such storage arrays. As discussed earlier, there are now enterprise volume managers that will make it possible to do this with JBOD on a SAN, but I use the concept of a large storage array in this example because it's the solution that's available from the most vendors--even if it's the most expensive.
The normal status quo of the backup mirror is that it's left split from the primary disk set. Why it's left split will become clear later in this section.
At the appropriate time, the backup application performs a series of tasks:
- Backup Server A, the main backup server, tells Backup Server B to begin the backup.
- Unmounts the backup mirror (if it's a mounted filesystem) from Backup Server B.
- Exports the volumes from the OS' volume manager on Backup Server B.
- Establishes the backup mirror (i.e., "reconnects" it to the primary disk set) by running commands on Backup Server B that communicate with the storage array.
- Backup Server B monitors the backup mirror, waiting for the establish to complete.
- Backup Server B waits for the appropriate time to split off the backup mirror.
- Backup Server B tells Data Server to put the application into backup mode.
- Backup Server B splits the backup mirror.
- Backup Server B tells Data Server to take the application out of backup mode.
- Backup Server B imports the volumes found on the backup mirror.
- Backup Server B mounts any filesystems found on the backup mirror.
- Backup Server B now performs the backup of the backup mirror via its I/O backplane, instead of the data server's backplane.
- After the backup, the filesystems are left mounted and imported on Backup Server B.
These tasks mean that the backup data is sent via the SAN to the backup server and doesn't travel through the client at all, which is why it's referred to as client-free backups. Again, some vendors refer to this as server-free backups, but I reserve that term for a specific type of backup that will be covered in the section, "Server-Free Backups."
You may find yourself asking a few questions:
- How do you put the application into backup mode?
- How do you import another machine's volumes and filesystems?
- Why is the backup mirror left split and mounted most of the time?
- This sounds really complicated. Is it?
Before explaining this process in detail, let's examine a few things that make client-free backups possible:
- You must be able to put the application into backup mode
- You are going to split off a mirror of the application's disk, and it has no idea you're going to do that. The best way to do this is to stop all I/O operations to the disk during the split. This is usually done by shutting down the application that is using the disk. However, some database products, such as Oracle and Informix, allow you to put their databases into a backup-friendly state without shutting them down. This works fine as well. Informix actually freezes all commits, and Oracle does some extra logging. If you're backing up a file server, you need to stop writes to the files during the time of the split. Otherwise, the filesystem can be corrupted during the process. If you perform an online, split-mirror backup of a SQL Server database, SQL Server's recovery mechanism is supposed to recover the database back to the last checkpoint. However, any transactions that aren't in the online transaction log are lost. (In other words, you can't issue a load transaction command after a split mirror recovery). Microsoft Exchange must normally be shut down during the split.
- This isn't to say that a backup and recovery software company can't write an API that communicates with the SQL Server and Exchange APIs. This software product could tell SQL Server and Exchange that it performed a traditional backup, when in reality it's performing a split-mirror backup. In fact, this is already being done for Exchange in the NAS world with snapshots, which act like split mirrors as far as the application is concerned.
- You have to be able to back up the application's data via a third-party application
- If the application is a file server, this is usually not a problem. However, most modern databases are backed up by passing a stream of data to the backup application via a special API. Perhaps two examples will illustrate what I mean.
- The standard way to back up Oracle is to use the RMAN (Recovery Manager) utility. Once configured, your backup software automatically talks to Oracle's RMAN API, which then passes streams of data back to the backup application. However, Oracle also allows you start sqlplus (a command-line utility you can script), issue the command alter tablespace tablespace begin backup and then back up that tablespace's datafiles in any way you want. When it's time to recover, you shut down the database, restore those datafiles from backup, then use Oracle's archived redo logs to redo any transactions that have occurred since the backup.
- Informix now offers similar functionality, but it didn't always do so. Prior to the creation of the onbar utility, you were forced to use the ontape utility to back up Informix. This backed up a database's datafiles or logical logs (transaction logs) to tape. If you didn't back up the datafiles with ontape, you couldn't restore the database to a point in time. For example, suppose you shut down the database and used a third-party backup tool to back up the database's datafiles. You then started up the database and ran it a while, making and recording transactions. If you then shut down the database, using your third-party tool to restore the datafiles, there would be no way within Informix to redo the transactions that occurred since the backup was taken. Therefore, third-party backups with these versions of Informix were useless, preventing you from doing client-free backups of these databases. Now, with the onbar utility, Informix supports what it calls an external backup. As you will see later, this is the tool that you now use to perform client-free backups of Informix.
- Although Microsoft Exchange does have transaction logs, the only way to back them up is via the Microsoft-provided tools and APIs. Unless a backup product writes an application that speaks to Microsoft's API and tricks it into thinking that the split mirror backup is a "normal" backup, there is no way to perform an online third-party backup of it. However, you can perform an offline third-party backup of Exchange by shutting down the Exchange services prior to splitting the mirror. You can then restore the Exchange server back to the point the mirror was split, but you lose any messages that have been sent since then. (This is why there are few people performing client-free backups of Exchange.)
- Microsoft's SQL Server has a sophisticated recovery mechanism that allows you to perform both online and offline backups, with advantages and disadvantages of both. If you split the mirror while SQL Server is online, you shouldn't suffer a service interruption because of backup, but the recovery process that SQL Server must perform when a restored database is restarted will take much longer. If you shut down SQL Server before splitting the mirror, SQL Server doesn't need to recover the database back to the last checkpoint before it can start replaying transaction logs. However, you will obviously suffer a service interruption whenever you perform a backup. Whether you shut down the database or not, you can't use the dump transaction and load transaction commands in conjunction with a split mirror backup of SQL Server.
- You have to be able to establish to (and split the mirror from) the primary disk set
- This functionality is provided by software on the backup server or data server that communicates with the volume manager of the storage array. With large, multihosted arrays (e.g., Compaq's Enterprise Storage Array, EMC Symmetrix, and Hitachi 9000 series), the client-side software is communicating with the built-in, hardware RAID controller. (An example of this is provided later in this chapter.) Other solutions use software RAID; in this situation, the client-side software is talking to the host that is managing the volumes.
- You must be able to see the backup mirror from the backup server
- Once you split the backup mirror from the primary disk set, you have to be able to see its disk on the backup server. If you can't, the entire exercise is pointless. This is where Windows NT has typically had a problem. Once the mirror has been split, the associated drives just "appear" on the SCSI bus--roughly the equivalent to plugging in new disk drives without rebooting or running the Disk Manager or Computer Management GUIs. However, Windows 2000 now uses a light version of the Veritas Volume Manager. You can purchase a full-priced version for both NT and Windows 2000, which comes with command line utilities. By the time you read this, the full version of Volume Manager should support the necessary functionality. In fact, Veritas will reportedly have a script included with the product that is specifically designed to help automate client-free backups.
- You normally have to have the same operating system on the data server and the backup server
- The reason for this is that you are going to be reading one host's disks on another host. If Backup Server B in Figure 4-5 is to back up the data server's disks, it needs to understand the disk labels, any volume manager disk groups and volumes, and any filesystems that may reside on the disks. With few exceptions, this normally means that both servers must be running the same operating system. At least one client-free backup software package has gotten around this limitation by writing custom software that can understand volume managers and filesystems from other operating systems. But, for the most part, the data server and backup server need to be the same operating system. This can be true even if you aren't using a volume manager or filesystem and are backing up raw disks. For example, a Solaris system can't perform I/O operations on a disk that has an NT label on it.
Before continuing this explanation of client-free backups, I must define some terms that will be used in the following example. Please understand that there are multiple vendors that offer this functionality, and they all use different terminology and all work slightly differently.
TIP: The lists that follow in the rest of this chapter give examples for Compaq, EMC, and Hitachi using Exchange, Informix, Oracle, and SQL Server databases, and the Veritas Volume Manager and File System. These examples are for clarification only and aren't meant to imply that this functionality is available only on these platforms or to indicate an endorsement of these products. These vendors are listed in alphabetical order throughout, and so no inferences should be made as to the author's preference for one vendor over another. There are several multihosted storage array vendors and several database vendors. Consult your vendor for specifics on how this works on your platform.
- Primary disk set
- The term primary disk set refers to the set of disks that hold the primary copy of the data server's data. What I refer to as a disk set are probably several independent disks or can be one large RAID volume. Whether the primary disk set uses RAID 0, 1, 0+1, 1+0, or 5 is irrelevant. I'm simply referring to the entire set of disks that contain the primary copy of the data server's data. Compaq calls this the primary mirror, EMC calls it the standard, and Hitachi, the primary volume or P-VOL.
- Backup mirror
- The backup mirror device is another set of disks specifically allocated for backup mirror use. When people say "the backup mirror," they are typically referring to a set of backup mirror disks that are associated with a primary disk set. It's often referred to as a "third mirror," because the primary disk set is often mirrored. When synchronized with the primary disk set, this set of disks then represents a third copy of the data--thus the term third mirror. However, not all vendors use mirroring for primary disk sets, so I've chosen to use the term backup mirror instead. Compaq and EMC both call this a BCV, or business continuity volume; Hitachi calls it the secondary volume or S-VOL.
- Backup mirror application
- This is a piece of software that synchronizes and splits backup mirror devices based on commands that are issued by the client-side version of the software running on the data or backup server. That is, you run the backup mirror application on your Unix or NT system, and it talks to the disk array and tells it what to do. Compaq has a few ways to do this. The more established way is to use the Enterprise Volume Manager GUI and the Compaq Batch Scheduler. The Batch Scheduler is a web-based GUI that is accessible via any web browser and can automate the creation of BCVs. For scripting, however, you should use the SANWorks Command Scripter, which allows direct command-line access to the StorageWorks controllers. EMC's application for this is Timefinder; Hitachi's is Shadowimage.
- To establish a backup mirror is to tell the primary disk set to copy its data over to the backup mirror, thus synchronizing it with the primary disk set. Many backup mirror applications offer an incremental establish, which copies only the sectors that have changed since the last establish. Some refer to this as silvering the mirror, a reference to the silver that is put on the back of a "real" mirror.
- When you split a backup mirror, you tell the disk array to break the link between the primary disk set and the backup mirror. This is typically done to back up the backup mirror. Once it's finished, you have a complete copy of the primary disk set on another set of devices, and those devices become visible on the backup server. This is also called breaking the mirror.
- To restore the backup mirror is to copy the backup mirror to the primary disk set. Once the command to do this is issued, the restore usually appears to have been completed instantaneously. As will be explained in more detail later, requests for data that has not yet been restored is redirected to the mirror.
- Main server
- As discussed earlier in this chapter, a main server is the server in a backup environment that schedules all backups and stores in the database information about what backups went to what tape. It may or may not have tape drives connected to it.
- Device server
- A device server, as discussed earlier, is a server that has tape drives connected to it. The backups that go to these tape drives are scheduled by the main server. The main server also keeps track of what files went to what tapes on this device server.
W. Curtis Preston has specialized in designing backup and recovery systems for over eight years, and has designed such systems for many environments, both large and small. The first environment that Curtis was responsible for went from 7 small servers to 250 large servers in just over two years, running Oracle, Informix, and Sybase databases and five versions of Unix. He started managing this environment with homegrown utilities and eventually installed the first of many commercial backup utilities. His passion for backup and recovery began with managing the data growth of this 24x7, mission-critical environment. Having designed backup systems for environments with small budgets, Curtis has developed a number of freely available tools, including ones that perform live backups of Oracle, Informix, and Sybase. He has ported these tools to a number of environments, including Linux, and they are running at companies around the world. Curtis is now the owner of Storage Designs, a consulting company dedicated entirely to selecting, designing, implementing, and auditing storage systems. He is also the webmaster of www.backupcentral.com.
1. This term may be changed in the near future, since iSCSI-based SANs will, of course, use the LAN. But if you create a separate LAN for iSCSI, as many experts are recommending, the backups will not use your production LAN. Therefore, the principle remains the same, and only the implementation changes.
8. Although it's possible that some software products have also implemented a third-party queuing system for the robotic arm as well, I am not aware of any that do this. As long as you have a third-party application controlling access to the tape library and placing tapes into drives that need them, there is no need to share the robot in a SCSI sense.
9. Network Appliance filers appear to act this way, but the WAFL filesystem is quite a bit different. They store a "before" image of every block that is changed every time they sync the data from NVRAM to disk. Each time they perform a sync operation, they leave a pointer to the previous state of the filesystem. A Network Appliance snapshot, then, is simply a reference to that pointer. Please consult your Network Appliance documentation for details.
12. There are vendors that are shipping gigabit network cards that offload the TCP/IP processing from the server. They make LAN-based backups easier, but LAN-free backups are still better because of the design of most backup software packages.