Book Excerpt: SAN Backup and Recovery Page 11
By W. Curtis Preston
Nobody cares if you can back up--only if you can recover. How do you recover from such a convoluted setup? This is really where the beauty of having the backup mirror stay split and mounted comes in. Unless the entire storage array is destroyed, it's available for an immediate restore. However, let's start with the worst-case scenario: the entire storage array was destroyed and has been replaced.
As shown in Figure 4-10, the worst case is a catastrophic failure of the storage array that destroys the primary disk set, the backup mirror, and the transaction log backup disk. Because most enterprise storage arrays do extensive proactive monitoring, this is rather unlikely to happen. In fact, with the backup mirror left split most of the time, about the only probable way for both the backup mirror and the primary disk set to be damaged is complete destruction of the physical unit, such as by fire. The first step in this recovery, of course, is to repair or reinstall the storage array.
Once the storage array is fully functional, you need to restore the backup mirror from tape as shown in Figure 4-11 (1a-b). While this is going on, the transaction log backups can also be restored from tape to disk (2a, b, and c). This allows them to be immediately available once the backup mirror is restored.
Once the recovery of the backup mirror has been completed from tape, we can move on to the next step. (Of course, if you don't have to recover the backup mirror from tape, you can start with the next step.)
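The overlap between the two tape restores described above can be sketched in a few lines of Python. This is only an illustration: the two restore functions are hypothetical placeholders that return a status string, and a real version would call whatever restore command your backup product provides.

```python
from concurrent.futures import ThreadPoolExecutor


def restore_backup_mirror_from_tape():
    # Hypothetical placeholder: a real version would start the tape restore
    # of the backup mirror with your backup product.
    return "backup mirror restored"


def restore_transaction_logs_from_tape():
    # Hypothetical placeholder: a real version would restore the
    # transaction log backups from tape to disk.
    return "transaction logs restored to disk"


def restore_from_tape():
    """Run both tape restores at the same time and wait for both to finish."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        mirror = pool.submit(restore_backup_mirror_from_tape)
        logs = pool.submit(restore_transaction_logs_from_tape)
        # result() blocks until the corresponding restore has completed.
        return mirror.result(), logs.result()
```

The point of running both at once is the one made in the text: the transaction log backups are already on disk by the time the backup mirror has been restored.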
Recovering after a tape recovery or if you lose the primary disk set and not the backup mirror
This recovery is much more likely to occur and will occur under the following three circumstances:
- If you lost the entire storage array, repaired it, and recovered the backup mirror from tape (as discussed in the previous section)
- If you lost both sides of the primary disk set but did not lose the backup mirror
- If the database or filesystem residing on the primary disk set was logically corrupted but the backup mirror was split at the time
The process now is to recover the primary disk set from the backup mirror, replay the transaction logs, and bring the database online.
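To make the order of operations concrete, here is a minimal Python sketch that maps the failure scenarios described so far to the recovery steps they require. The function and parameter names are invented for illustration, and the steps are plain descriptions taken from the text, not commands.

```python
def recovery_steps(array_destroyed=False, primary_lost=False,
                   logically_corrupted=False):
    """Return the ordered recovery steps for a given failure scenario."""
    steps = []
    if array_destroyed:
        # Worst case: everything must come back from tape first.
        # (The two tape restores can run at the same time.)
        steps += [
            "repair or reinstall the storage array",
            "restore the backup mirror from tape",
            "restore the transaction log backups from tape to disk",
        ]
    if array_destroyed or primary_lost or logically_corrupted:
        # Common path: rebuild the primary disk set from the backup mirror.
        steps += [
            "copy the backup mirror to the primary disk set",
            "replay the transaction logs",
            "bring the database online",
        ]
    return steps
```

For example, `recovery_steps(primary_lost=True)` returns only the last three steps, because the backup mirror is still intact and nothing needs to be read from tape.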
You now need to tell the storage array to copy the data from the backup mirror back to the primary disk set. Here's how to do so:
- By running the Compaq EVM GUI, you can easily connect the third mirror back to the primary mirror and restore the data to the first mirror set. If you want to do this via the command line, use the unmirror command to turn off the primary mirrors. Then use the mirror command to create a one-way mirror of each disk in the BCV, followed by set mirrorset-name nopolicy, and set mirrorset-name members=[n+2]. Now issue the command set mirrorset-name replace=disk-name for each disk from the primary mirror. This places the primary disks as additional disks in the temporary mirrorsets created for recovery, causing the data on the backup mirror disks to be copied to the primary disks. Then run the commands add unit unit-name disk-name, set unit unit-name disable_access_path=all, and set unit unit-name enable_access_path=(primary-server-name) to grant the primary server access to the new mirror. Once the copy has been completed, you can remove the additional mirror with the reduce mirror command.
- To recover the primary disk set (standard) from the backup mirror (BCV), you must tell the backup mirror application to do so. With EMC, you issue the command symbcv restore -g device_group, which begins copying the BCV to the standard as shown in Figure 4-12. With EMC, the moment this command is executed, the primary disk set appears to have been restored and is immediately accessible. How does this work? Some people seem to think that the backup mirror is simply mounted as the primary disk set during a restore. That isn't what happens at all. The backup mirror isn't visible to the data server at any time, so this isn't even possible.
- Take a look at Figure 4-12 and assume that some time has passed. The data in block "A" has been copied to block "B" (2), but the rest of the data on the BCV has not yet been copied. If the application asks for data that has already been restored from the BCV, it receives it from the standard (3). If the application requests data that has not yet been copied (4), TimeFinder forwards the request to the BCV (5). The data is then presented to the application as if it were already on the standard.
- To recover the primary mirror (P-VOL) from the backup mirror (S-VOL), tell the ShadowImage application to do so. With HDS, issue the command pairresync -restore -g device_group, which begins copying the S-VOL to the P-VOL. In this way, HDS and EMC are similar; the moment this command is executed, the primary mirror appears to have been restored and is immediately accessible.
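The "immediately accessible" behavior described above for EMC is easier to see in a toy model. The Python sketch below is not how TimeFinder or ShadowImage is implemented; the class and method names are invented, and it models reads only. It shows the idea from the text: blocks that have already been copied are read from the standard, and requests for blocks that have not yet been copied are forwarded to the BCV.

```python
class InstantRestore:
    """Toy model of a restore in which the standard is usable during the copy."""

    def __init__(self, bcv_blocks):
        self.bcv = list(bcv_blocks)              # contents of the backup mirror
        self.standard = [None] * len(self.bcv)   # primary disk set, not yet restored
        self.copied = [False] * len(self.bcv)    # which blocks have been copied

    def copy_next_block(self):
        """Background copy: move one more block from the BCV to the standard."""
        for i, done in enumerate(self.copied):
            if not done:
                self.standard[i] = self.bcv[i]
                self.copied[i] = True
                return i
        return None  # nothing left to copy

    def read(self, i):
        """Return (data, source) for block i, as the application would see it."""
        if self.copied[i]:
            return self.standard[i], "standard"
        # Not copied yet: forward the request to the backup mirror.
        return self.bcv[i], "bcv"
```

With `r = InstantRestore(["a", "b", "c"])` and a single call to `r.copy_next_block()`, `r.read(0)` returns `("a", "standard")` while `r.read(2)` returns `("c", "bcv")`. Either way the application gets the right data, which is why the restore appears to be finished the moment it starts.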
Since the disk-based transaction log dumps were recovered in a previous step, you can now start the transaction log restore from the disk-based transaction log backups (see Figure 4-13).
Here's an overview of how to do this for Exchange, Informix, Oracle, and SQL Server:
- Since you can't restore any transaction logs, you will simply need to restart the Exchange services after the restore. You can do this via the Exchange GUI.
- First, you need to tell Informix you've performed what it calls an "external recovery" of the chunks. To do this, issue the command onbar -r -e -p, which tells Informix to examine the chunks to make sure they've been recovered and to know which logical log it needs to start with. Once this completes successfully, you can tell Informix to perform the logical log recovery with the command onbar -r -l. (You can perform both steps with one command (onbar -r -e), but I prefer to perform each step separately. The -p option of the onbar -r -e -p command tells it to perform only the physical recovery.)
- Oracle recoveries can be quite complex and are covered in detail elsewhere in this book. Assuming that you have done a restore of one or more datafiles from the backup mirror to the primary disk set, the following commands replay the redo logs:
$ sqlplus /nolog
> connect /as sysdba
> startup mount ;
> recover database ;
- SQL Server: You can't recover transactions that have occurred since the split-mirror backup was taken. If you need point-in-time recovery, use another backup method.
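The per-database steps above can be collected into a small lookup table, which is handy if you script your recovery runbook. This Python sketch is an illustration only: the function name and dictionary are invented, and the command strings are exactly the ones given in the text. An empty list means there are no transaction logs to replay for that product.

```python
# Commands from the text, keyed by database product (lowercase).
REPLAY_COMMANDS = {
    "exchange": [],    # no log restore; restart the Exchange services instead
    "informix": ["onbar -r -e -p", "onbar -r -l"],
    "oracle": [
        "sqlplus /nolog",
        "connect /as sysdba",
        "startup mount ;",
        "recover database ;",
    ],
    "sql server": [],  # no recovery past the split-mirror backup
}


def replay_commands(database):
    """Return the log-replay commands for a database product, in order."""
    key = database.strip().lower()
    if key not in REPLAY_COMMANDS:
        raise ValueError(f"no recovery notes for {database!r}")
    # Return a copy so callers cannot modify the table by accident.
    return list(REPLAY_COMMANDS[key])
```

The function only returns the commands; it does not run them. For Oracle, the last three entries are typed at the sqlplus prompt rather than at the shell.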
W. Curtis Preston has specialized in designing backup and recovery systems for over eight years, and has designed such systems for many environments, both large and small. The first environment that Curtis was responsible for went from 7 small servers to 250 large servers in just over two years, running Oracle, Informix, and Sybase databases and five versions of Unix. He started managing this environment with homegrown utilities and eventually installed the first of many commercial backup utilities. His passion for backup and recovery began with managing the data growth of this 24x7, mission-critical environment. Having designed backup systems for environments with small budgets, Curtis has developed a number of freely available tools, including ones that perform live backups of Oracle, Informix, and Sybase. He has ported these tools to a number of environments, including Linux, and they are running at companies around the world. Curtis is now the owner of Storage Designs, a consulting company dedicated entirely to selecting, designing, implementing, and auditing storage systems. He is also the webmaster of www.backupcentral.com.
1. This term may be changed in the near future, since iSCSI-based SANs will, of course, use the LAN. But if you create a separate LAN for iSCSI, as many experts are recommending, the backups will not use your production LAN. Therefore, the principle remains the same, and only the implementation changes.
8. Although it's possible that some software products have also implemented a third-party queuing system for the robotic arm as well, I am not aware of any that do this. As long as you have a third-party application controlling access to the tape library and placing tapes into drives that need them, there is no need to share the robot in a SCSI sense.
9. Network Appliance filers appear to act this way, but the WAFL filesystem is quite a bit different. They store a "before" image of every block that is changed every time they sync the data from NVRAM to disk. Each time they perform a sync operation, they leave a pointer to the previous state of the filesystem. A Network Appliance snapshot, then, is simply a reference to that pointer. Please consult your Network Appliance documentation for details.
12. There are vendors that are shipping gigabit network cards that offload the TCP/IP processing from the server. They make LAN-based backups easier, but LAN-free backups are still better because of the design of most backup software packages.