Book Excerpt: SAN Backup and Recovery Page 4
By W. Curtis Preston
Levels of Drive Sharing
In addition to accomplishing drive sharing via different methods, backup software companies also differ in how they allow libraries and their drives to be shared. In order to understand what I mean, I must explain a few terms. A main server is the central backup server in any backup configuration. It contains the schedules and indexes for all backups and acts as a central point of control for a group of backup clients. A device server has only tape drives that receive backup data from clients. The main server controls its actions. This is known as a three-tiered backup system, where the main server can tell a backup client to back up to the main server's tape drives or to the tape drives on one of its device servers. Some backup software products don't support device servers and require all backups to be transmitted across the LAN to the main server. This type of product is known as a two-tiered backup system. (A single-tiered backup product would support only backups to a server's own tape drives. An example of such a product would be the software that comes with an inexpensive tape drive.)
- Sharing drives between device servers
- Most LAN-free backup products allow you to share a tape library and its drives only between a single main server and/or device servers under that main server's control. If you have more than one main server in your environment, this product doesn't allow you to share a tape library and its drives between multiple main servers.
- Sharing drives between main servers
- Some LAN-free backup products allow you to share a library and its drives between multiple main servers. Many products that support this functionality do so because they are two-tiered products that don't support device servers, but this isn't always the case.
One of the truly beautiful things about sharing tape drives in a library via a SAN is what happens when it's time to restore. First, consider the parallel study illustrated in Table 4-2. Since only three drives are available to each host, only three drives will be available during the restores as well. Even if your backup software had the ability to read more than three tapes at a time, you wouldn't be able to do so.
However, if you are sharing the drives dynamically, a restore could be given access to all available drives. If your backups occur at night, and most of your restores occur during the day, you would probably be given access to all drives in the library during most restores. Depending on the capabilities of your backup software package, this can drastically increase the speed of your restore.
TIP: Few backup software products can use more drives during a restore than they did during a backup, but that isn't to say that such products don't exist.
Other Ways to Share Tape Drives
As of this writing, there are at least three other ways to share devices without using a SAN. Although this is a chapter about SAN technology, I thought it appropriate to mention them here.
The Network Data Management Protocol (NDMP), originally designed by Network Appliance and PDC/Intelliguard (acquired by Legato) to back up filers, offers another way to share a tape library. There are a few vendors that have tape libraries that can be shared via NDMP.
Here's how it works. First, the vendors designed a small computer system to support the "tape server" functionality of NDMP:
- It can connect to the LAN and use NDMP to receive backups or transmit restores via NDMP.
- It can connect to the SCSI bus of a tape library and transmit to (or receive data from) its tape drives.
They then connect the SCSI bus of this computer to a tape drive in the library, and the Ethernet port of this computer to your LAN. (Current implementations have one of these computers for each tape drive, giving each tape drive its own connection to the LAN.) The function of the computer system is to convert data coming across the LAN in raw TCP/IP format and to write it as SCSI data via a local device driver which understands the drive type. Early implementations are using DLT 8000s and 100-Mb Ethernet, because the speed of a 100-Mb network is well matched to a 10-MB/s DLT. By the time you read this, there will probably be newer implementations that use Gigabit Ethernet and faster tape drives.
Not only does this give you a way to back up your filers; it also offers another option for backing up your distributed machines. As will be discussed in Chapter 7, NDMP v4 supports backing up a standard (non-NDMP) backup client to an NDMP-capable device. Since this NDMP library looks just like another filer to the backup server, you can use it to perform backups of non-NDMP clients with this feature. Since the tape library does the job of tracking which drives are in use and won't allow more than one backup server to use it at a time, you can effectively share this tape library with any backup server on the LAN.
SCSI over IP
SCSI over IP, also known as iSCSI, is a rather new concept when compared to Fibre Channel, but it's gaining a lot of ground. A SCSI device with an iSCSI adapter is accessible to any host on the LAN that also has an iSCSI interface card. Proponents of iSCSI say that Fibre Channel-based SANs are complex, expensive, and built on standards that are still being finalized. They believe that SANs based on iSCSI will solve these problems. This is because Ethernet is an old standard that has evolved slowly in such a way that almost all equipment remains interoperable. SCSI is also another standard that's been around a long time and is a very established protocol.
The one limitation to iSCSI today is that it's easy for a host to saturate a 100-MB/s Fibre Channel pipe, but it's difficult for the same host to saturate a 100-MB/s (1000 Mb or 1 Gb) Ethernet pipe. Therefore, although the architectures offer relatively the same speed, they aren't necessarily equivalent. iSCSI vendors realize this and are therefore have developed network interface cards (NICs) that can communicate "at line speed." One way to do this is to offload the TCP/IP processing from the host and perform it on the NIC itself. Now that vendors have started shipping gigabit NICs that can communicate at line speed, iSCSI stands to make significant inroads into the SAN marketplace. More information about iSCSI can be found in Appendix A.
There are some vendors that provide shared access to SCSI devices via standard parallel SCSI. Since all the LAN-free implementations that I have seen have used Fibre Channel, it's unclear how well such products would be supported by drive-sharing products.
Problems Caused by Reboots
I have heard reports of difficulties caused by an individual host rebooting while a drive on the SAN is in use by another host. On some operating systems, busy drives aren't seen by the operating system, and thus aren't available to use. Even worse, it sometimes causes the available drives' names to be changed, resulting in a completely inoperable set of drives. Some vendors have worked around the missing-drives problems using intelligent SAN routers. Some vendors have worked around name-change problems by tying the virtual name of the drive to the drive's actual serial number. Make sure you talk to your SAN hardware and software vendors about how they handle this situation.
A Variation on the Theme
LAN-free backups solve a lot of problems by removing backup traffic from the LAN. When combined with commercial interfaces to database backup APIs, they can provide a fast way to back up your databases. However, what happens if you have an application that is constantly changing data, and for which there is no API? Similarly, what if you can't afford to purchase the interface to that API from your backup software company? Usually, the only answer is to shut down the application during the entire time of the backup.
One solution to this problem is to use a client-free backup solution, which is covered in the next section. However, client-free backups are expensive, and they are meant to solve more problems (and provide more features) than this. If you don't need all the functionality provided by client-free backups, but you still have the problem described earlier, perhaps what you need is a snapshot. In my book Unix Backup and Recovery, I introduced the concept of snapshots as a feature I'd like to see integrated into commercial backup and recovery products, and a number of them now offer it.
W. Curtis Preston has specialized in designing backup and recovery systems for over eight years, and has designed such systems for many environments, both large and small. The first environment that Curtis was responsible for went from 7 small servers to 250 large servers in just over two years, running Oracle, Informix, and Sybase databases and five versions of Unix. He started managing this environment with homegrown utilities and eventually installed the first of many commercial backup utilities. His passion for backup and recovery began with managing the data growth of this 24x7, mission-critical environment. Having designed backup systems for environments with small budgets, Curtis has developed a number of freely available tools, including ones that perform live backups of Oracle, Informix, and Sybase. He has ported these tools to a number of environments, including Linux, and they are running at companies around the world. Curtis is now the owner of Storage Designs, a consulting company dedicated entirely to selecting, designing, implementing, and auditing storage systems. He is also the webmaster of www.backupcentral.com.
1. This term may be changed in the near future, since iSCSI-based SANs will, of course, use the LAN. But if you create a separate LAN for iSCSI, as many experts are recommending, the backups will not use your production LAN. Therefore, the principle remains the same, and only the implementation changes.
8. Although it's possible that some software products have also implemented a third-party queuing system for the robotic arm as well, I am not aware of any that do this. As long as you have a third-party application controlling access to the tape library and placing tapes into drives that need them, there is no need to share the robot in a SCSI sense.
9. Network Appliance filers appear to act this way, but the WAFL filesystem is quite a bit different. They store a "before" image of every block that is changed every time they sync the data from NVRAM to disk. Each time they perform a sync operation, they leave a pointer to the previous state of the filesystem. A Network Appliance snapshot, then, is simply a reference to that pointer. Please consult your Network Appliance documentation for details.
12. There are vendors that are shipping gigabit network cards that offload the TCP/IP processing from the server. They make LAN-based backups easier, but LAN-free backups are still better because of the design of most backup software packages.