Download the authoritative guide: Enterprise Data Storage 2018: Optimizing Your Storage Infrastructure
By W. Curtis Preston
Each solution requires some sort of HBA on each client. Since Fibre Channel HBAs are more expensive, we will use a price of $2,000 for the Fibre Channel HBAs, and $1,000 for the SCSI HBAs. We will even assume you can obtain SCSI HBAs that have two SCSI buses per card. The SCSI solution requires 100 tape drives, and the SAN solution requires five tape drives. (We will use UHrium LTO drives for this example, which have a native transfer speed of 15 MB/s and a discounted price of $9,000.) Since the library needs to store at least one copy of each full backup and 30 copies of each incremental backup from 25 1.5-TB hosts, this is probably going to be a really large library. To keep things as simple as we can, let's assume the tape library for both solutions is the same size and the same cost. (In reality, the SCSI library would have to be much larger and more expensive.) For the SAN solution, we need to buy two 16-port switches, and one four-to-one SAN router. Each solution requires purchasing device server licenses from your backup software vendor for each of the 25 hosts, so this will cost the same for each solution. (The SAN solution licenses might cost slightly more than the SCSI solution licenses.) As you can see in Table 4-1, there is a $700,000 difference in the cost of these two solutions.
What about the original requirements above, you ask? What if you only need to back up three large hosts? A three-host SAN solution requires the purchase of a much smaller switch and only one router. It requires 12 tape drives for the SCSI solution but only five for the SAN solution. As you can see in Table 4-2, even with these numbers, the SAN solution is over $40,000 cheaper than the SCSI solution.
Parallel SCSI solution
How Does This Work?
The data in the previous section shows that sharing tape drives between servers can be a Good Thing. But how can multiple servers share the same physical tape drive? This is accomplished in one of two ways. Either the vendor uses the SCSI reserve and release commands, or they implement their own queuing system.
Tape drives aren't the first peripherals that needed to be shared between two or more computers. In fact, many pre-SAN high availability systems were based on disk drives that were connected to multiple systems via standard, parallel SCSI. (This is referred to as a multiple-initiator system.) In order to make this possible, the commands reserve and release were added to the SCSI-2 specification. The following is a description of how they were intended to work.
Each system that wants to access a particular device issues the SCSI reserve command. If no other device has previously reserved the device, the system is granted access to the device. If another system has already reserved the device, the application requesting the device is given a resolution conflict message. Once a reservation is granted, it remains valid until one of the following things happen:
- The same initiator (SCSI HBA) requests another reservation of the same device.
- The same initiator issues a SCSI release command for the reserved device.
- A SCSI bus reset, a hard reset, or a power cycle occurs.
If you combine the idea of the SCSI reserve command with a SCSI bus that is connected to more than one initiator (host HBA), you get a configuration where multiple hosts can share the same device by simply reserving it prior to using it. This device can be a disk drive, tape drive, or SCSI-controlled robotic arm of a tape library. The following is an example of how the configuration works.
Each host that needs to put a tape into a drive attempts to reserve the use of the robotic arm with the SCSI reserve command. If it's given a resolution conflict message, it waits a predetermined period of time and tries again until it successfully reserves it. If it's granted a reservation to the robotic arm, it then attempts to reserve a tape drive using the SCSI reserve command. If it's given a resolution conflict message while attempting to reserve the first drive in the library, it continues trying to reserve each drive until it either successfully reserves one or is given a resolution conflict message for each one. If this happens, it issues a SCSI release command for the robotic arm and then waits a predetermined period of time and tries again. It continues to do this until it successfully puts a tape into a reserved drive. Once done, it can then back up to the drive via the SAN.
This description is how the designers intended it to work. However, shared drive systems that use the SCSI reserve/release method run into a number of issues, so many people consider the SCSI reserve/release command set to be fundamentally flawed. For example, one such flaw is that a SCSI bus reset releases the reservation. Since SCSI bus resets happen under a number of conditions, including system reboots, it's highly possible to completely confuse your reservation system with the reboot of only one server connected to the shared device. Another issue with the SCSI reserve/release method is that not all platforms support it. Therefore, a shared drive system that uses the SCSI reserve/release method is limited to the platforms on which these commands are supported.
Third-party queuing system
Most backup software vendors use the third-party queuing method for shared devices. When I think of how this works, I think my two small daughters. Their method of toy sharing is a lot like the SCSI reserve/release method. Whoever gets the toy first gets it as long as she wants it. However, as soon as one of them has one toy, the other one wants it. The one who wants the toy continually asks the other one for it. However, the one that already has the toy issues a resolution conflict message. (It sounds like, "Mine!") The one that wants the toy is usually persistent enough, however, that Mom or Dad have to intervene. Not very elegant, is it?
A third-party queuing system is like having Daddy sit between the children and the toys, where each child has full knowledge of what toys are available, but the only way they can get a toy is to ask Daddy for it. If the toy is in use, Daddy simply says, "You can't have that toy right now. Choose another." Then once the child asks for a toy that's not being used, Daddy hands it over. When the child is done with the toy, Daddy returns it to the toy bin.
There are two main differences between the SCSI reserve/release method and the third-party queuing method:
- Reservation attempts are made to a third-party application
- In the SCSI reserve/release method, the application that wants a drive has no one to ask if the drive is already busy--other than the drive itself. If the drive is busy, it simply gets a resolution conflict message, and there is no queuing of requests. Suppose for example, that Hosts 1, 2, and 3 are sharing a drive. Host 1 requests for and is issued a reservation for the drive and begins using it. Host 2 then attempts to reserve the drive and is given a resolution conflict message. Host 3 then attempts to reserve the drive and is also given a resolution conflict message. Host 2 then waits the predetermined period of time and requests the drive again, but it's still being used. However, as soon as it's given a resolution conflict message, the drive becomes available. Assuming that Host 2 and Host 3's waiting time is the same, Host 3 will ask for the drive next, and is granted a reservation. This happens despite the fact that it's actually Host 2's "turn" to use the drive, since it asked for the drive first.
- However, consider a third-party queuing system. In the example, Host2 would have asked a third-party application if the drive was available. It would have been told to wait and would be placed in a queue for the drive. Instead of having to continually poll the drive for availability, it's simply notified of the drive's availability by the third-party application. The third-party queueing system can also place multiple requests into a queue, keeping track of which host asked for a drive first. The hosts are then given permission to use the drive in the order the requests were received.
- Tape movement is accomplished by the third party
- Another major difference between third-party queuing systems and the SCSI reserve/release method is that, while the hosts do share the tape library, they don't typically share the robotic arm. When a host requests a tape and a tape drive, the third-party application grants the request (if a tape and drive are available) and issues the robotic request to one host dedicated as the robotic control host.
W. Curtis Preston has specialized in designing backup and recovery systems for over eight years, and has designed such systems for many environments, both large and small. The first environment that Curtis was responsible for went from 7 small servers to 250 large servers in just over two years, running Oracle, Informix, and Sybase databases and five versions of Unix. He started managing this environment with homegrown utilities and eventually installed the first of many commercial backup utilities. His passion for backup and recovery began with managing the data growth of this 24x7, mission-critical environment. Having designed backup systems for environments with small budgets, Curtis has developed a number of freely available tools, including ones that perform live backups of Oracle, Informix, and Sybase. He has ported these tools to a number of environments, including Linux, and they are running at companies around the world. Curtis is now the owner of Storage Designs, a consulting company dedicated entirely to selecting, designing, implementing, and auditing storage systems. He is also the webmaster of www.backupcentral.com.
1. This term may be changed in the near future, since iSCSI-based SANs will, of course, use the LAN. But if you create a separate LAN for iSCSI, as many experts are recommending, the backups will not use your production LAN. Therefore, the principle remains the same, and only the implementation changes.
8. Although it's possible that some software products have also implemented a third-party queuing system for the robotic arm as well, I am not aware of any that do this. As long as you have a third-party application controlling access to the tape library and placing tapes into drives that need them, there is no need to share the robot in a SCSI sense.
9. Network Appliance filers appear to act this way, but the WAFL filesystem is quite a bit different. They store a "before" image of every block that is changed every time they sync the data from NVRAM to disk. Each time they perform a sync operation, they leave a pointer to the previous state of the filesystem. A Network Appliance snapshot, then, is simply a reference to that pointer. Please consult your Network Appliance documentation for details.
12. There are vendors that are shipping gigabit network cards that offload the TCP/IP processing from the server. They make LAN-based backups easier, but LAN-free backups are still better because of the design of most backup software packages.