With HSM, your file system can look like it has a petabyte of disk storage while it might have only 10 terabytes of physical disk. The rest of the data is typically archived on tapes, which support compression and do not require power, and high-end tapes can even be more reliable than disk drives. Check out the bit error rates for Fibre Channel disks and enterprise tape at the Seagate and Imation Web sites if you don't believe me.
There are two important issues to consider if you are using HSM software:
- How do you get the data from the main site to the disaster recovery (DR) site?
- How do you handle migration of data to new media or systems?
A RAID-to-RAID copy cannot work with HSM, since there is no host on the other side to control the file system on the RAID and the tapes. HSM software depends on having a host running the HSM file system.
Almost every HSM product has one or more methods for replicating data to another site. There are basically three candidates:
- Moving the data to another system and to RAID when the tape is written, and then moving the data from that system to its own tapes.
- Moving the data to another system and just writing the tapes (no disk transfer).
- Using a channel extender to write the tapes remotely.
As data is moved to the remote system and then to the RAID, it should look just like the HSM at the local site. The movement of the data is almost always over a TCP/IP network.
With most products, what generally happens is that once a file is available to be archived or has been archived to tape, a copy of the file is written to the remote system via a TCP/IP socket. At that point, the HSM on the remote system takes over.
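The hand-off described above can be sketched roughly as follows. This is a minimal illustration, not how any particular HSM product does it: the wire format (an 8-byte length, the file bytes, then a SHA-256 digest) and the names send_file and receive_file are assumptions made for the example.

```python
import hashlib
import os
import socket
import struct

CHUNK = 1 << 20  # 1 MiB per read; an arbitrary choice for this sketch


def send_file(sock: socket.socket, path: str) -> None:
    """Send an 8-byte size header, the file contents, then a SHA-256 digest."""
    digest = hashlib.sha256()
    sock.sendall(struct.pack("!Q", os.path.getsize(path)))
    with open(path, "rb") as src:
        while chunk := src.read(CHUNK):
            digest.update(chunk)
            sock.sendall(chunk)
    sock.sendall(digest.digest())


def _recv_exact(sock: socket.socket, count: int) -> bytes:
    """Read exactly `count` bytes or raise if the peer disconnects early."""
    buf = bytearray()
    while len(buf) < count:
        part = sock.recv(count - len(buf))
        if not part:
            raise ConnectionError("peer closed the connection early")
        buf.extend(part)
    return bytes(buf)


def receive_file(sock: socket.socket, dest_path: str) -> bool:
    """Write the incoming file to dest_path; return True if the digest matches."""
    (remaining,) = struct.unpack("!Q", _recv_exact(sock, 8))
    digest = hashlib.sha256()
    with open(dest_path, "wb") as out:
        while remaining:
            chunk = sock.recv(min(CHUNK, remaining))
            if not chunk:
                raise ConnectionError("peer closed the connection early")
            out.write(chunk)
            digest.update(chunk)
            remaining -= len(chunk)
    # Compare the sender's digest with the one computed locally.
    return _recv_exact(sock, 32) == digest.digest()
```

In this sketch the remote side would only hand the file to its own HSM when receive_file returns True; a False result or a ConnectionError is the signal to ask for the file again.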
You should examine how the data is moved and how buffering affects performance. Some products have tuned this data movement for high-speed networks, while others have not.
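One way to see why buffering matters on a long-distance link is the bandwidth-delay product: the amount of data that must be in flight to keep the link full. The sketch below is a back-of-the-envelope calculation only; the function names and the link figures in the usage note are illustrative assumptions, not measurements of any product.

```python
def bandwidth_delay_product(link_bits_per_s: float, rtt_s: float) -> int:
    """Bytes that must be in flight to keep a link of this speed and RTT full."""
    return round(link_bits_per_s * rtt_s / 8)


def max_throughput_bits_per_s(window_bytes: int, rtt_s: float) -> float:
    """Upper bound on one TCP stream's rate for a given window size and RTT."""
    return window_bytes * 8 / rtt_s


if __name__ == "__main__":
    # Hypothetical WAN: 1 Gbit/s with a 50 ms round-trip time.
    needed = bandwidth_delay_product(1e9, 0.050)
    capped = max_throughput_bits_per_s(64 * 1024, 0.050)
    print(f"buffer needed to fill the link: {needed / 1e6:.2f} MB")
    print(f"rate with a 64 KiB window: {capped / 1e6:.1f} Mbit/s")
```

With these illustrative figures, the link needs roughly 6.25 MB in flight, while a 64 KiB window caps a single stream at about 10.5 Mbit/s. A product that has not been tuned for large buffers would leave most of such a link idle.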
Also, if security is important, you might want to consider moving the data over ssh2 or another encryption method. Does the HSM support data encryption? You could always use encryption in the WAN routers, but you might want to consider both. If you are doing host-based encryption with the HSM software or ssh2, you need to ensure that you have the CPU power to do it, and it should be tested with your type of hardware and network. It would not surprise me if some systems could not run the network at full rate while performing encryption because of a lack of CPU power.
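A rough way to test the CPU question on your own hardware is to time the per-byte work on one core and compare it with the link rate. The harness below is a sketch under stated assumptions: measure_throughput and can_sustain are hypothetical helper names, and stand_in_cipher uses SHA-256 purely as a placeholder CPU workload because the Python standard library ships no cipher. Substitute the encryption routine you actually plan to use before drawing conclusions.

```python
import hashlib
import time


def measure_throughput(transform, total_bytes=64 * 1024 * 1024,
                       chunk_size=1024 * 1024) -> float:
    """Return bytes per second that `transform` handles on a single core."""
    chunk = bytes(chunk_size)  # zero-filled test buffer
    processed = 0
    start = time.perf_counter()
    while processed < total_bytes:
        transform(chunk)
        processed += chunk_size
    elapsed = max(time.perf_counter() - start, 1e-9)  # avoid division by zero
    return processed / elapsed


def can_sustain(throughput_bytes_per_s: float, link_bits_per_s: float) -> bool:
    """True if the measured rate is at least the link rate (converted to bytes)."""
    return throughput_bytes_per_s >= link_bits_per_s / 8


def stand_in_cipher(chunk: bytes) -> bytes:
    """Placeholder workload only: SHA-256 is a hash, not an encryption cipher."""
    return hashlib.sha256(chunk).digest()


if __name__ == "__main__":
    rate = measure_throughput(stand_in_cipher)
    print(f"single-core rate: {rate / 1e6:.0f} MB/s")
    print("keeps up with 1 Gbit/s:", can_sustain(rate, 1e9))
```

A single-core number is only a starting point, since the same host is also running the HSM file system and the network stack while it encrypts.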
Moving to Tape Remotely
Some HSM products move data at the remote system straight to tape or virtual tape. Next time we will cover in detail the issues surrounding movement of data directly to tape, so you will have to wait until then to fully understand them. For now, the key point is to ensure that the tape drive runs at full rate, including compression.
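To make "full rate, including compression" concrete: a drive that compresses data has to be fed uncompressed data faster than its native rate, by roughly the compression ratio. The sketch below checks whether a link could feed a drive under that assumption. The function names, the 90 percent link efficiency, and the drive and link figures in the usage note are all illustrative assumptions, not vendor specifications.

```python
def required_feed_rate(native_mb_s: float, compression_ratio: float) -> float:
    """Uncompressed MB/s the host must supply to keep the drive streaming."""
    return native_mb_s * compression_ratio


def link_rate_mb_s(link_bits_per_s: float, efficiency: float = 0.9) -> float:
    """Usable MB/s on a link, after an assumed protocol-overhead factor."""
    return link_bits_per_s * efficiency / 8 / 1e6


def drive_can_stream(link_bits_per_s: float, native_mb_s: float,
                     compression_ratio: float, efficiency: float = 0.9) -> bool:
    """True if the link can supply data as fast as the drive consumes it."""
    supply = link_rate_mb_s(link_bits_per_s, efficiency)
    return supply >= required_feed_rate(native_mb_s, compression_ratio)


if __name__ == "__main__":
    # Hypothetical drive: 30 MB/s native, data compressing at 2:1.
    print(required_feed_rate(30, 2.0))        # MB/s the drive consumes
    print(drive_can_stream(1e9, 30, 2.0))     # 1 Gbit/s link
    print(drive_can_stream(155e6, 30, 2.0))   # 155 Mbit/s link
```

With these example figures the drive consumes 60 MB/s of uncompressed data, which a 1 Gbit/s link could supply and a 155 Mbit/s link could not.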
Far better than direct tape movement are products that support movement to virtual tape. This is not much different from moving the data to another system that is functionally an embedded HSM managing the tape cache. The concept behind these products has been around a long time in the mainframe world, and it gives the tape drives a mechanism that can sustain a high-performance data stream, since large files exist in the cache. Make sure that the virtual tape product uses highly reliable RAID hardware, so that your files are protected before they get to tape and do not have to be retransmitted. Also, ensure that the software supports this retransmission in case of any type of failure.