It may not be fashionable, but when it comes to archiving or backing up large amounts of data at low cost, it's hard to beat the economics of magnetic tape. The storage medium itself is cheap and offers high capacity, and when not in use it consumes little or no power – so even if disks were free, tape might still be cheaper to own.
But there are drawbacks, of course. Accessing data stored on tape can be slow, and traditional tape archiving and backup systems tend to be highly proprietary: the way the data is stored, and the database containing the tape indexes that allow you to access the data, are unique to each vendor. That means tapes made with one system can't be used in any other.
It was to address this second issue that the Linear Tape File System (LTFS) data format was developed by IBM, making its debut around four years ago. What's special about LTFS is that the index describing a tape's contents is stored on the tape itself rather than in a separate database, so tapes that use the data format can be used in any system that works with the format, from any vendor.
But perhaps more importantly, storage systems running the appropriate LTFS software can also present a standard file system view of the files stored on the tape in LTFS format, just as if they were stored on traditional spinning disks or flash media. That's important because it means users can drag and drop files or simply click on them to access them in the normal way.
Proprietary LTFS software has been developed by IBM for use in its tape drives, and there are also several open source solutions developed by the likes of Oracle, Quantum and HP. LTFS software is also commonly used with cheap Linear Tape-Open (LTO) tape cartridges.
The current specification, LTO-6, enables cartridges with an uncompressed capacity of 2.5TB and a maximum data transfer rate of 160MB/s. The next specification, LTO-7, will offer capacities of 6.4TB when it is released in the coming months. Drives designed for a particular generation of tape can always read and write cartridges from the previous generation, so a sensible degree of longevity has been built into the specifications.
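To put those LTO-6 figures in perspective, a rough calculation shows how long it takes to stream one full cartridge at the maximum rate. This is a back-of-the-envelope sketch: it assumes decimal units (1TB = 1,000,000MB) and a sustained 160MB/s with no stops or repositioning, which real workloads may not achieve. The function name is purely illustrative.

```python
def hours_to_fill(capacity_tb: float, rate_mb_per_s: float) -> float:
    """Hours needed to stream a full cartridge at a sustained rate."""
    capacity_mb = capacity_tb * 1_000_000  # decimal units: 1TB = 1,000,000MB
    seconds = capacity_mb / rate_mb_per_s
    return seconds / 3600


# LTO-6 figures quoted in the article: 2.5TB uncompressed, 160MB/s maximum.
lto6_hours = hours_to_fill(2.5, 160)
print(f"LTO-6: about {lto6_hours:.1f} hours per cartridge")
```

At the quoted maximum rate, filling or reading back a whole cartridge takes a little over four hours, which is why tape suits archives better than frequently accessed working data.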
Large file storage
LTFS systems have been particularly popular among video production companies for storing huge media files, according to Jeremy Brovage, a backup solutions architect at Kentucky-based IT solutions provider SIS. "If you have large amounts of video data then the LTFS is the sweet spot because it is inexpensive and massively scalable," he says.
A system such as IBM's TS4500 tape library can store up to 5.5 petabytes of data in a single frame library, and supports LTO-6 or LTO-5, or IBM's own TS1150 and TS1140 tape drives, says Brovage. "That is a tremendous amount of data storage capacity," he says. "It's also a system that uses very little power – especially when you compare it to a disk system, because the reality is that people don't power down disks."
It's certainly true that cloud-based archiving services offer tremendous – and scalable – storage capacity at a low price, with effectively no power requirements (in your data center). But Brovage points out that the cloud is not a practical solution for archiving large media files. "When you need the data back, the time involved in transferring these files makes it usually impractical," he says.
High performance tape
We've established that LTFS libraries are cheap and scalable. But by combining some disk (or flash) storage – used as a fast access cache – with LTFS tape systems presenting a standard file system, it's possible to create a hybrid storage system that is fast and offers vast unstructured data storage capacity at very low cost.
One way to use a hybrid LTFS tape/disk system is to buy an appliance such as Crossroads' Strongbox hybrid system, designed to offer protected primary or archive storage capacity. It looks to other systems like a standard NAS appliance, and contains both disk and LTFS tape storage, with policies that can be applied to govern how data is treated.
To illustrate how a system like this can be used for primary storage, let's take the example of large video files – the type of files that LTFS was originally commonly used to handle. A collection of large video files, each of which may be 100GB or more, would be a good candidate for tape storage if it were to be archived, but not if the files are being worked on and need to be accessed frequently.
"For this type of video editing purposes, we might set the controls so that 4GB of the file is stored on a disk cache and the other 96GB is stored on tape," explains David Cerf, Crossroads' strategy EVP. "When a user clicks on the file, it starts to come off the disk, and then the read is completed from tape. We control the speed of the readback so it doesn't time out the application, but it is not too fast for the tape."
A Strongbox has a maximum capacity of 39PB or 1.6 billion files (using up to 16 LTO-5, LTO-6 or IBM TS1140 drives), with up to 21TB of RAID 6 protected internal disk cache. The amount of cache can be increased by linking the appliance to a SAN's block storage using Fibre Channel or SAS.
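The split that Cerf describes, with the head of a file on the disk cache and the remainder on tape, can be sketched as a simple read plan. This is a hypothetical illustration, not Crossroads' implementation: the `Segment` type, the function names and the 160MB/s delivery rate (borrowed from the LTO-6 maximum) are all assumptions made for the example.

```python
from dataclasses import dataclass

GB = 1000**3  # decimal gigabyte, in bytes


@dataclass
class Segment:
    tier: str    # "disk" or "tape"
    start: int   # byte offset within the file
    length: int  # number of bytes served from this tier


def plan_read(file_size: int, cache_size: int) -> list[Segment]:
    """Serve the head of the file from disk cache and the rest from tape."""
    if file_size <= 0:
        return []
    head = min(file_size, cache_size)
    segments = [Segment("disk", 0, head)]
    if file_size > head:
        segments.append(Segment("tape", head, file_size - head))
    return segments


def head_start_seconds(cache_size: int, rate_bytes_per_s: float) -> float:
    """Time the cached head buys the tape drive to load and seek."""
    return cache_size / rate_bytes_per_s


plan = plan_read(100 * GB, 4 * GB)
for seg in plan:
    print(f"{seg.tier}: {seg.length // GB}GB from offset {seg.start // GB}GB")

# Assumed delivery rate of 160MB/s to the application.
print(f"head start: {head_start_seconds(4 * GB, 160 * 1000**2):.0f} seconds")
```

The point of the design is the head start: while the application consumes the cached portion from disk, the tape has time to be mounted and positioned, so the read can continue from tape without the application timing out.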
That means that a hybrid system like this can also be used to provide high capacity protected primary disk storage, with copies automatically made to tape. For example, FreemantleMedia, an international entertainment production company, uses a Strongbox hybrid system to store about 5,000 hours of television footage for the current series of America's Got Talent – a total of about 96TB of data.
All the data is stored on disks for fast access, but automatically backed up to LTFS tape to provide a backup copy that can be accessed easily if needed.
"I feel incredibly nimble with the footage on disk, but I also feel safe having a tape backup," says Zach Jarosz, a VP of post-production at FreemantleMedia. "It's a perfect solution for us as we save money by not having to store footage on an editing server, and the footage is automatically backed up so it is safe."
At the end of the season the disks will be flushed and the footage archived on LTO-6 tapes, which will be stored in a secure environment.
A hybrid system can also be used to provide automatic tiering, using policies to determine where data is stored. For example, in a medical environment a new X-ray image can be stored to disk and accessed by physicians very quickly. After a set amount of time dictated by policy – perhaps 30 days – the X-ray image is unlikely to be required again and can be archived within the system.
This simply involves moving it from disk to tape – either within the same appliance, or on another linked appliance (which can be on the same premises or at an external location), or on a tape that is removed physically and taken to a remote storage facility. Since it is stored in LTFS format it can be read back by any other LTFS system.
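An age-based tiering policy of the kind described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the logic of any particular appliance: the 30-day threshold comes from the example in the text, while the function name and the choice to measure age from the time the file was stored are hypothetical.

```python
from datetime import datetime, timedelta

# Policy threshold taken from the article's example ("perhaps 30 days").
ARCHIVE_AFTER = timedelta(days=30)


def choose_tier(stored_at: datetime, now: datetime,
                archive_after: timedelta = ARCHIVE_AFTER) -> str:
    """Return "tape" once a file is older than the policy threshold."""
    age = now - stored_at
    return "tape" if age >= archive_after else "disk"


now = datetime(2015, 6, 1)
recent_xray = datetime(2015, 5, 25)  # 7 days old
old_xray = datetime(2015, 4, 1)      # 61 days old

print(choose_tier(recent_xray, now))  # stays on disk for fast access
print(choose_tier(old_xray, now))     # eligible for archiving to tape
```

A real policy engine would likely consider other attributes as well, such as file type or last access time, but the principle of a rule deciding where each file lives is the same.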
Costs: tape v disks
Cerf maintains that a hybrid storage appliance costs about 50% of the price of a disk-based system in terms of capital expenditure, and the more tape storage that is used the more this capex advantage increases. A system with 1.1PB of native capacity costs about $125,000.
The TCO is about half that of a disk-based system when factors like space as well as energy consumption are taken into account – about $180,000 over five years for the tape appliance compared to about $380,000 for a low-end disk system – according to figures produced for Crossroads by Brad Johns Consulting.
This kind of opex (which amounts to about one third of a cent per GB per month) compares favorably to ultra-low cost cloud storage services such as Amazon's Glacier, which charges 1c per GB per month (plus data transfer charges).
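The per-GB figure can be checked with some simple arithmetic. The sketch below assumes that the five-year cost figures apply to the 1.1PB system mentioned earlier, which the article does not state explicitly, and it uses decimal units (1PB = 1,000,000GB). The Glacier total covers storage charges only, at the quoted 1c per GB per month.

```python
CAPACITY_GB = 1_100_000  # 1.1PB in decimal GB (assumed size behind the cost figures)
MONTHS = 5 * 12          # five-year period


def cents_per_gb_month(total_dollars: float) -> float:
    """Spread a five-year cost over capacity and time, in cents."""
    return total_dollars * 100 / (CAPACITY_GB * MONTHS)


tape_rate = cents_per_gb_month(180_000)  # hybrid tape appliance
disk_rate = cents_per_gb_month(380_000)  # low-end disk system

# Storage-only cost of the same capacity at 1c per GB per month.
glacier_total = 0.01 * CAPACITY_GB * MONTHS

print(f"tape: {tape_rate:.2f} cents per GB per month")
print(f"disk: {disk_rate:.2f} cents per GB per month")
print(f"Glacier storage over five years: ${glacier_total:,.0f}")
```

Under these assumptions the tape figure comes out a little under 0.3 cents per GB per month, in line with the "about one third of a cent" quoted above.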
Amazon doesn't reveal the infrastructure that Glacier runs on so there's no way to be sure, but it could well be that its service is built on some sort of hybrid LTFS tape/disk system, according to SIS's Jeremy Brovage. "I think that a lot of cloud based archiving systems may actually be using hybrid systems in the background," he says.