Today, what reassures us in our private lives can challenge us in our enterprise lives.
In our private lives, we're relieved to know that new government regulations mandate security for information about us. For example, think about HIPAA (Health Insurance Portability and Accountability Act) and how it requires the medical clinic where your doctor practices to keep a tight rein on who sees your medical records, even as the clinic communicates with your nationwide pharmacy and your health insurance company.
As contributors to the enterprise, however, we might be in charge of meeting those same government regulations. For some of us, it's not enough to run a series of snapshot backups anymore. We need to keep an exact copy of specific data, ensure only those authorized have access to it, provide fast access where required, prevent proliferation of extra copies, and destroy all copies on a mandated schedule.
As another real-world example, consider check imaging for your bank, where the data about your checks must be accurate, unchangeable, and secure for seven years, then guaranteed destroyed. In other words, the data must be authentic for the duration of its specific life expectancy and then be gone.
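The check-imaging retention rule can be expressed as a simple date calculation. The sketch below is illustrative only; the seven-year window comes from the example above, and the function name is hypothetical:

```python
from datetime import date

RETENTION_YEARS = 7  # mandated retention window for check images

def destruction_date(created: date) -> date:
    """Earliest date the record may be destroyed: creation date plus
    the full retention window. (A leap-day creation date would need
    special handling, omitted here for brevity.)"""
    return created.replace(year=created.year + RETENTION_YEARS)

# A check imaged on 2004-03-15 must remain authentic until 2011-03-15,
# then be destroyed.
print(destruction_date(date(2004, 3, 15)))
```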
Information lifecycle management (ILM) was born from this context.
So Much Data, So Little Time
We're generating, using, and storing astonishing amounts of data, and those amounts are growing at exponential rates. The challenge for enterprise data storage managers is to provide authorized users with fast access to this ever-growing data, to protect it from unauthorized access, to destroy it on a mandated schedule, and to do all this without spending the enterprise's entire budget.
This challenge is especially daunting for managers of data that remains static once it has been generated, but that is also required to be stored, with a clean audit trail, for a specific amount of time, and then guaranteed destroyed. Typical applications that generate large volumes of this type of data are document imaging, medical imaging, check imaging, and email archiving.
When data must be analyzed and studied, it is often critical that its original form be preserved for future comparison and reference. At other times, the data exists as vast original image files, or large files that are accessed frequently in the first month after generation, but then rarely accessed again.
And then there's some data that must not be stored. Sarbanes-Oxley tells information managers it is not enough to store all data generated; instead, some data must not be kept. The IT manager must sort, store, and retire files according to their contents.
Conventional backup approaches capture data in a fixed process, regardless of whether the data changes. For data that does not and should not change, the system administrator could end up performing a full backup of the same data every day for the entire retention period.
For example, think again of your check images, copied daily for seven years. With tapes being recycled as part of a regular backup rotation, your bank may lose the audit trail, or the original itself may be lost. Neither is a good scenario for check images.
And conventional backup doesn't limit or control copies. For example, if the IT staff doesn't get the tape back from the warehouse, they use a new one and wind up with another copy of the data. At the end of the retention period, such as seven years for check images, it becomes impossible to guarantee all the copies have been rounded up and destroyed.
The enterprise's very real and rigorous budget also mixes into this challenge. If there were no money constraints, it would be fairly easy to keep everything online on disk, providing near immediate access to the data. And daily snapshots would guarantee against catastrophic loss.
If all this sounds uncomfortably familiar, you're already living with this reality.
Is ILM a Promise or a Reality?
The buzz is on for information lifecycle management to put an automated lid on this can of worms. Since this technology is still new enough to prompt questions, we sought out Dr. Phil Storey, CEO of XenData, a company that develops software for managing fixed content data, to learn if ILM is as good as its promise.
Today, on a single Windows server, archival software can be combined with RAID and high-performance tape libraries to serve as a standard rewriteable file system. With this combination, the business need for access to data can be aligned with the business need to put data on the appropriate media with a known scheduled destruction date.
The single server presents the different storage types as a single drive letter, so archival software runs with any application. System administrators now have full control over the storage hardware and a solution to the complex issue of providing access to disparate applications.
As an example, XenData's Archive Series Software provides virtualization and legal compliance by supporting SAIT and AIT write-once, read-many (WORM) tape libraries and RAID on a single server.
"These two elements work together to provide ILM in our product that is real and shipping today to our customers," says Storey.
Legal Compliance Made Easier
Enterprise storage managers have talked for years about open system policy-based data management. Today, software is becoming available to allow system administrators to set policies in open systems so that the systems run automatically. Think of it as a three-phased process:
- Phase 1. All data written to a subdirectory is stored simultaneously on RAID and tape. By writing to WORM tape cartridges, data authenticity is ensured. By allocating the subdirectory to a set of tapes due to be destroyed according to the policy schedule, it's assured that data will not be kept longer than mandated, and it becomes easier to control tape inventory.
- Phase 2. After the time specified by the policy since the last data access, the system removes the data from RAID. If the data needs to be accessed, it is still available on tape.
- Phase 3. The tape containing the data is destroyed according to the policy schedule.
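The three phases above can be sketched as a small policy engine. This is a simplified model of the idea, not XenData's actual implementation; all class and field names are hypothetical:

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class Record:
    path: str
    created: date
    last_access: date
    on_raid: bool = True   # Phase 1: written to RAID...
    on_tape: bool = True   # ...and simultaneously to WORM tape
    destroyed: bool = False

@dataclass
class Policy:
    raid_eviction: timedelta  # Phase 2: remove from RAID after this idle period
    retention: timedelta      # Phase 3: destroy the tape after this period

def apply_policy(rec: Record, policy: Policy, today: date) -> Record:
    # Phase 2: evict from RAID once the data has gone unaccessed long
    # enough; the copy on WORM tape remains readable.
    if rec.on_raid and today - rec.last_access >= policy.raid_eviction:
        rec.on_raid = False
    # Phase 3: once the retention period expires, the tape is destroyed
    # and no copy of the record survives.
    if today - rec.created >= policy.retention:
        rec.on_raid = rec.on_tape = False
        rec.destroyed = True
    return rec
```

Running each record's folder against its own `Policy` is how, per Storey's comment below, multiple applications can each set up their own folders and rules.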
According to Storey, "In a system with multiple applications, each application can set up its own folders and its own rules."
ILM and Backup
Storey recommends that his customers using the Archive Series solution stop performing snapshot backups. Instead, companies implementing open system ILM should replace snapshot backups with a data replication scheme. The system administrator specifies the number of tape replicas, which are then updated automatically, allowing rotation of tape cartridges to a secure off-site location.
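The replication scheme Storey describes, a fixed number of tape replicas kept in sync with cartridges rotated off-site, might look like this in outline. This is a sketch under assumptions; the replica count, labels, and location names are all illustrative:

```python
from dataclasses import dataclass

@dataclass
class TapeSet:
    label: str
    location: str  # "library" (online, kept updated) or "offsite"

def plan_replicas(n_replicas: int) -> list[TapeSet]:
    """Keep n_replicas identical tape sets: one in the library for
    automatic updates, the rest in secure off-site storage."""
    return [TapeSet(label=f"replica-{i}",
                    location="library" if i == 0 else "offsite")
            for i in range(n_replicas)]

def rotate(sets: list[TapeSet]) -> list[TapeSet]:
    """Swap roles on a schedule: the current library set goes off-site
    and the next replica returns to be brought up to date."""
    for s in sets:
        s.location = "offsite"
    sets = sets[1:] + sets[:1]
    sets[0].location = "library"
    return sets
```

Because the replica count is fixed and every cartridge is tracked, the uncontrolled copy proliferation of conventional backup is avoided.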
"If you also back up your data to tape, you can get confused and you destroy the audit trail," says Storey. "You wind up with an uncontrolled number of tape copies, which ruins any hope of maintaining data destruction policies."
Mindful that any new technology presents risks to early adopters, we asked Storey about any challenges posed by open systems ILM. According to Storey, the only limit today is the computing environment. At present, only Windows-based solutions are available, but "if this one restriction fits into your environment, then an ILM solution is easy to implement," says Storey.
Storage managers can implement the solution in a heterogeneous environment by running Microsoft Services for UNIX on a Windows-based ILM server. This allows UNIX clients to archive and retrieve files via NFS at the same time as Windows clients write and read files using CIFS.
Who's Using ILM Today?
According to Storey, whose Archive Series software has been shipping since mid-2003, customers range from a mid-sized community bank to a recent installation licensed for a 100+ TB system.
Typically, customers install solutions employing WORM tape and RAID to help meet government regulations. "Combining ILM software and storage libraries running WORM tape systems like Sony's SAIT WORM offers a wonderful solution," proclaims Storey.
But Storey sees ILM solution software also benefiting enterprises without external legal requirements. The HSM (Hierarchical Storage Management) element helps customers manage large amounts of data that have predictable lifecycles, making the total solution relatively affordable.
Fixed content archival applications are especially suited for ILM solutions. Examples include document imaging, medical imaging, large scale ERM/COLD, financial item processing, scientific data, and email archives.
"Without ILM, it would cost a fortune to keep all of this data online and available for fast access," states Storey.
Who Will Use ILM Tomorrow?
It's no surprise that XenData's analysis shows a growth market for high capacity, unalterable storage to manage fixed content or reference data. But the rate of growth is staggering. "We see a 90 percent per year growth rate," says Storey. Legal compliance requirements are pushing this growth.
It's also no surprise that with ILM, the more data that needs to be managed, the more savings can be realized. According to Storey, "Today you can install a 100+ TB system for less than $200,000 using ILM."
Compare this with the cost of an online system at millions of dollars. "At the lower end, you can install a solution with a couple of TB of data on a Windows system for under $25,000," says Storey. This is an appealing number for mid-sized enterprises with lots of data.
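The per-terabyte arithmetic behind these figures is straightforward, using the prices quoted above and taking "a couple of TB" as 2 TB:

```python
# Prices quoted by Storey; capacities as stated in the article.
large_cost, large_tb = 200_000, 100   # 100+ TB ILM system
small_cost, small_tb = 25_000, 2      # "couple of TB" entry system

print(f"Large system: ${large_cost / large_tb:,.0f}/TB")   # $2,000/TB
print(f"Small system: ${small_cost / small_tb:,.0f}/TB")   # $12,500/TB
```

The spread illustrates the point in the text: the more data under management, the lower the cost per terabyte.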
As with any technology that develops in response to a real user need, ILM makes sense. New software and hardware combine to bring solutions to a new marketplace, so that open system enterprises can now take advantage of the ILM functionality previously available only in the mainframe world. Companies tiptoeing toward the leading edge of solutions will find innovative offerings designed with compliance, management, and ease of use in mind.