Organizations around the globe not only generate a lot of data, but they store and archive much of it. Where organizations were once managing gigabytes of data and storage, they are now managing terabytes and even petabytes. One of the biggest issues facing storage managers is what to do with fixed content data — all those e-mails, documents and media files that don’t change after creation.
The growth of fixed content information has far surpassed other forms of enterprise data. The Yankee Group says the market for fixed content data is expected to grow to 1,251,900 terabytes by the end of 2006.
The biggest driver of fixed content growth is compliance with data protection and retention regulations such as Sarbanes-Oxley and HIPAA. As a result, archiving data for short- or long-term retention is no longer a simple choice between disk and tape, but a complex decision that requires a thorough understanding of different architectural approaches and vendor solutions.
Brian Biles, co-founder and vice president of Data Domain, says the biggest challenges are to reduce costs through data reduction and to replicate to a disaster recovery site efficiently. For many applications, he says, backup is the foundation of archiving. “If you can back up to disk and leave it there, most archiving policies are in better shape,” he says.
ILM Complicates Matters
Information lifecycle management (ILM) has added complexity to the issue, as vendors try to offer all the “tiers” of storage needed to support ILM programs, says Michael Frendo, senior vice president of engineering at McData. Such programs are also driven by compliance issues, as companies try to store data according to its value while meeting retention and protection mandates.
Frendo says that because vendor products used in ILM architectures are often derived from a variety of product lines that don’t necessarily interoperate, vendors are forced to develop proprietary solutions that accentuate and support their own product offerings but don’t necessarily support the offerings of other vendors. As a result, he says, customers must carefully define their data archival and retention objectives in order to determine which technologies can best help them address those requirements.
Another reason for this complexity, says Frendo, is that there are two fundamental choices to be made for ILM implementation, and these complicate management of the underlying archive. One choice for the archiving container is to use existing data management objects, such as file systems, RDBMS or e-mail stores, which are extended to support archival semantics. The other choice, he says, is an independent application used to create and manage content, including archiving.
Frendo says that each approach has its strengths and drawbacks, but he adds that complexity increases as customers deploy multiple ILM containers (such as HSM for file systems, document management systems for unstructured content and e-mail repositories for communications) and management applications without a unifying architecture that supports ILM principles.
Low-Cost Disk Grows in Popularity
While it may make sense to store rarely accessed data on the lowest cost storage possible, when the data is suddenly needed, its value increases immediately, and it must be accessible on demand. In fact, penalties may be involved if the data is not readily available. Some say this is the reason that a lot of companies store rarely accessed data on high-performance systems, increasing operational costs and wasting valuable processing resources.
As a result, fixed content archiving built on top of tiered disk storage infrastructures is quickly becoming a best practice in highly regulated industries. Certain factors are driving this trend, according to Gartner, including budget constraints, staff productivity and regulatory requirements.
Some experts say that for users who do not want to use tape for archiving, content addressable storage (CAS) appliances are the most self-managing, self-healing disk systems available for archiving today.
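For readers unfamiliar with the idea, content addressable storage retrieves an object by a digest of its contents rather than by a file path. The Python sketch below is a toy in-memory illustration under stated assumptions, not any vendor's implementation: the `ContentStore` class, its method names and the choice of SHA-256 are all hypothetical. It shows why identical content is stored only once and how a store can detect altered content on read.

```python
import hashlib


class ContentStore:
    """Toy in-memory content-addressed store (hypothetical, for illustration)."""

    def __init__(self):
        self._objects = {}  # maps digest -> stored bytes

    def put(self, data: bytes) -> str:
        """Store data and return its address (the SHA-256 digest)."""
        digest = hashlib.sha256(data).hexdigest()
        # setdefault keeps the first copy, so identical content is stored once
        self._objects.setdefault(digest, data)
        return digest

    def get(self, digest: str) -> bytes:
        """Return the stored bytes, verifying they still match the address."""
        data = self._objects[digest]
        if hashlib.sha256(data).hexdigest() != digest:
            raise ValueError("content does not match its address")
        return data

    def __len__(self) -> int:
        return len(self._objects)


store = ContentStore()
first = store.put(b"Q3 board minutes")
second = store.put(b"Q3 board minutes")  # same content, same address
third = store.put(b"Q4 board minutes")   # different content, new address
print(first == second, len(store))
```

Because the address is derived from the content, a second copy of the same document adds nothing to the store, and any change to stored bytes makes the integrity check in `get` fail.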
“Tape sucks,” says Biles. “Gartner says that one in 10 recoveries from tape fails. No other technology in IT has anything close to a 10 percent failure rate. If the penalty for losing data is high, it’s worth the premium to archive to disk today.”
Since storage volumes double every 18 months and regulatory compliance no longer allows deleting many files, IT needs a solution other than storing everything forever on high-performance, high-cost disk storage, says Frendo. “Fixed content storage ensures tamper-proof, permanent storage at very low costs,” he says. “Fixed content archiving introduces a new, lower-cost storage tier that is highly secure at the bottom of the storage hierarchy.”
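To put the 18-month doubling figure in perspective, the short Python sketch below projects capacity forward under that assumption. The 100 TB starting point and the function name `projected_capacity` are hypothetical, chosen only for illustration; real growth rates vary by organization.

```python
def projected_capacity(initial_tb: float, months: float,
                       doubling_months: float = 18.0) -> float:
    """Capacity after `months`, assuming volume doubles every `doubling_months`."""
    return initial_tb * 2 ** (months / doubling_months)


# Hypothetical starting point of 100 TB, projected over several years
for years in (0, 3, 6, 9):
    capacity = projected_capacity(100, years * 12)
    print(f"year {years}: {capacity:,.0f} TB")
```

Under these assumptions, three years means two doublings, so a shop that keeps everything would need roughly four times its starting capacity, and sixteen times after six years.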
That said, Frendo notes that data movement between tiers can become a cost issue, particularly if one vendor’s storage is in one tier and another vendor’s storage is in another. “The cost of data migration between tiers and cost and time of data migration for asset replacement offset the cost savings of the storage itself,” he says. “Effective use of tiered storage also requires effective, open data movement services.”
Frendo believes that to further reduce the cost of regulatory compliance, remote office data consolidation is becoming an essential tool for consolidating the 60 to 75 percent of data that resides outside of the data center. “Remote office consolidation ensures that all required data is archived in a tamper-proof repository managed within the data center,” he says.
Eliminating duplicate data is another important consideration, says Biles.
“The bigger issue is reducing data size, especially across version histories,” says Biles. At worst, he explains, there might be a price premium factor of two or three for midrange Fibre Channel disk arrays compared to ATA from the same vendor. On the other hand, backup and archive data can be 10 times bigger than primary data. “That’s a more important factor to address,” he says.
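A rough calculation illustrates the comparison Biles is making. The Python sketch below uses relative prices rather than real ones, and every number other than the threefold price premium and the tenfold size multiplier from the paragraph above is an assumption: the 10 TB of primary data, the function name `archive_cost` and the tenfold data reduction ratio are hypothetical.

```python
def archive_cost(primary_tb: float, expansion: float,
                 price_per_tb: float, reduction: float = 1.0) -> float:
    """Relative cost of storing backup/archive data.

    expansion:  how much larger archive data is than primary data
    reduction:  hypothetical data reduction ratio (1.0 means none)
    """
    return primary_tb * expansion / reduction * price_per_tb


PRIMARY_TB = 10       # hypothetical primary data set
EXPANSION = 10        # archive data is 10 times primary (from the article)
ATA_PRICE = 1.0       # relative price per TB, used as the baseline
FC_PRICE = 3.0        # worst-case premium factor of three (from the article)

fc_full = archive_cost(PRIMARY_TB, EXPANSION, FC_PRICE)
ata_full = archive_cost(PRIMARY_TB, EXPANSION, ATA_PRICE)
fc_reduced = archive_cost(PRIMARY_TB, EXPANSION, FC_PRICE, reduction=10.0)

print(fc_full, ata_full, fc_reduced)
```

With these inputs, moving to cheaper disk cuts the relative cost from 300 to 100, while a tenfold reduction in data size cuts it to 30 even on the more expensive disk.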