With the growth of fixed content far surpassing all other kinds of data, a big challenge for storage managers is what to do with all those e-mails, documents and media files that don’t change after their creation (see Getting a Fix on Fixed Content).
Add to that the protection and retention requirements of regulations such as Sarbanes-Oxley and HIPAA — and the need to retrieve files quickly when demanded — and storage admins have quite a problem on their hands.
Here are a few options for getting a handle on all that fixed data.
One option is network-attached storage (NAS). Analysts at Gartner note that because the file systems underlying the NAS protocols are aware of hierarchical storage management (HSM), and because CIFS and NFS are standard offerings of all major operating systems, NAS eliminates the complexities that normally exist between applications and various OS platforms.
Bill Biles, co-founder and vice president at Data Domain, says that NAS has quite a few strengths, including the fact that data movement between storage tiers is integrated into the array controller, which simplifies policy-based storage movement between tiers. However, Biles says NAS has some weaknesses, including the lack of de-duplication for backup software.
Michael Frendo, senior vice president at McData, agrees that NAS file systems provide a simple organization, access and retrieval tool.
The weakness, says Frendo, is the requirement for heterogeneous operating systems to support a single NAS architecture. Integrating NFS and CIFS into an HSM is difficult, he says. And effective support for file system scalability requires independent operating system vendors to provide that support in the OS kernel. Also, he notes, most NAS systems tie customers to a specific vendor’s file system, such as NetApp’s WAFL.
“Moving from one NAS vendor to another requires an old-fashioned dump and restore to move data between different file systems, which is extremely painful and prohibitive,” says Frendo.
Another popular choice for an archiving target is external controller-based (ECB) storage. According to industry experts, ECB storage systems are probably the most common disk-based archive targets today. One reason is that they use SATA drives, and their RAID controllers typically lack sophisticated data management features.
Biles says the main strength of ECB storage is that it works well with all software. However, its biggest weakness, he says, is that it has no specific features, and it leaves all of the work up to the software.
“Because the data movement between storage tiers is integrated into the array controller, it simplifies policy-based storage movement between tiers,” says Frendo. “However, ECB does have its weaknesses, including vendor lock-in and data migration to a competitor array for asset replacement.”
Content-addressed storage (CAS) systems differ from NAS and ECB storage systems in that they generate a unique tag that is used to store and retrieve objects. Using tags to store and retrieve objects eliminates traditional file system-based addressing schemes and frees the CAS system to efficiently relocate objects to optimize scalability.
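The tag-based addressing described above can be illustrated with a minimal sketch. It assumes the tag is a SHA-256 digest of the object's bytes, which is a common way to build a content address but is not something the vendors quoted here have confirmed for their products. The ContentStore class and its put and get methods are hypothetical names used only for illustration.

```python
import hashlib


class ContentStore:
    """Toy in-memory content-addressed store (illustrative only)."""

    def __init__(self):
        # Maps tag -> object bytes. A real system would place objects on disk
        # and could relocate them freely, because callers only hold the tag.
        self._objects = {}

    def put(self, data: bytes) -> str:
        # Assumption: the tag is the SHA-256 digest of the content.
        tag = hashlib.sha256(data).hexdigest()
        # Identical content produces the same tag, so it is stored only once.
        self._objects.setdefault(tag, data)
        return tag

    def get(self, tag: str) -> bytes:
        # Retrieval needs only the tag, not a path or file name.
        return self._objects[tag]

    def __len__(self) -> int:
        return len(self._objects)


store = ContentStore()
tag_a = store.put(b"Q3 financial report")
tag_b = store.put(b"Q3 financial report")  # same bytes, stored a second time
tag_c = store.put(b"Q4 financial report")
```

In this sketch the two identical reports share one tag and occupy one slot in the store, while the different report gets its own tag.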
However, Biles says CAS systems are too slow for backup software, and that for many non-regulated applications, the features may not be worth the price of that much storage.
Some experts say that within a storage hierarchy, CAS fits most naturally into the middle tier. CAS can also reduce the total amount and cost of storage a business uses, they say. Because each document needs to be saved only once and is uniquely identifiable, CAS eliminates the problems associated with storing duplicate copies of the same data.
Gartner analysts say that for users who do not want to use tape for archiving, CAS appliances are the most self-managing, self-healing disk systems currently available for archiving.
Frendo and others say that all three (NAS, ECB and CAS) are segments of an overall tiered storage strategy. “NAS or ECB could contain different price/performance/feature storage elements, but rarely do today,” says Frendo.
He adds: “If we build an information lifecycle policy that actually determines the best price/performance/function storage device for each point in the lifecycle, then we need a policy manager to enforce the requirements.”
Frendo says all three technologies have controller logic in front of physical disk drives that could potentially enforce policy. The choice would then come down to balancing how well a system suits the production copy (NAS or SAN storage) against its ability to support lifecycle policies.
“Commonly, the policy manager will run on top of the storage in server or network devices to move data between tiers, at which point each specific storage product can be measured against its price/performance capabilities at a particular stage in the lifecycle, with less dependency on the specifics of the upstream or downstream tier,” says Frendo.
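As a rough illustration of the kind of policy manager Frendo describes, the sketch below assigns each file to a tier based on simple rules. Everything in it is an assumption made for the example: the FileRecord fields, the choose_tier function, the 90-day threshold and the mapping of rules to NAS, ECB and CAS are hypothetical and do not come from any vendor's product.

```python
from dataclasses import dataclass


@dataclass
class FileRecord:
    """Hypothetical metadata a policy manager might track per file."""
    name: str
    age_days: int      # days since the file was last accessed
    regulated: bool    # subject to retention rules such as SOX or HIPAA


def choose_tier(record: FileRecord, active_days: int = 90) -> str:
    """Return the tier a file should live on under this illustrative policy."""
    if record.regulated:
        # Assumed rule: regulated content goes to the CAS tier.
        return "CAS"
    if record.age_days <= active_days:
        # Assumed rule: recently used files stay on NAS for open access.
        return "NAS"
    # Assumed rule: everything else moves to lower-cost ECB storage.
    return "ECB"


files = [
    FileRecord("patient-scan.dcm", age_days=400, regulated=True),
    FileRecord("project-plan.doc", age_days=12, regulated=False),
    FileRecord("old-newsletter.pdf", age_days=800, regulated=False),
]
placement = {f.name: choose_tier(f) for f in files}
```

Because the rules live in one function that runs above the storage, each tier can be swapped or re-priced without changing the devices upstream or downstream of it.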
Frendo says the biggest advantage of CAS systems is the ability to enforce business policy on how data is migrated and stored in a networked environment.
Finding What Works for You
To find the best fixed-content disk archive solution for your organization, determine what your needs are and then talk to your storage vendor.
For general archiving needs and for users who intend to keep tape in the big picture to help drive down costs, ECB is the best choice.
NAS may be the best choice for general archiving activities and when users require the openness of NFS and CIFS to support their environments.
Finally, some say CAS is the best bet for users with the most stringent regulatory requirements and who don’t want to deal with tape.