Experts Question ‘Keep Everything’ Philosophy

Enterprise Storage Forum content and product recommendations are editorially independent. We may make money when you click on links to our partners.

Storage managers have been bombarded with claims about data retention (see Can Data Ever Be Deleted?). The argument is that every single piece of data has to be retained, since otherwise you could get into severe trouble in the future. And, of course, with disk now being so cheap, the thing to do is just retain everything going forward. The cardinal rule, vendors say, is never ever throw anything away.

The logic presented by this camp is that even an innocuous e-mail about the company picnic could take on legal significance that no one can predict today: perhaps a sexual harassment lawsuit, two years from now, that cites an incident at that event. So, the argument goes, you can’t dispense with even those little scraps. If you do, it could come back to bite you. And if you don’t keep it all, you could be in violation of Sarbanes-Oxley, HIPAA and other laws.

This is certainly a good way to sell people more disk arrays, tape drives and archiving gear. But is it sensible? Or even true? Records and information management (RIM) professionals label it complete nonsense.

“What is represented related to the increased compliance requirements for records retention comes across as scare tactics, and it’s generally being provided by those who would profit the most from organizations ‘keeping everything forever,’ which is a prescription for disaster for any business,” says Larry Medina of Danville, Calif., a RIM veteran of more than 30 years. “While it is correct to state that more information needs to be retained than in the past and for longer periods of time, the retention of data, information and records beyond required time periods is a risk-laden practice.”

Buy to Comply

The storage industry viewpoint has been that the easiest way to comply with current regulations is to buy more storage and hang onto everything until the end of time. While it’s definitely easier, it doesn’t make good business sense, since there’s no need to keep “non-record” material any longer than it has value to your business.

“If you keep everything, you are saying, in effect, that all information is created equal, which it isn’t,” says Julie Gable of Pennsylvania-based Gable Consulting, an independent consultant specializing in electronic records management issues.

The solution is to establish a valid records retention program, part of which is a retention schedule based on the legal, regulatory and statutory needs that also addresses any business need for information beyond that timeframe. That may be more time-intensive today, and certainly more immediate trouble than “just store it all somewhere.” But in the long run, a smart approach to record-keeping could save companies a fortune and possibly even assist in compliance matters. In fact, clear-cut policies on record expiration and deletion could save a ton of trouble down the line.

“In litigation, the rule in legal discovery is that if you have it, you must produce it,” says Gable. “Keeping everything means that, in order to find material that is responsive to the discovery request, you have to sort through everything.”

That review carries a steep cost, since it is usually done by paralegals, law students or specialized discovery firms. A vast storage repository makes everything harder to find, and often exposes smoking guns that help opponents make their case.

A case in point is the Enron-Andersen debacle. Andersen was ultimately found guilty of destroying information, but the information it destroyed should have been discarded, in accordance with its retention schedule, long before. Had it been discarded in the normal course of business, the records would already have been gone when the order to produce them was issued. What got Andersen in trouble was destroying the records after they had been requested by the plaintiff.

Records, Data Not the Same

Part of the confusion seems to be between “records” and “data.” It isn’t true that all data is a record. It is vital to differentiate between the two — apply rigorous policies to one and do what you want with the other.

That’s why saving all e-mail is such a bad business practice. Costs mount rapidly, since 70 percent of what you end up storing is spam. Of the remaining 30 percent, a few percent is duplicate copies and about 20 percent is generic corporate traffic with little relevance to anything. Only the tiniest trickle of the e-mail torrent has any value at all, and finding it amid everything else is hard if you’ve kept it all for ten years. Further, random office jokes and comments can be twisted by attorneys into solid cases for harassment, racial prejudice and all kinds of other woes.

“A consequence of overenthusiastic retention of data could be lawsuit settlement demands based on the cost to sift through everything to comply with discovery requests rather than on the merits of the case,” says Gable. “Lawyers game the system if they know how much you’ll have to sort through, so a $200,000 settlement demand becomes a $2,000,000 demand.”

So what is the solution? There is no one policy that fits all organizations, since each has its own environment, operational needs and specific legal requirements. Some, of course, are far more regulated than others. A good place to start, though, is to define what a record is to a specific organization. By doing that, you also define, perhaps even more importantly, what is not a record.

“Work with your business process and legal advisers to develop a records management policy, and from that, develop procedures and a retention schedule (after performing an inventory of existing records) and assessing who creates, receives and manages what information for your organization,” says Medina.

Retention, Archiving Also Differ

Another prevalent area of confusion in the storage world is retention versus archiving. Some believe that retention simply involves an archiving strategy based around multiple tiers. Not true.

Retention means keeping information for the period required to satisfy legal, regulatory and statutory needs, plus any additional business needs that exceed those periods. Archiving, on the other hand, is the long-term storage of information whose retention requirements outlast its active use period, on less costly storage from which it can still be retrieved. The archiving function can also be extended to retain records that still have historic, intrinsic or research value to an organization.
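To make the distinction concrete, here is a minimal Python sketch of how one line of a retention schedule might drive a record through active use, archive and disposal. It assumes a schedule entry holds three periods in whole years (legal minimum, business need, active use); the class and function names, the `on_legal_hold` flag and the example values are hypothetical illustrations, not any vendor’s product or a standard schema. The hold flag reflects the Andersen lesson: once records have been requested, they must not be destroyed, even if their retention period has expired.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RecordSeries:
    """One line of a hypothetical retention schedule (all names are illustrative)."""
    name: str
    legal_years: int     # legal, regulatory and statutory minimum
    business_years: int  # business need, which may run longer than the legal minimum
    active_years: int    # period of active use before the record moves to archive

    @property
    def retention_years(self) -> int:
        # Retention covers whichever requirement runs longer.
        return max(self.legal_years, self.business_years)


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping Feb 29 to Feb 28 in non-leap years."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def lifecycle_stage(series: RecordSeries, created: date, today: date,
                    on_legal_hold: bool = False) -> str:
    """Return 'active', 'archive', 'hold' or 'dispose' for one record."""
    if today < add_years(created, series.active_years):
        return "active"   # still in active use: keep on primary storage
    if today < add_years(created, series.retention_years):
        return "archive"  # retention not yet met: keep on cheaper, retrievable storage
    if on_legal_hold:
        return "hold"     # requested in litigation: must not be destroyed
    return "dispose"      # retention met: destroy in the normal course of business


if __name__ == "__main__":
    # Hypothetical schedule entry: 7-year legal minimum, 3-year business need,
    # 2 years of active use.
    invoices = RecordSeries("invoices", legal_years=7, business_years=3, active_years=2)
    created = date(2015, 1, 1)
    for today in (date(2016, 6, 1), date(2020, 1, 1), date(2023, 1, 1)):
        print(today, lifecycle_stage(invoices, created, today))
```

With these made-up values, a record created on January 1, 2015 is active in mid-2016, archived in 2020 and eligible for disposal in 2023. Non-record data would simply not appear in the schedule, and could be discarded as soon as it stops having business value.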

Once the distinctions between records and data, and between retention and archiving, are understood, storage strategies become less unwieldy: keep only what is required, and keep it only as long as required.

“There is no need to purchase additional space or equipment to store unnecessary information and no need to hire staff to manage it,” says Medina. “And you eliminate any potential risks of discovery in the event of a legal action where a plaintiff requests you to produce information.”

And Now for the Hard Part

Of course, all this would be easier for a new company about to launch in a few months’ time. It could probably cobble together enough technology elements to define which e-mails are records, capture them in a record format automatically, and apply similar strategies to other digital information channels. That could be done, but it would still require an effort to overcome the ingrained cultural habit of retaining things “just in case.” It’s easy to create, send and store untold amounts of digital data. But it’s hard to do so in an orderly fashion.

Now consider the plight of the average company that has already amassed a few petabytes of archives. Sifting through that mess to isolate the records is quite a task, and information lifecycle management (ILM) solutions will no doubt be forced to deal with that conundrum a few years down the line.

“Most organizations are having a difficult time getting their arms around their electronic repositories,” says Medina. “As they made the mistake of allowing things to get out of hand years ago, it’s now much more difficult to close Pandora’s box.”

Technology in this area, then, is seriously lagging. Automated classification systems are being developed, but have had limited success in classifying information without human intervention. Metadata capture to assist with the indexing and classification of information is another critical component, but it too requires the human touch.

“IT directors and administrators need to look for a comprehensive, all-around solution to data management to bring about corporate governance, compliance and efficient storage management,” says Dianne McAdam, senior analyst and partner at Data Mobility Group.

But does such a system even exist? A couple of vendors who appear ahead of the pack in this area are FileNet of Costa Mesa, Calif., and OpenText of Waterloo, Ontario. Interestingly, both evolved from the document/content indexing field, not the world of storage. While storage tends to lump all data together into just so many terabytes, these firms are used to dealing with the nuances of individual documents and how they are classified. But although these companies manifest a deeper understanding of the issue, they are still far from being able to offer the storage world a handy box that takes care of RIM matters automatically.

So with technology facing several hard years of slog ahead of it, perhaps the best approach in the short term is education. Getting storage professionals clued in on the issue, and teaming up with RIM professionals, could result in long-term benefits. The Association for Information and Image Management (AIIM), for example, has released an educational program on Electronic Records Management (ERM). It is based on global best practices drawn from AIIM’s 50,000 members, and it examines records management in relation to the business needs of all types of organizations.

“Legal ramifications, time inefficiencies, and unnecessary spending are all consequences of uncontrolled electronic information,” says Beth Mayhew, a senior manager at AIIM.


Drew Robb
Drew Robb is a contributing writer for Datamation, Enterprise Storage Forum, eSecurity Planet, Channel Insider, and eWeek. He has been reporting on all areas of IT for more than 25 years. He has a degree from the University of Strathclyde UK (USUK), and lives in the Tampa Bay area of Florida.
