After years of minimal involvement in ediscovery, IT is increasingly responsible for its critical data management and collection phases. This article follows up on our previous discussion of ediscovery, “Sea Changes for IT and Ediscovery.” Here, we will look at the three driving factors in this change, as well as the place email archives and backup now occupy in meeting new ediscovery responsibilities.
Three Driving Factors for Fast-Growing Data
Fast data growth is a fact of life in private and public sectors. Storage administrators work to manage storage across complex infrastructure, including primary storage, nearline/archival storage, disk-based protection tiers, tape libraries, off-site mirrored storage systems, and individual desktops and laptops.
This is challenging enough for backup and archive management alone, and ediscovery adds a whole different twist. The bad news as far as ediscovery goes is that 1) these fast-growing data stores house data that must be discovered for litigation and compliance, 2) IT must control this data for retention as well as for cost-effective storage management, and 3) each storage type may have dozens or hundreds of storage targets and components, making each sub-environment a discovery challenge all by itself.
IT has been understandably reluctant to take on the ediscovery process in these complex storage environments, but there is no avoiding it anymore. The risk of poor ediscovery is too high. Attorneys understand collection and preservation in a legal sense, but only IT is positioned to provide search access and preservation in complex storage environments.
1. Faster and Wider Collections
Companies used to plead “undue burden” when a collection promised to be time-consuming and expensive. Judges often granted the request if the expense of a search was disproportionate to the matter. The principle of proportionality remains important, but judges are looking at the reasons behind undue burden motions. If the reason for the motion is poorly managed data storage, then the judge is likely to deny it and simply direct the company to bear the cost.
The ball is in IT’s court on this one. IT should build a well-managed storage infrastructure that lets any ediscovery collections software work efficiently. Ideally, IT will proactively simplify complex storage, since well-managed storage also has major cost benefits beyond ediscovery. This is a major long-term undertaking, but IT can start by deploying an email archive product, which goes a long way toward solving many collection problems.
2. Double-Duty Ediscovery Products
An encouraging trend is that many ediscovery collection products can do double duty. These products leverage their technology for IT by enabling both ediscovery and data management operations. Examples include email archiving with ediscovery options from a storage vendor or a policy-driven data retention engine from an ediscovery vendor. Both approaches encourage IT to use the same product to support ediscovery and to improve storage management. This is an especially strong argument when the money is coming out of IT’s budget. Even when the money is carved from several budgets, a double-duty product still increases the value of IT’s time investment.
3. When the Rubber Hits the Road
Now that we have identified IT’s ediscovery drivers, where will IT see the largest impact? The single largest ediscovery target by far is email. Unfortunately, for many companies, the email “archives” are actually just backup tape, which makes email collections very difficult and time-consuming. Let’s look at why, and what IT’s alternatives are.
Email may be stored online, in email archives or on backup tape. Active email on primary storage can be an ediscovery target, especially if attorneys are immediately aware that there might be a lawsuit, such as for an on-the-job injury. In this case, Legal should issue an immediate data preservation order for messages related to the accident. But most lawsuits are filed one to five years, and sometimes more, after an incident, which means backed up or archived email will be the ediscovery target. Frequently, email is stored as long-term tape backup, which is an awkward repository to say the least. Here is an all-too-common procedure:
- Step 1: IT receives the preservation request from Legal. Legal has emailed data custodians to preserve messages on desktops and laptops, and IT is responsible for locating email on company backup.
- Step 2: Locate backup catalogs from multiple backup applications.
- Step 3: Search each backup catalog for potentially responsive tapes.
- Step 4: Issue request to off-site vault manager for specific tapes, or go search for them yourself.
- Step 5: Receive and inventory tapes.
- Step 6: Use multiple backup applications to restore data. Provision more storage halfway through. Tear hair out.
- Step 7: Search metadata within date ranges and by custodian. Copy responsive data onto online repository or removable media.
- Step 8: Document the entire procedure for chain-of-custody. Swear you will never do this again as long as you live. Know that you will not be so lucky.
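Step 7 above can be pictured as a simple filter over restored message metadata. The following is a minimal sketch, assuming the restored messages have already been indexed into records with a custodian and a sent date. The `RestoredMessage` type, its fields, the tape labels and `find_responsive` are hypothetical names used for illustration, not part of any backup or ediscovery product.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RestoredMessage:
    # Hypothetical record for one message restored from tape.
    message_id: str
    custodian: str
    sent: date
    source_tape: str  # kept so the chain-of-custody log can cite the tape


def find_responsive(messages, custodians, start, end):
    """Return messages from the named custodians sent within [start, end]."""
    wanted = {c.lower() for c in custodians}
    return [
        m for m in messages
        if m.custodian.lower() in wanted and start <= m.sent <= end
    ]


# Illustrative data only.
restored = [
    RestoredMessage("m1", "alice", date(2008, 3, 14), "TAPE-0041"),
    RestoredMessage("m2", "bob", date(2008, 3, 20), "TAPE-0041"),
    RestoredMessage("m3", "alice", date(2009, 1, 5), "TAPE-0077"),
]

hits = find_responsive(restored, ["Alice"], date(2008, 1, 1), date(2008, 12, 31))
for m in hits:
    print(m.message_id, m.custodian, m.sent.isoformat(), m.source_tape)
```

The filter itself is trivial. The expense in the tape procedure lies in Steps 2 through 6, which must finish before any such search can run.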
Clearly, there is a better way. Some people will simply suggest large-scale email deletion from primary and secondary storage, but this requires that Legal be completely up-to-date on any preservation needs. Remember, the lawsuit need not have formally begun; Legal just needs to have a reasonable expectation of one. For example, if a female employee quit after reporting sexual harassment to HR, Legal should immediately issue a preservation order for relevant emails. The fact that IT has a 90-day immediate deletion policy will not impress the court, to say the least.
The best way to keep email ready for possible ediscovery is to use an email archive product. These are immediately searchable and subject to automated policies for movement and compliant data deletion. Below are two examples, one of doing this poorly and one of doing it well.
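To make the interaction between a deletion policy and a preservation order concrete, here is a minimal sketch, assuming a hold is defined by a set of custodians and a date range. The `LegalHold` type and `may_delete` function are hypothetical, and real preservation orders are usually scoped by more than custodian and date.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class LegalHold:
    # Hypothetical, simplified hold: named custodians over a date range.
    custodians: frozenset
    start: date
    end: date


def may_delete(custodian, sent, today, holds, retention_days=90):
    """True only if the message is past retention and no hold covers it."""
    if today - sent < timedelta(days=retention_days):
        return False  # still inside the retention window
    for hold in holds:
        if custodian in hold.custodians and hold.start <= sent <= hold.end:
            return False  # a preservation order overrides the deletion policy
    return True


# Illustrative data only.
holds = [LegalHold(frozenset({"alice"}), date(2009, 1, 1), date(2009, 6, 30))]
today = date(2009, 12, 1)

print(may_delete("alice", date(2009, 3, 1), today, holds))   # under hold
print(may_delete("bob", date(2009, 3, 1), today, holds))     # past retention, no hold
print(may_delete("bob", date(2009, 11, 15), today, holds))   # inside retention window
```

The design point is that the hold check runs before every deletion, so the policy is only as safe as the hold list Legal keeps current.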
Scenario No. 1: Poor Ediscovery Result With Backup Tapes
This company hasn’t been burned by ediscovery yet but is about to be. Legal was surprised by a lawsuit it should have seen coming, and it is scrambling to issue preservation orders and collect email for analysis before meeting with the defendant’s lawyers. The data is stored on backup tape located in an off-site vault. But IT is having trouble. They check backup catalogs only to find that the backup application used at the time is no longer in use at the company. They try to locate an older version of the backup application so they can restore the data for review, only to find that the original backup was done in GroupWise and the company is now on Exchange. Attorneys grumble and budget for an outside collections consultant to handle the tape, but IT now suspects the tapes in question were overwritten anyway. Sanctions and an adverse judgment ensue.
Scenario No. 2: Email Collections in a Well-Managed Storage Infrastructure
Two years ago this company was burned by legal sanctions when IT could not locate a 3-year-old tape backup set containing responsive email data. After the nightmare, IT purchased an email archiving system. Exchange administrators now keep active messages on the Exchange server for 90 days. The messages are backed up to a VTL, and after 90 days they are copied to physical tape for off-site storage. When a message is backed up, an archiving copy is sent to an Exchange archive, which checks for duplicate data. When IT receives a collections request from the attorneys, they easily direct the collections software to search the email archives. Setting up attorneys on the system is simple because the archive is a single well-managed storage target with centralized policies for intelligent archive management. The more data is stored on the archive, the more valuable the system becomes for ediscovery, compliance, governance and data retention management.
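The duplicate check mentioned in this scenario can be illustrated with a content hash. This is a minimal sketch under the assumption that two messages count as duplicates when their raw bytes are identical; the `EmailArchive` class is a hypothetical in-memory toy, and commercial archive products may use different or more sophisticated criteria.

```python
import hashlib


class EmailArchive:
    """Toy in-memory archive that stores each distinct message once."""

    def __init__(self):
        self._store = {}   # digest -> raw message bytes
        self._owners = {}  # digest -> set of mailboxes holding that message

    def ingest(self, mailbox, raw_message: bytes) -> bool:
        """Archive a message; return True if its content was not seen before."""
        digest = hashlib.sha256(raw_message).hexdigest()
        is_new = digest not in self._store
        if is_new:
            self._store[digest] = raw_message
        # Record the mailbox either way so custodian searches still find it.
        self._owners.setdefault(digest, set()).add(mailbox)
        return is_new

    def stored_count(self):
        return len(self._store)


# Illustrative data only.
archive = EmailArchive()
memo = b"Subject: Q3 plan\r\n\r\nSee attached."

print(archive.ingest("alice", memo))  # first copy is stored
print(archive.ingest("bob", memo))    # identical copy is only referenced
print(archive.stored_count())
```

Keeping the mailbox-to-message mapping alongside the single stored copy is what lets a custodian-based collection still return the message for every person who held it.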
Christine Taylor is an analyst with The Taneja Group, an industry research firm that provides analysis and consulting for the storage industry, storage-related aspects of the server industry, and ediscovery. Christine has researched and written extensively on the role of technology in ediscovery, compliance and governance, and information management.