The eDiscovery process seems to leap into existence at the start of a lawsuit. But in fact a successful eDiscovery process depends on a strong foundation of well-managed data, which is why information management is actually the first stage in the classic litigation eDiscovery workflow. Yet many legal and IT professionals ignore this foundation: legal because attorneys rarely understand the corporate storage infrastructure, and IT because administrators do not see how well-managed data benefits the eDiscovery process.
However, eDiscovery for litigation as well as for Governance, Risk and Compliance (GRC) demands fast, accurate and defensible responses to electronic data requests. Yet enterprise data is traditionally scattered across data silos, forcing an uncertain and awkward eDiscovery process. The result is slow, limited and expensive data collection, which in turn lengthens the intensive early analysis and document review stages. This is not good news for corporations that are already under the gun to respond quickly to eDiscovery requests.
Managing Data for eDiscovery is Tough
The relevant data is older data, not newly created production data. Newly created active data is rarely the immediate subject of eDiscovery actions. The HR or legal departments may be aware of an immediate situation that requires data preservation, but the vast majority of eDiscovery requests will require older data located in archives or backups, or aging on an application server.
The data is unstructured, is produced by hundreds of different applications, and is stored in different geographical locations. Structured data can be included in eDiscovery requests, but database architecture makes collection easier. Unstructured data lacks that query structure and presents more serious obstacles to smooth and timely collection. Unstructured data sources include email applications, file systems, SharePoint, enterprise content management (ECM) applications and more – and all of these from a variety of different vendors. On top of this, storage locations are all over the map – literally. Common choices include SAN, NAS, cloud and server-based storage as well as users’ hard drives and portable drives. Yet all of these different locations represent a single universe of data subject to search and recovery for litigation, compliance, and risk and records management.
Archives are better collection sources than backups or application servers, but many corporations do not archive. Backup is not archiving, even though many corporations treat it that way. Backups are for current business continuance, not for keeping years’ worth of copied data. Searching backup catalogs for relevant data, then restoring and reviewing it, is extremely inefficient – not to mention that companies may not have access to older backups anyway. In addition, some companies treat their application server storage as if it were an archive, simply letting large volumes of application data age on the production servers. In contrast, archives are architected to be searchable and to keep data safe and available for years if necessary.
Searching archived data can still be difficult. Archiving is a good step forward, but even companies that do archive rarely centralize their archives, and many cover only a few key applications such as email. Granted, email is the prime eDiscovery data target and archiving it is a good plan, but file systems and applications such as SharePoint also store relevant information, and there should be a way to efficiently collect from these sources as well. Otherwise all of these different applications, servers, storage devices, and data types present a serious challenge to locating and recovering information for eDiscovery and business processes.
Data preservation. It is difficult to locate and collect relevant data in the first place, and it can be even harder to defensibly preserve it. Many companies still think that preservation means sending out emails to custodians instructing them to preserve their email. Even when the legal department works with IT to better preserve files, moving data to a secure repository creates multiple copies of data, loads network bandwidth, and may interfere with business applications.
Heavy network and personnel workloads. Collecting and preserving disparate data places severe loads on network bandwidth and storage resources. It also heavily impacts legal and IT staff, who spend inordinate amounts of time collecting and assessing large volumes of discovered files. Even when the corporation uses proactive eDiscovery tools such as indexing, the indexing process itself is time-consuming and can generate continuous network overhead.
The Unified Repository
This highly disparate data environment presents real challenges when attempting to collect relevant data for eDiscovery. Searching all of this data on an ad hoc basis is severely limiting and rarely acceptable in today’s fast-moving business and legal environment.
The key to meeting these challenges is to unify the majority of responsive data in a single physical or virtual repository. Neither method is perfect – physical repositories require companies to invest in a new archive platform, while a virtual repository requires technologies such as indexing, which has its own set of challenges. But either method far outstrips the traditional approach of disparate searches across networked servers, old backup tapes, and custodian hard drives.
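To make the indexing idea concrete, here is a minimal sketch of what a virtual repository’s index does under the hood: crawl file storage once, build an inverted index mapping each term to the files that contain it, and then answer searches from the index rather than re-reading every file. This is purely illustrative – the function names and the crude tokenizer are my own, and production indexing engines add incremental crawling, metadata, and defensibility controls far beyond this.

```python
import os
import re
from collections import defaultdict

def build_index(root):
    """Walk a directory tree once and build a simple inverted index
    mapping each lowercase word to the set of files containing it."""
    index = defaultdict(set)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as f:
                    text = f.read()
            except OSError:
                continue  # unreadable file: skip it (a real crawler would log this)
            # crude tokenizer: runs of letters and digits, case-folded
            for word in set(re.findall(r"[a-z0-9]+", text.lower())):
                index[word].add(path)
    return index

def search(index, *terms):
    """Return the files containing every search term (a simple AND query),
    without touching the file system again."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*sets) if sets else set()
```

The point of the sketch is the trade-off the article describes: the crawl itself is the expensive, network-heavy step, but once the index exists, queries are near-instant and never re-scan the source servers.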
I’ll write more about the capabilities and differences between physical and virtual repositories in a subsequent article. Be aware that a high-performance, defensible unified repository makes searches and collections far more efficient and cost-effective. It is not the only tool you need, as you will also require technologies to search endpoint storage devices and to preserve and process data. Most eDiscovery repository vendors will provide this type of capability or will integrate with tools that do. But I strongly suggest that you begin the eDiscovery process by proactively unifying the largest pool of responsive data: your unstructured data archives.
Christine Taylor is an Analyst with The Taneja Group, an industry research firm that provides analysis and consulting for the storage industry, storage-related aspects of the server industry, and eDiscovery. Christine has researched and written extensively on the role of technology in eDiscovery, compliance and governance, and information management.