There is a lot of confusion over what constitutes the cloud. “Clouds” can be an umbrella term for pretty much anything at all that involves either storage pooling, delivery of some service over the Internet, or a nifty idea that came out of your marketing team’s Monday morning meeting. “Heh! Let’s call the product a cloud! Yeah, that’s the ticket!”
So let’s define some cloud terminology as it applies to eDiscovery. Essentially, eDiscovery applications and services are cloud-based when they are delivered online by third-party providers. Titles are fluid and indistinct, but two of the major examples are Software as a Service (SaaS), or cloud-based application delivery, and hosted eDiscovery, or cloud-based eDiscovery. (Believe me, different vendors will define these terms according to what they think will sell the best.) Below are some decent working definitions.
- Cloud-based eDiscovery Application: The eDiscovery software vendor hosts their application on their own networks and delivers it to customers via the Internet. Customers use the application for various eDiscovery tasks such as analysis or review. This can be quite useful when the corporation does not want to make major investments in on-premises eDiscovery technology, or when the corporation wants its law firms to use the same eDiscovery software. Impact on IT: Must provide sufficient bandwidth for smooth application delivery, but will not have to install or maintain the hosted application. This means that IT is responsible for how fast (or not) the application is running over the pipes, but can cheerfully refer end-users to the application provider for all other support.
- Cloud-Based Hosted eDiscovery: The customer collects relevant data in response to an eDiscovery matter, processes it, and sends it via the Internet to their hosting provider. The provider stores customer data on their site or in a co-location facility, and runs various levels of eDiscovery on the data. The vendor might provide services including software, expert consultants and project management. Impact on IT: Must provide sufficient bandwidth for copying data to the host as some data collections can be quite large. Should be concerned over security. Must work closely with the hosting vendor to verify that data in-transit and at-rest is secure.
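To get a feel for what "sufficient bandwidth" means for a given collection, a back-of-the-envelope estimate helps. The sketch below is illustrative only: the function name `transfer_hours`, the decimal-gigabyte convention, and the default 70 percent usable share of the link are all assumptions, not figures from any hosting provider.

```python
def transfer_hours(size_gb: float, link_mbps: float, utilization: float = 0.7) -> float:
    """Roughly estimate hours needed to send a collection over a network link.

    size_gb     -- collection size in decimal gigabytes (assumption: 1 GB = 8,000 megabits)
    link_mbps   -- nominal link speed in megabits per second
    utilization -- assumed fraction of the link actually available for the transfer
    """
    if size_gb < 0 or link_mbps <= 0 or not 0 < utilization <= 1:
        raise ValueError("size must be >= 0, link speed > 0, utilization in (0, 1]")
    size_megabits = size_gb * 8 * 1000
    seconds = size_megabits / (link_mbps * utilization)
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical collection: 450 GB over a 100 Mbps link with the whole link available.
    print(f"{transfer_hours(450, 100, utilization=1.0):.1f} hours")
```

Under these assumptions, a 450 GB collection over a fully available 100 Mbps link works out to about 10 hours, and longer once other traffic shares the pipe. Numbers like this make it easier to decide whether a collection goes over the wire or onto removable media.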
The second process is the one that impacts IT the most, and is the one where they need to be intimately involved. Legal will not know to ask these questions for the most part and should turn to IT for help. IT needs answers to the following questions from the hosting provider:
- “What kind of bandwidth do I need to provide for all this data?” This issue is not as alarming as it sounds, since in practice very large data collections can be sent to the host via removable media. However, during rolling productions it is very inefficient to burn data to disk and send it physically. Attorneys will want and need to send incremental processed data sets over the wires to their hosting provider. This method follows a spiking pattern as data is progressively collected, processed and sent.
- “How secure is our data in-transit?” Encrypting the data while sending is actually your responsibility, but the provider must be capable of decrypting at their site. Note that you will want to encrypt the processed data whether you are sending it on physical media or over the Internet. The very last thing you want to have happen is to lose a collection of preserved data from off the truck, or to expose digital data to hackers. Judges do not like that AT all.
- “How secure is our data at your site?” The hosting provider should do the following to your satisfaction:
- Data protection. This includes backup at the very least. Preserved data for litigation is not active data, so it does not require replication or snapshots, but backup is a critical protection function. Granted, you have the originals of the data at your own site, but the provider is the one hosting the specific collected data sets. You do not want to have to re-do collection and processing (and this is disastrous for chain of custody), so the provider should back up the data they are hosting.
- Chain of custody. Your provider must protect data against modification in order to preserve chain of custody. Incidentally, be sure that you are not inadvertently modifying it yourself when you collect it for the host – some copying functions change metadata to reflect the current date. Not a good thing in an eDiscovery matter, to say the least. Test first.
- Security for at-rest data. Stored data will probably not be encrypted, as the provider will be actively running eDiscovery processes on the data sets. You are looking for a secure multi-tenant infrastructure since the provider will certainly be hosting multiple clients. Look for private cloud structures that offer segmented storage pools by tenants. You do not want to transport your data to a broad public cloud that does not offer secure segmentation. Also, make sure that the physical plant is secure. If your hosting provider is storing your data in a large and poorly managed co-hosted facility then you need to know that. Look for a facility that can prove tight physical and digital security to your satisfaction.
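The "test first" advice on chain of custody can be made concrete with a small check run on a sample file before the real collection. The sketch below is one possible approach, not a prescribed eDiscovery procedure: it assumes a SHA-256 hash is an acceptable fingerprint of the content, and the helper names `sha256_of` and `copy_and_verify` are hypothetical. It uses Python's `shutil.copy2`, which attempts to preserve timestamps; it cannot preserve every kind of metadata on every platform, so treat a passing result as a minimum rather than a guarantee.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large collections do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(src: Path, dst: Path) -> dict:
    """Copy src to dst, then report whether content and modification time survived."""
    before = sha256_of(src)
    # copy2 tries to carry timestamps over; a plain copy would stamp the current date.
    shutil.copy2(src, dst)
    after = sha256_of(dst)
    return {
        "source": str(src),
        "copy": str(dst),
        "sha256": before,
        "content_matches": before == after,
        # Compared at one-second resolution, since filesystems differ in precision.
        "mtime_preserved": int(src.stat().st_mtime) == int(dst.stat().st_mtime),
    }
```

Keeping the returned record alongside the collected file gives both you and the hosting provider a value to re-check later: if the hash of the hosted copy ever differs, the data has been modified somewhere along the way.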
Legal has not always communicated with IT, but that is slowly changing. One of the reasons behind the shift is that, on the whole, attorneys do not have a good working knowledge of the concerns and challenges behind storing corporate data in the cloud. IT should be involved at this point to vet prospective hosting vendors, and to assure themselves, and Legal, that they may entrust data to eDiscovery hosting providers.
About the Author
Christine Taylor is an Analyst with the Taneja Group, an industry research firm that provides analysis and consulting for the storage industry, storage-related aspects of the server industry, and eDiscovery. Christine has researched and written extensively on the role of technology in eDiscovery, compliance and governance, and information management.