With data volumes growing by leaps and bounds every year, technologies that make the most of storage capacity, and that reduce the physical space, power and maintenance needed to store terabytes or petabytes of data, are becoming increasingly attractive to large and even mid-sized organizations.
Not surprisingly, analysts predict continued strong growth and adoption of secondary storage optimization and data de-duplication solutions over the next 12 months. In times like these, it’s technology that sells itself.
Now, building on the success of secondary storage optimization (SSO), a number of vendors have introduced optimization solutions aimed directly at primary storage. Although it’s early days for primary storage optimization (PSO), with only a few hundred confirmed customers, analyst Eric Burgener, who covers storage optimization for the Taneja Group, sees PSO and technologies like wide area data services (WADS) gaining momentum and making inroads with storage customers over the next year.
“Over the last four years, we’ve seen the growth rate for secondary storage optimization [take off],” said Burgener.
But when SSO first came on the scene, “people were very hesitant about the technology, because it was new and there hadn’t been anything like this around before,” he recalled. “Primary storage optimization is very similar to secondary storage optimization in terms of the concept. But this time around the concept’s proven. People know that the technology works pretty much — and there are thousands of referenceable customers that are using secondary storage optimization technology,” whom end users can talk to.
“So we think the growth of the market is going to happen much more rapidly in the PSO space, just because people are already generally familiar with that technology,” he said. PSO and SSO use different algorithms, “but the concept is very similar.”
Inline vs. Post-Processing
In defining the PSO market, Burgener identifies two distinct camps: the inline approaches (exemplified by Storwize) and the post-processing approaches (exemplified by NetApp and Ocarina Networks).
Which approach is right for optimally storing your primary data depends on the problem you are trying to solve, said Burgener. “There’s no one [PSO] technology that’s the best for all kinds of situations,” he said. “The different approaches characterize what happens to writes to storage. All PSO solutions handle reads of capacity-optimized data at wire speeds.”
By way of example, Burgener cites Ocarina Networks as “the most application-specific player on the primary side,” with, as of late September, 112 different algorithms, one for each of the 112 file types that its Ocarina ECO System could identify, including TIFF, MPEG, Word, and PPT files.
“They’ve actually got an algorithm that’s specific to each one of those,” he said. “So if you’re dealing with, say, pictures, or an online photo database, Ocarina [with its post-processing approach] is a pretty good fit for that — and why Kodak chose them, because these algorithms give them higher data reduction ratios than you could get out of a more generic technology like, for instance, what Storwize is using against that particular data set.”
On the other hand, if your goal is to increase your storage capacity at every point in the data’s lifecycle, you’re probably better off using an inline approach, such as the one Storwize’s STN appliances use, said Burgener, because that data is constantly being optimized.
The difference, he said, is that “the Ocarina approach is going to end up using more storage capacity in the earlier days or weeks of a particular piece of data’s lifecycle, but then it ends up reducing it as it gets older, whereas Storwize’s approach is much more ‘let’s reduce it right away.'”
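To make that lifecycle difference concrete, here is a minimal Python sketch. It is not either vendor's implementation. The function name, the 100 GB data set, the 4:1 reduction ratio and the seven-day post-processing delay are all assumptions chosen for illustration, and the same ratio is used for both approaches even though real ratios differ by product and workload.

```python
def footprint_gb(raw_gb, ratio, age_days, delay_days=0):
    """Stored size of one data set at a given age (illustrative model only).

    delay_days=0 models an inline approach: data is reduced as it is written.
    delay_days>0 models post-processing: data sits at full size until the
    optimization job reaches it, then shrinks.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    if age_days < delay_days:
        return raw_gb          # not yet optimized, stored at full size
    return raw_gb / ratio      # optimized footprint


# Hypothetical numbers: 100 GB data set, 4:1 reduction, 7-day delay.
for age in (0, 3, 7, 30):
    inline = footprint_gb(100, 4, age)
    post = footprint_gb(100, 4, age, delay_days=7)
    print(f"day {age:>2}: inline {inline:.0f} GB, post-processing {post:.0f} GB")
```

In this toy model the two approaches converge once the post-processing job has run; the gap is confined to the early part of the data's life, which matches Burgener's description.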
In addition to Storwize, Ocarina Networks and NetApp (NASDAQ: NTAP), which Burgener said has “basically packaged a de-duplication capability with their Data ONTAP operating system, so that every NetApp box that goes out has got that capability [to do PSO],” Burgener cites Hifn (NASDAQ: HIFN) and greenBytes as two players to watch in the PSO space.
Hifn’s Express DR family of PCI Express cards gives its OEM customers a choice of using an inline or a post-processing approach to PSO, said Burgener, making the solution attractive to organizations with virtual tape libraries (VTLs) and backup appliances.
As for greenBytes, its new, just-coming-out-of-beta Cypress NAS, with its Sun/ZFS+-based approach, makes it very appealing to Solaris users who are running ZFS, he said. “It also might make greenBytes an attractive acquisition candidate for Sun,” he added.
Benefits of PSO
The main benefit of PSO is that it reduces the overall space and power required for primary storage. PSO also “shortens overall backup-and-restore times, since less data must be written to or retrieved from disk for any given data set,” explained Burgener, and, “in cases where data sets must be shipped across networks, the smaller, capacity-optimized data sets require less bandwidth, thereby reducing network traffic.”
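The bandwidth point comes down to simple arithmetic, sketched below in Python. The 500 GB data set, 2:1 reduction ratio and 1,000 Mbps link are hypothetical figures, not numbers from Burgener, and the model ignores protocol overhead and any processing time the optimization itself adds.

```python
def transfer_seconds(size_gb, link_mbps):
    """Idealized time to move size_gb over a link of link_mbps.

    Uses decimal units (1 GB = 10**9 bytes, 1 Mbps = 10**6 bits/s) and
    ignores protocol overhead, latency and contention.
    """
    bits = size_gb * 8 * 1000**3
    return bits / (link_mbps * 1_000_000)


raw_gb = 500       # hypothetical data set size
ratio = 2.0        # hypothetical data reduction ratio
link_mbps = 1000   # hypothetical WAN link

raw_time = transfer_seconds(raw_gb, link_mbps)
optimized_time = transfer_seconds(raw_gb / ratio, link_mbps)
print(f"raw: {raw_time / 60:.1f} min, optimized: {optimized_time / 60:.1f} min")
```

Under these assumptions the transfer time falls in direct proportion to the reduction ratio; real-world gains depend on the workload and the network.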
PSO can also be used with secondary storage optimization solutions, oftentimes resulting in a significant overall reduction in space and power consumption. As Burgener cautioned, though, “data reduction ratios with combined use will vary based on the actual solutions used and the workload types. The only way to really understand the benefit PSO, or a combination of PSO and SSO together, will provide is to test it on specific workloads.”
The Pitfalls
As with other storage technologies, there is a performance versus capacity trade-off with PSO. “Access latency is a problem that is a real concern for primary storage, though not so much for secondary storage,” said Burgener.
“In-line approaches (the Storwize approach) have to deal with this; it’s less of an issue for post-processing approaches (the Ocarina approach), but the issue with post-processing is that it will definitely require more storage capacity,” he said. “How much more depends on what schedule the post-processing is run on (e.g., within hours of writing the data, within days of writing it, or within months of writing it, etc.).”
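A rough way to estimate that extra capacity is sketched below in Python. It is a back-of-the-envelope model under stated assumptions: a steady daily ingest rate, a single fixed delay before post-processing runs, and one uniform reduction ratio. The function name and all figures are hypothetical.

```python
def extra_capacity_gb(daily_ingest_gb, delay_days, ratio):
    """Steady-state capacity a post-processing approach needs beyond inline.

    Data written in the last delay_days is still stored at full size.
    Inline would already have stored it at 1/ratio of that size, so the
    difference is ingest * delay * (1 - 1/ratio).
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return daily_ingest_gb * delay_days * (1 - 1 / ratio)


# Hypothetical workload: 200 GB/day ingest, 4:1 reduction.
schedules = {"within hours (6h)": 0.25, "within days (3d)": 3, "within months (90d)": 90}
for label, delay in schedules.items():
    extra = extra_capacity_gb(200, delay, 4)
    print(f"{label}: about {extra:,.1f} GB of additional capacity")
```

The model suggests the penalty grows linearly with the delay, which is why the processing schedule matters as much as the reduction ratio when sizing a post-processing deployment.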
That’s why, he noted, it’s important for storage and network administrators to understand their organization’s particular storage challenges and weigh the pros and cons of each approach before choosing a PSO solution.
The Future’s So Bright
Over the next 12 months, Burgener sees the adoption rates for both primary and secondary storage optimization solutions accelerating. “There are still a lot of end users who have heard about the technology but don’t really understand how it works yet, and as more and more vendors [like EMC, IBM and Symantec] get into the space, they’re going to hear more about this.”
Indeed, given what Burgener has seen happening in the storage industry, and with the economic arguments for storage optimization so compelling, he believes that PSO is almost a foregone conclusion. “Why would you spend 10 or 20 times as much to store a piece of data if you don’t have to? And there’s no risk associated with doing capacity optimization. I think we’re going to see this penetrate rapidly.”
“It’s not there right now,” he said, but over the course of the next 12 months he expects PSO adoptions to increase, though “we don’t think the market is going to be as large as the secondary side, just because there’s a lot less data.”
Burgener also predicts that within the next 12 months or so, we’re “going to see one or two of the WADS vendors [like Cisco or Riverbed] make public comments that are going to put them in direct competition with people like Storwize and Ocarina in the PSO space,” as well as some industry consolidation as some of the larger players snap up the smaller, more specialized vendors (a possible Sun Microsystems acquisition of greenBytes being one example).