Polysius has spent more than 140 years designing, supplying and installing cement plants and process equipment around the world, yet it is still growing at a 20 to 30 percent clip. So is its data, specifically its CIFS file shares. While the growth of the business is a good thing, keeping up with its data growth has been a major challenge.
"It's a constant battle," said James Krochmal, the IT manager at Polysius. "We're constantly adding disk shelves and disks," which consumes time and money.
So when Polysius's storage vendor, Network Appliance, offered Krochmal the opportunity to try out its new deduplication solution, NetApp A-SIS (for advanced single-instance storage), free for 30 days, Krochmal figured he had nothing to lose.
"Even if it was only marginally successful, if it slowed down some of the rampant growth, I wanted to try it," said Krochmal.
No Performance Hit
NetApp made deployment easy. A consultant from a local NetApp reseller had the technology up and running on Polysius's NetApp FAS3020 in 30 minutes, and except for a bit of restructuring to get some of its volumes down, "everything went swimmingly," said Krochmal.
In fact, in less than two months, the NetApp A-SIS deduplication solution had reduced Polysius's redundant data by 47 percent.
"NetApp deduplication technology enabled us to more efficiently store our data and deduplicate all of our primary user data with absolutely no performance hit," said Krochmal. That has allowed him to defer the purchase of additional storage by several months.
Primary vs. Backup
Chris Cummings, NetApp's senior director of data protection solutions, said the ability to deduplicate primary data is what sets NetApp apart from the competition, though he is quick to point out that NetApp A-SIS is able to dedupe whatever data you throw at it, primary or secondary (backup and archiving).
"Our position is that deduplication goes beyond backup," said Cummings. "And Polysius is one of the first public representations of the benefits folks can get from using deduplication in a broader set of use cases."
Interestingly, storage expert Curtis Preston, a vice president at GlassHouse Technologies, doesn't see that much value in deduplicating primary data. "Except for a few edge cases (like VMware images), deduping primary storage doesn't buy you near as much as deduping secondary data," he said. "Backup data has lots of duplicated data, primary usually has just a little."
As for the A-SIS value proposition, Preston sees it as "just one more cool thing about WAFL," the Write Anywhere File Layout file system that NetApp designed for use in its storage appliances, which is leveraged by A-SIS.
Whatever the case, NetApp has so far licensed its A-SIS deduplication technology to more than 500 customers since its introduction in May of this year. And those customers have already deduplicated more than 10 PB of raw storage.
The More Duplicate Data, the Merrier
As for who can benefit from deduplication, particularly NetApp A-SIS, Cummings said "anyone who is looking at disk-based data protection."
NetApp A-SIS deduplication is available on a variety of platforms, including NetApp's NearStore R200 and FAS 2000, 3000 and 6000 models.
The larger the stores of information you need to deduplicate, the greater the value or economic benefit of deduplication with NetApp A-SIS, said Cummings. And while NetApp has one customer that is deduplicating about 800 TB of data, most are like Polysius, deduplicating 18 to 20 TB of data.
For Polysius, deduplication has resulted in "a tremendous amount of savings," said Krochmal, in both capital equipment and time. "It's a no-lose solution."
"It's almost like an insurance policy," said Cummings. With NetApp A-SIS, "you're going to get the benefit of deduplication without paying a performance penalty."
That's because NetApp A-SIS deduplication operates with a high degree of granularity. Newly stored data is divided into small blocks. Each block of data has a digital signature, which is compared to all other signatures in the volume. Then if an exact block match exists on the disk volume, the duplicate block is discarded and its disk space is reclaimed, freeing up storage.
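For readers who want to see the idea in miniature, the block-matching steps described above can be sketched in a few lines of Python. This is an illustration only, not NetApp's implementation: the 4 KB block size, the use of SHA-256 as the "digital signature," and the `DedupVolume` class and its method names are all assumptions made for the example.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, chosen only for illustration


class DedupVolume:
    """Toy in-memory volume that keeps one copy of each unique block."""

    def __init__(self, block_size=BLOCK_SIZE):
        self.block_size = block_size
        self.blocks = {}  # signature -> block bytes, stored once
        self.files = {}   # file name -> ordered list of block signatures

    def write(self, name, data):
        signatures = []
        for offset in range(0, len(data), self.block_size):
            block = data[offset:offset + self.block_size]
            # SHA-256 stands in for the block's digital signature (assumption).
            sig = hashlib.sha256(block).hexdigest()
            existing = self.blocks.get(sig)
            if existing is None:
                self.blocks[sig] = block  # first copy: keep it
            elif existing != block:
                # Confirm an exact match before discarding a "duplicate".
                raise ValueError("signature collision on differing blocks")
            # Otherwise the block is an exact duplicate and is not stored again.
            signatures.append(sig)
        self.files[name] = signatures

    def read(self, name):
        # Rebuild the file from the shared block store.
        return b"".join(self.blocks[sig] for sig in self.files[name])

    def logical_bytes(self):
        # Bytes as the users see them, duplicates included.
        return sum(len(self.blocks[sig])
                   for sigs in self.files.values() for sig in sigs)

    def physical_bytes(self):
        # Bytes actually stored after duplicates are discarded.
        return sum(len(block) for block in self.blocks.values())

    def savings(self):
        logical = self.logical_bytes()
        return 0.0 if logical == 0 else 1 - self.physical_bytes() / logical
```

In this sketch, writing two 8 KB files that share one identical 4 KB block leaves 12,288 bytes on the toy volume instead of 16,384, a 25 percent saving, and both files still read back unchanged.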