RL Polk is throwing Exadata right onto the front lines by launching it on its flagship Insight product. Each automotive manufacturer has between 50 and 200 analysts with permission to access the data warehouse, where fresh data is published about twice a month.
"In the hour [the data] is released, you get about 2,000 logins and anywhere from 50 to 100 warehouse queries running concurrently," said Miller.
Years ago, those queries went into a queue and analysts might not see a response for 24 hours. That led the company to introduce real-time querying, which brought most queries down to minutes. Customers loved that five years ago, but not anymore.
RL Polk looked into how to get data warehouse queries below a 20-second response time, no easy task when you consider that most queries have to analyze vast amounts of information. Polk checked out the Netezza data warehouse appliance as a means of achieving massively parallel operations. However, integration issues with its Oracle applications nixed the sale. The company evaluated Teradata and was also looking at upgrading its EMC SAN as a possible solution before opting for Exadata.
"Using a SAN and database servers, you will never get the performance you need on queries," said Miller.
With the SAN up for a refresh, Miller directed management to invest instead in Exadata II.
"It is a quantum leap over the Exadata 1 due to fast cache and compression," said Miller. "We got with Oracle on a cost analysis and management saw it was a no-brainer."
"While [Exadata] may seem expensive," he said, "you have to look at the entire cost equation." When you consider that you have to buy disk, SAN switches, and other gear, then fit it all together, buy database servers, run racks and bring in interconnects, the price tag soon adds up for a traditional storage solution.
He said installation was a piece of cake. He plugged it in and attained what he called massively better performance without changing anything on Oracle. The footprint shrank from four full racks for Symmetrix to half a rack for Exadata.
"We have 20 years of data but we never change it," said Miller. "We can compress it by 10X and still query it fast."
With the system almost in production, RL Polk has been conducting tests on Exadata. About 80 percent of database queries completed in ten seconds or less, compared to up to four minutes previously. The other 20 percent run only a little faster than before, as they are far more complex.
"Exadata scans database tables and material really fast," said Miller. "Flash is huge; we can put 2.5TB into Flash, which makes scans almost instantaneous."
If RL Polk wants to make sure a specific data set gets into Flash, a quick script is all that is required. That lets the company be proactive in putting data into Flash when it knows queries are coming on newly published material.
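The article does not show what Polk's script looks like. As a rough sketch only, one way such a script might work is to generate an `ALTER TABLE ... STORAGE (CELL_FLASH_CACHE KEEP)` statement, the storage attribute Oracle documents for asking Exadata to keep an object in flash cache, for each newly published table. The table names and the `flash_keep_statements` helper below are hypothetical.

```python
def flash_keep_statements(tables):
    """Build one ALTER TABLE statement per table, asking Exadata to
    keep that table in the storage cells' flash cache."""
    return [
        f"ALTER TABLE {name} STORAGE (CELL_FLASH_CACHE KEEP)"
        for name in tables
    ]


# Hypothetical table names; the article does not name Polk's tables.
newly_published = ["sales_2010_q3", "registrations_2010_q3"]

statements = flash_keep_statements(newly_published)
for stmt in statements:
    # Print the SQL so it can be reviewed before a DBA runs it.
    print(stmt + ";")
```

Running the generated statements shortly before a data release would match the proactive approach described above, though the exact timing and tooling Polk uses are not stated.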
"Exadata has seven storage cells, with each cell having 12 hard drives and 385GB of flash," said Miller. "So queries can be fulfilled very fast due to the massively parallel architecture."
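The per-cell figures Miller quotes can be totaled with simple arithmetic. The short calculation below uses only the numbers in the quote and assumes decimal units (1TB = 1,000GB), which the article does not specify.

```python
# Figures quoted by Miller for Polk's Exadata system.
CELLS = 7
DRIVES_PER_CELL = 12
FLASH_GB_PER_CELL = 385

total_drives = CELLS * DRIVES_PER_CELL        # spindles scanned in parallel
total_flash_gb = CELLS * FLASH_GB_PER_CELL    # aggregate flash across cells
total_flash_tb = total_flash_gb / 1000        # assumes 1TB = 1,000GB

print(f"{total_drives} drives, {total_flash_gb}GB (~{total_flash_tb:.1f}TB) of flash")
```

That comes to 84 drives and 2,695GB of flash, in the same range as the 2.5TB Miller says can be put into Flash. The article does not explain the difference between the two figures.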
EMC Gets in the Game
Netezza and Teradata aren't Oracle's only competition. EMC recently threw its hat into the ring.
Less than three months after its acquisition of Greenplum, EMC (NYSE: EMC) rolled out a new Data Computing Appliance tuned for data loading and analysis that the company claims can crunch big data faster than other systems at a lower cost.
The appliance is built on Greenplum Database 4.0 and touts a parallel-everything architecture that can deliver data loading performance of 10TB per hour. Ben Werther, director of product strategy for EMC's Data Computing Products Division and formerly the head of product management for Greenplum, said the Greenplum Data Computing Appliance is architecturally different from Exadata. "Exadata was designed for OLTP, where there are many queries against the same data and a lot of contention on that data. Every Exadata node sees all of the storage and scalability doesn't really go beyond 16 nodes," he said.
Conversely, he said, Greenplum's massively parallel processing (MPP), shared-nothing architecture sees each node as a separate processing element of the cluster. Each node has its own, dedicated disk resources, eliminating disk contention. Werther said the database communicates via global querying, allowing for linear scalability.
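The shared-nothing idea Werther describes can be illustrated with a toy sketch. This is not Greenplum's implementation. It only shows the general pattern, assuming hash distribution on a key: each node holds its own slice of the rows, aggregates that slice locally, and a coordinator merges the partial results. All function names and the sample data are hypothetical.

```python
import zlib
from collections import Counter


def node_for(key, node_count):
    """Pick the node that owns a key using a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % node_count


def distribute(rows, node_count):
    """Give each row to exactly one node; no node sees another's data."""
    nodes = [[] for _ in range(node_count)]
    for make, units in rows:
        nodes[node_for(make, node_count)].append((make, units))
    return nodes


def local_aggregate(partition):
    """Work done on one node, touching only that node's rows."""
    totals = Counter()
    for make, units in partition:
        totals[make] += units
    return totals


def query_total_units(nodes):
    """Coordinator step: merge the per-node partial totals."""
    merged = Counter()
    for partition in nodes:
        merged.update(local_aggregate(partition))
    return merged


# Hypothetical sample rows: (vehicle make, units sold).
rows = [("ford", 3), ("toyota", 5), ("ford", 2), ("honda", 4)]
nodes = distribute(rows, node_count=4)
result = query_total_units(nodes)
print(dict(result))
```

In this sketch the per-node work runs in a loop for simplicity. In a real MPP system each node would run its local step concurrently on its own disks, which is where the scalability claim comes from.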
Drew Robb is a freelance writer specializing in technology and engineering. Currently living in California, he is originally from Scotland, where he received a degree in geology and geography from the University of Strathclyde. He is the author of Server Disk Management in a Windows Environment (CRC Press).