At Oracle’s annual OpenWorld conference last month in San Francisco, Larry Ellison announced the Exalogic “cloud-in-a-box” system aimed at the application market. Whether the industry will embrace Exalogic remains to be seen, but there are users who have logged many hours with Oracle’s other solution-in-a-box – Exadata – and we now have a gauge on just how well it has been received by end users.
“Oracle Exadata relies on the Sun ZFS storage system and is powered by Sun servers and storage,” said Greg Schulz, an analyst with StorageIO Group. “Exadata version II is optimized, from a hardware perspective, to support Oracle databases, which are part of that solution bundle.”
The first version of Exadata was based on the Oracle (NASDAQ: ORCL) database running on HP (NYSE: HPQ) servers and storage, and was aimed at similar target markets. Exadata II was the first major product born of the union between Sun and Oracle, and the response to it appears to be positive.
LinkShare of New York City, for instance, is a provider of full-service online marketing solutions specializing in the areas of search (SEM), lead generation and affiliate marketing. It has been running a couple of Exadata II boxes since their release in March of 2010. LinkShare migrated data from a legacy database infrastructure over a period of a couple of months, tuned the system for performance and went live in July. Users – mostly external customers – were moved over to Exadata gradually to the point where all resided on that box as the home for one of LinkShare’s key analytical applications. Now the company is moving new users onto a second Exadata unit.
“Our core business is software-as-a-service (SaaS) that lets advertisers and publishers negotiate terms, collect data and handle payments,” said Jonathan Levine, COO of LinkShare. “We utilize Exadata to provide analytical support for these core applications.”
LinkShare’s primary data sits on a series of Intel boxes running Oracle 10 on Red Hat Linux. That transactional system is replicated to Exadata, which acts as a large data warehouse and analytics engine. LinkShare wrote an application on top of Oracle Business Intelligence Enterprise Edition (OBIEE) to give users and customers access to all the reports they need.
Levine explained that it is a challenge to do reporting on the same database that is used for transactional processing. Over the years, the company has tried various solutions to this problem, including use of a completely separate analytical system and then a separate database on clustered servers.
“That worked well enough for a while, but as time neared for a refresh, we realized we were reaching its limits,” said Levine.
As everything was spread amongst Intel boxes running Linux, data was severely fragmented. The overhead from fragmentation was saturating system interconnects. He also complained of a Christmas tree effect – if one server went down, the whole cluster followed.
“We would have wound up with eight racks of Intel servers and we lacked confidence that it would function adequately,” said Levine. “As we were based in Manhattan, we couldn’t find a co-location vendor to run all those racks due to power constraints.”
In comparison, he was attracted to Exadata’s relatively small footprint for both space and power. He also liked the promise of no finger pointing. If anything happened to the box, he only had one number to call.
In terms of benefits, he moved from four racks into a half rack and from four times 13 kV to 6.6. The Flash-based cache in Exadata has also provided an eight-to-ten times performance boost, and, Levine said, that is exactly what customers have been demanding.
His customers want real-time data access around the clock. That makes it impossible to set up downtime windows for data loading or backup in the middle of the night. In the old days, they tried to run data loading and reporting at the same time. Result: both became sluggish.
“Exadata’s data loading is constant regardless of the reporting load,” said Levine. “If you keep customers waiting even a short while, you lose them or lose productivity.”
He believes there is a four-second window for getting data to a customer from a query before their mind wanders. He said Exadata has reduced query latency from about eight seconds to less than four.
“The system feels interactive now and our customers are very positive,” said Levine. “The rate of uptake of our products has also increased – we have more users using it more actively than before.”
Levine firmly recommends an all-in-one, optimized system versus traditional storage architectures, especially for data uploading tasks.
“Using traditional dumb storage for databases is not tenable given a large volume of queries,” said Levine. “It just can’t compete with storage that is SQL-aware like Exadata.”
Speeding Up Database Queries
Another satisfied Exadata user is RL Polk, a provider of automotive data. The company supplies the vehicle industry with market intelligence, such as who owns every vehicle in the U.S., what they use it for, how it was financed and insured, and more. Several million transactions per day run through RL Polk’s databases.
Like LinkShare, RL Polk has a half-rack Exadata II unit purchased at the end of May and installed in July.
“We are using Exadata for every customer-facing data mart we have,” said Doug Miller, director of global database development and operations for RL Polk.
RL Polk is throwing Exadata right onto the front lines by launching it on its flagship Insight product. Each automotive manufacturer has between 50 and 200 analysts with permission to access the data warehouse, where fresh data is published about twice a month.
“In the hour [the data] is released, you get about 2,000 logins and anywhere from 50 to 100 warehouse queries running concurrently,” said Miller.
Years ago, those queries went into a queue and analysts might not see a response for 24 hours. That led the company to introduce real-time querying which brought most queries down to minutes. Customers loved that five years ago, but not anymore.
RL Polk looked into how to get data warehouse queries below a 20-second response time, which is no easy task when you consider that most queries have to analyze vast amounts of information. Polk checked out the Netezza data warehouse appliance as a means of achieving massively parallel operations. However, integration issues with its Oracle applications nixed the sale. The company evaluated Teradata and was also looking at upgrading its EMC SAN as a possible solution before opting for Exadata.
“Using a SAN and database servers, you will never get the performance you need on queries,” said Miller.
With the SAN up for a refresh, Miller directed management to invest instead in Exadata II.
“It is a quantum leap over the Exadata 1 due to fast cache and compression,” said Miller. “We got with Oracle on a cost analysis and management saw it was a no-brainer.”
While [Exadata] may seem expensive, he said, you have to look at the entire cost equation. When you consider that you have to buy disk, SAN switches, and other gear then fit it all together, buy database servers, run racks and bring in interconnects, the price tag soon adds up for a traditional storage solution.
He said installation was a piece of cake. He plugged it in and attained what he called “massively better performance” without changing anything on Oracle. The footprint shrank from four full racks for Symmetrix to half a rack for Exadata.
“We have 20 years of data but we never change it,” said Miller. “We can compress it by 10X and still query it fast.”
With its Exadata system almost in production, RL Polk has been conducting tests. About 80 percent of database queries completed in ten seconds or less, compared to up to four minutes previously. The remaining 20 percent run only a little faster than before, as they are far more complex.
“Exadata scans database tables and material really fast,” said Miller. “Flash is huge – we can put 2.5TB into Flash, which makes scans almost instantaneous.”
If RL Polk wants to make sure a specific data set gets into Flash, a quick script is all that is required. That lets the company be proactive in putting data into Flash when it knows queries are coming on newly published material.
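The article does not show what that script looks like. One plausible form, assuming it relies on the CELL_FLASH_CACHE storage attribute that Oracle documents for Exadata, is a small helper that generates the statements to run. The table names and the helper function below are hypothetical, and this is a sketch rather than RL Polk's actual script.

```python
def flash_keep_statement(table_name: str) -> str:
    """Build the Oracle DDL that asks Exadata to keep a table in flash cache."""
    # Assumption: table_name is a trusted identifier, not user input.
    return f"ALTER TABLE {table_name} STORAGE (CELL_FLASH_CACHE KEEP)"


# Hypothetical table names for a newly published data set.
newly_published = ["insight_sales_2010q3", "insight_registrations_2010q3"]

statements = [flash_keep_statement(name) for name in newly_published]
for statement in statements:
    print(statement + ";")
```

The generated statements would then be run against the database by whatever SQL client the team already uses.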
“Exadata has seven storage cells, with each cell having 12 hard drives and 385GB of flash,” said Miller. “So queries can be fulfilled very fast due to the massively parallel architecture.”
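As a rough check on the figures Miller quotes, the totals for the half rack can be worked out as follows. The variable names are illustrative, and the decimal conversion (1TB = 1,000GB) is an assumption, since the article does not say which convention Miller used.

```python
# Figures quoted by Miller for RL Polk's half-rack Exadata II.
STORAGE_CELLS = 7
DRIVES_PER_CELL = 12
FLASH_GB_PER_CELL = 385

# Assumption: decimal units (1 TB = 1,000 GB).
GB_PER_TB = 1000

total_drives = STORAGE_CELLS * DRIVES_PER_CELL
total_flash_gb = STORAGE_CELLS * FLASH_GB_PER_CELL
total_flash_tb = total_flash_gb / GB_PER_TB

print(f"Hard drives: {total_drives}")
print(f"Flash: {total_flash_gb} GB (about {total_flash_tb:.1f} TB)")
```

That works out to 84 drives and roughly 2.7TB of raw flash, which is in line with the 2.5TB Miller says the company can put into Flash.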
EMC Gets in the Game
Netezza and Teradata aren’t Oracle’s only competition. EMC recently threw its hat into the ring.
Less than three months after its acquisition of Greenplum, EMC (NYSE: EMC) rolled out a new Data Computing Appliance tuned for data loading and analysis that the company claims can crunch “big data” faster than other systems at a lower cost.
The appliance is built using Greenplum Database 4.0 and touts a “parallel-everything” architecture that can deliver data loading performance of 10TB per hour. Ben Werther, director of product strategy for EMC’s Data Computing Products Division and formerly the head of product management for Greenplum, said the Greenplum Data Computing Appliance is architecturally different from Exadata. “Exadata was designed for OLTP, where there are many queries against the same data and a lot of contention on that data. Every Exadata node sees all of the storage and scalability doesn’t really go beyond 16 nodes,” he said.
Conversely, he said, Greenplum’s massively parallel processing (MPP), shared-nothing architecture sees each node as a separate processing element of the cluster. Each node has its own, dedicated disk resources, eliminating disk contention. Werther said the database communicates via global querying, allowing for linear scalability.
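The shared-nothing idea Werther describes can be illustrated with a toy sketch. This is not Greenplum's implementation: the function names, the use of a CRC32 hash to place rows, and the sample data are all assumptions for illustration, and the "nodes" here are plain lists processed one after another rather than separate machines working in parallel.

```python
from collections import defaultdict
from zlib import crc32


def node_for(key: str, node_count: int) -> int:
    """Pick the node that owns a row by hashing its distribution key."""
    return crc32(key.encode("utf-8")) % node_count


def distribute(rows, node_count):
    """Give each node its own private slice of the rows (no shared storage)."""
    nodes = [[] for _ in range(node_count)]
    for key, value in rows:
        nodes[node_for(key, node_count)].append((key, value))
    return nodes


def local_sum(node_rows):
    """Work one node does on its own: total values per key from local rows only."""
    totals = defaultdict(int)
    for key, value in node_rows:
        totals[key] += value
    return dict(totals)


def query_sum_by_key(nodes):
    """Coordinator step: run the local work on every node, then merge the partials."""
    merged = {}
    for node_rows in nodes:
        # Rows with the same key always land on the same node,
        # so partial results from different nodes never share a key.
        merged.update(local_sum(node_rows))
    return merged


# Hypothetical sample rows: (distribution key, value).
rows = [("ford", 3), ("gm", 5), ("ford", 4), ("toyota", 2), ("gm", 1)]
nodes = distribute(rows, node_count=4)
result = query_sum_by_key(nodes)
print(result)
```

Because every row for a given key lives on exactly one node, each node can answer its share of the query from its own disk without waiting on the others, which is the property Werther credits for eliminating disk contention.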
Drew Robb is a freelance writer specializing in technology and engineering. Currently living in California, he is originally from Scotland, where he received a degree in geology and geography from the University of Strathclyde. He is the author of Server Disk Management in a Windows Environment (CRC Press).