Less than three months after its acquisition of Greenplum, EMC has rolled out a new Data Computing Appliance tuned for data loading and analysis that the company claims can crunch “big data” faster than other systems at a lower cost.
The new EMC (NYSE: EMC) Greenplum Data Computing Appliance was built using Greenplum Database 4.0 and touts a “parallel-everything” architecture that can deliver data loading performance of 10TB per hour, according to the company.
The Data Computing Appliance integrates database, compute, storage and network into a single system, and is available in half-rack, full-rack, and multiple-rack appliance configurations for terabyte to petabyte-scale deployments.
Data loading is not a big deal for most users, but Ben Werther, director of product strategy for EMC’s Data Computing Products Division and formerly the head of product management for Greenplum, said maintaining data pipelines is difficult once you reach a certain size. “The source can become the bottleneck and when that happens you are doing triage and playing catch up. In our model, all of the pieces of the database request data chunks, spreading the load across all of the nodes of the system and fully parallelizing loads from any source,” he said.
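The pull model Werther describes, in which each piece of the database requests chunks rather than waiting on a single feeder, can be illustrated with a short sketch. This is not Greenplum's loader: `parallel_load`, `segment_worker`, and the in-memory queue standing in for the data source are all hypothetical names chosen for illustration.

```python
import queue
import threading


def parallel_load(chunks, num_segments):
    """Hypothetical sketch: segment workers pull chunks until the source is drained."""
    pending = queue.Queue()
    for chunk in chunks:
        pending.put(chunk)

    # One private list per segment; each worker writes only to its own slot.
    loaded = [[] for _ in range(num_segments)]

    def segment_worker(segment_id):
        while True:
            try:
                # The worker asks for work; nothing pushes data at it.
                chunk = pending.get_nowait()
            except queue.Empty:
                return
            loaded[segment_id].append(chunk)

    workers = [
        threading.Thread(target=segment_worker, args=(i,))
        for i in range(num_segments)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return loaded


if __name__ == "__main__":
    result = parallel_load(list(range(100)), num_segments=4)
    print([len(segment) for segment in result])
```

Which segment ends up with which chunk varies from run to run. What the sketch guarantees is that every chunk is loaded exactly once and that no single coordinator has to hand out the work.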
Werther said competing products, namely Oracle’s (NASDAQ: ORCL) Exadata Database Machine, are architecturally different from the Greenplum appliance. “Exadata was designed for OLTP, where there are many queries against the same data and a lot of contention on that data. Every Exadata node sees all of the storage and scalability doesn’t really go beyond 16 nodes,” he said.
By contrast, EMC Greenplum’s massively parallel processing (MPP), shared-nothing architecture treats each node as a separate processing element of the cluster. Each node has its own dedicated disk resources, eliminating disk contention. Werther said the database communicates via global querying, allowing for linear scalability.
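As a rough illustration of the shared-nothing idea, the sketch below assigns each row to exactly one node by hashing a key, lets each node aggregate only the rows it owns, and then combines the partial results. The article does not describe how Greenplum actually distributes data, so the hashing scheme and the names `node_for_key`, `distribute`, and `parallel_sum` are assumptions made for this example.

```python
import zlib


def node_for_key(key, num_nodes):
    """Map a key to one node; CRC32 is used here only because it is deterministic."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def distribute(rows, num_nodes):
    """Give each node its own private list of (key, value) rows."""
    nodes = [[] for _ in range(num_nodes)]
    for key, value in rows:
        nodes[node_for_key(key, num_nodes)].append((key, value))
    return nodes


def parallel_sum(nodes):
    """Each node sums its local rows; a final step adds the partial sums."""
    partials = [sum(value for _, value in node_rows) for node_rows in nodes]
    return sum(partials)


if __name__ == "__main__":
    rows = [(f"customer-{i}", i) for i in range(1000)]
    nodes = distribute(rows, num_nodes=4)
    print(parallel_sum(nodes))
```

Because no row lives on more than one node, the per-node sums never touch the same storage, which is the property the shared-nothing claim rests on.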
The Data Computing Appliance features a native backup utility that is integrated with EMC’s Data Domain backup appliances for data protection and deduplication. The system also supports EMC RecoverPoint for disaster recovery via replication between two EMC Clariion CX4 960 platforms over the WAN or SAN.
Greenplum users can start with a half-rack or single-rack configuration and can scale up to 24 racks for a maximum capacity of 5PB (compressed). The starting price for the EMC Greenplum Data Computing Appliance is $1 million.
Greenplum Database 4.0 is also shipping as a standalone, licensed software product for deployment on industry standard x86 hardware.
EMC has thrown resources behind its new Data Computing Products Division (created after the Greenplum acquisition), increasing personnel by about 30 percent as it ramps up its efforts to take on Oracle. The division has been given a mandate: get more insight into data than is offered by the traditional storage view and push more computation and analysis into that storage.
Greenplum has officially been part of EMC for about 75 days. In that time, EMC has released a new version of the Greenplum database software (4.0), updated industry-standard hardware reference architectures for the Greenplum Database, and managed to design and build the new Data Computing Appliance.
“Since the acquisition, we have been focused on the idea of private cloud computing and being able to bring compute and storage together in a powerful way for large volumes of data with analytics needs,” said Werther.
The Data Computing Products Division is now shipping three products with a fourth on the horizon. The roster of available products includes the Greenplum Data Computing Appliance, Greenplum Database software and Greenplum Database Single-Node Edition. Greenplum Chorus – dubbed by EMC as an Enterprise Data Cloud platform – is not yet generally available.