What do storage solutions from 3PAR, BlueArc, EqualLogic, Exanet, Isilon, LeftHand, NetApp, ONStor, Panasas, Polyserve and SGI have in common? If your answer is that they represent different implementations of clustered storage, you are correct. Similar to clustered servers, clustered storage has many different meanings and implementations that can address various application and business requirements.
Clustering is a proven technique to support scale-out of performance, capacity, reliability and availability of servers and storage resources beyond the limits of a single device. Traditional storage systems are bound by their physical components (number of disk drives, attached servers, cache size and controller performance) along with functionality and logical constraints (number of file systems supported, number of snapshots or replications, among others). These predefined real or artificial boundaries force users to upgrade to larger storage systems and more management tools once storage system limits are met.
Common clustering scaling characteristics include:
- Performance (bandwidth, IOPs or some combination) tuned for large sequential read or write operations or time-sensitive random reads and write, transaction-oriented;
- Availability — eliminating single points of failure, transparent failover or self-healing capabilities;
- Storage capacity and server access connectivity (Fibre Channel, Ethernet, InfiniBand ports);
- Accessibility in terms of block (iSCSI, Fibre Channel, InfiniBand) or networked attached storage (NAS) file (NFS, CIFS or proprietary) and data sharing;
- Shared nothing, shared something or shared everything architectures based upon open or proprietary hardware and software using tightly or loosely coupled interconnects.
There are many different types of clustered storage solutions (see Figure 1 below), including clustered and parallel file systems, clustered file servers, clustered NAS, clustered iSCSI and Fibre Channel storage, among others. Most cluster implementations address capacity and availability. Some clustered storage solutions also support scaling of performance in terms of throughput as well as I/O operations, along with simplifying ease of use and management.
(SAN FC/iSCSI) Storage
Clustered File (NAS) Storage
Clustered and Parallel File Systems
|Access||iSCSI or FCP||NFS, CIFS||NFS, CIFS, HTTP, Other|
|Characteristic||Scalable performance, capacity or availability||Virtual NAS servers, scalable performance||Host software or appliance based|
|Benefits and Caveats||Microsoft Exchange, SQL and other time sensitive block based applications||General purpose file sharing and related applications needs||May require special software or propriety hardware. Good for large bandwidth applications|
|Examples||EqualLogic, LeftHand Networks, 3PAR||NetApp, ONStor, BlueArc||Ibrix, PolyServe, Isilon, Verari, Panasas, Lustre, SGI, Crosswalk|
Figure-1: Many faces and examples of clustered storage
The capabilities of clustered storage vary in terms of their ability to scale capacity, performance, accessibility (block or file), availability and ease of use. Clustered storage is not exclusive to the large sequential bandwidth or parallel file system access associated with high performance computing (HPC) environments. General-purpose clustered storage supports traditional business applications such as e-mail, database, and online transaction processing (OLTP), among others.
Diverse applications and environments of all sizes (see Figure 2 below) can benefit from the scalability (performance, capacity, availability, modularity) and abstraction (virtualization) capabilities provided by flexible clustered storage systems. For example, a small or medium-sized business (SMB) or small to mid-sized enterprise (SME) environment can initially deploy a small multi-node storage system to meet specific application requirements while transparently increasing performance, capacity and functionality to grow as needed.
|Figure-2: Addressing different application performance and service requirements|
Eliminating single points of failure is important to enhancing data availability, accessibility and reliability. Potential cluster solutions should be evaluated for possible single points of failure (SPOF) as well as feature N+1 redundancy, hot swappable component design with self-healing capabilities to identify, isolate and contain faults before they can turn into problems.
There is another class of storage solutions that falls into a grey area as to whether they are a cluster or not, and those are storage systems that have an N+1 architecture for redundancy. In the N+1 architecture model, there are two or more (N) primary I/O nodes or controllers, also known as NAS heads, and a standby or failover node, with examples being EMC Celerra NSX and Pillar Axiom. What makes the N+1 model even more confusing is vendor architecture, including positioning dual controller RAIDarrays or dual NAS head solutions as clusters for availability.
Is a cluster a grid? That depends in part on what your definition of a grid is, whether you consider a grid to be a service, architecture, hardware- or software-based, spanning distance or providing some other capabilities. Consequently, there are many different vendor and industry definitions and opinions as to what constitutes a grid or cluster for server and storage environments. Having worked many years ago for an electric power generating and transmission utility, what I find is often missing in grid discussions — and an essential component — is supervisory control and data acquisition (SCADA) for command and control. So another variable to throw into the grid-cluster debate is what SCADA equivalent functionality exists for transparent command and control of a storage system.
Differentiators between various cluster storage solutions include:
- How nodes are interconnected (loosely or tightly coupled, open or propriety);
- I/O affinity and performance load-balancing across nodes;
- Propriety hardware, open, off-the shelf, or support for third-party servers and storage;
- File sharing, including clustered file system software, host-based agents or drivers;
- Local and remote mirroring or replication, point-in-time (PIT) copy or snapshot;
- Modular storage growth with virtualization for automated load balancing;
- Performance tuned for sequential reads and writes or random updates;
- Distributed lock management and cluster coherency features.
Understanding the different meanings, types and implementations of clustered storage allows you to apply the most applicable solution to the task at hand. Clustered storage is a good fit for growing and diverse environments of all sizes to enable just-in-time storage acquisitions without the complexity of disruptive upgrades or added management complexities.
Greg Schulz is founder and senior analyst of the StorageIO group and author of "Resilient Storage Networks" (Elsevier).
For more storage features, visit Enterprise Storage Forum Special Reports