Hefty Issues for Hefty SANs
Building a large multi-switch SAN requires careful network design so that enough bandwidth is allocated between switches to sustain application performance. In addition, protecting against link failures requires a meshed design that provides alternate paths through the fabric.
Allocating inter-switch links (ISLs) for both performance and alternate pathing consumes expensive fabric ports and reduces the number of ports left for servers and storage targets. Consequently, the more fully meshed the fabric, the smaller the productive port population of the SAN.
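The port cost of meshing can be estimated with simple arithmetic. The sketch below assumes a full mesh in which every switch runs the same number of ISLs to every other switch and all remaining ports attach servers or storage. The function name mesh_port_budget and its parameters are hypothetical, chosen only for illustration; real designs (partial meshes, core-edge layouts) will differ.

```python
def mesh_port_budget(num_switches, ports_per_switch, isls_per_link=1):
    """Return (device_ports, isl_ports) for a full-mesh fabric.

    Simplified model (an assumption): every switch has `isls_per_link`
    ISLs to every other switch, and all remaining ports attach devices.
    """
    if num_switches < 1 or ports_per_switch < 1 or isls_per_link < 1:
        raise ValueError("counts must be positive")
    # Each switch dedicates one ISL bundle to each of its (N - 1) peers.
    isl_ports_per_switch = (num_switches - 1) * isls_per_link
    if isl_ports_per_switch >= ports_per_switch:
        raise ValueError("switch too small for a full mesh of this size")
    device_ports = num_switches * (ports_per_switch - isl_ports_per_switch)
    isl_ports = num_switches * isl_ports_per_switch
    return device_ports, isl_ports


for n in (2, 4, 6, 8):
    dev, isl = mesh_port_budget(n, 16)
    print(f"{n} x 16-port switches: {dev} device ports, {isl / (dev + isl):.0%} on ISLs")
```

Under these assumptions, eight 16-port switches in a full mesh each spend 7 ports on ISLs, leaving 72 of 128 ports for devices, so more than 40 percent of the fabric's ports carry no server or storage traffic.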
This becomes painfully obvious when customers attempt to build large fabrics from 16- or 32-port fabric switches. In such configurations, a third or more of the total port count may be devoted to ISLs. In general, higher-port-count directors are far more efficient building blocks for large SANs, since more ports per chassis are available for device attachment. In addition, new 10 Gbps ISL options simplify switch-to-switch connectivity and avoid multi-ISL trunking issues such as potential out-of-order frame delivery.
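To see why spreading traffic over several ISLs can reorder frames, consider a toy model. Frames leave a switch in sequence and are sprayed round-robin across parallel ISLs whose latencies differ slightly, for example because of different cable lengths. The function arrival_order and the latency values are hypothetical, and the model does not represent any vendor's actual trunking algorithm.

```python
def arrival_order(num_frames, link_latencies_us, frame_gap_us=1.0):
    """Return frame sequence numbers in the order they reach the far switch.

    Toy model (an assumption): frames leave the sending switch one every
    `frame_gap_us`, are sprayed round-robin across the parallel ISLs, and
    each ISL adds a fixed latency taken from `link_latencies_us`.
    """
    arrivals = []
    for seq in range(num_frames):
        link = seq % len(link_latencies_us)      # round-robin link choice
        sent_at = seq * frame_gap_us
        arrivals.append((sent_at + link_latencies_us[link], seq))
    arrivals.sort()                              # earliest arrival first
    return [seq for _, seq in arrivals]


# Two ISLs whose cable runs differ in length (latencies are made up).
print(arrival_order(6, [5.0, 8.0]))  # [0, 2, 1, 4, 3, 5]
# A single ISL (e.g., one 10 Gbps link instead of a trunk) keeps one path.
print(arrival_order(6, [5.0]))       # [0, 1, 2, 3, 4, 5]
```

In this model, sending every frame over one path preserves ordering by construction. A single higher-speed ISL gets that property without having to balance traffic across links.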
Building large SANs has several unintended consequences that may affect fabric stability. Because of inherent architectural characteristics of Fibre Channel, as well as vendor-specific product implementations, connecting eight or more switches in a single fabric may result in erratic behavior. Fibre Channel is a link layer architecture, much like bridged LANs. A layer 2 network delivers optimum performance with the lowest protocol overhead, which aligns well with the performance requirements of block data transported over a channel.
Connecting multiple fabric switches therefore extends a flat network space that, like a bridged LAN, may be vulnerable to network-wide disturbances. In a bridged LAN, broadcast storms can degrade every attached node. In Fibre Channel SANs, the equivalent disruptions can come from state change notification broadcasts and from occasional fabric reconfigurations triggered by unexpected changes in the fabric (e.g., plugging a live switch into a large operational fabric). As discussed below, SAN routing addresses such large-fabric issues through network segmentation.
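The benefit of segmentation can be illustrated by counting how many devices must process a single state change. The sketch below is a deliberately simplified model: it ignores zoning and notification registration, assumes that a flat fabric notifies every other device, and assumes that a routed design confines the notification to the changed device's own segment. The function name notified_devices and the 1,000-device layout are hypothetical.

```python
def notified_devices(device_fabric, changed_device, routed):
    """Count devices that receive a state change notification.

    Simplified model (an assumption): zoning and registration are ignored.
    In a flat fabric every other device is notified; with SAN routing the
    notification stays within the changed device's own fabric segment.
    """
    if not routed:
        return len(device_fabric) - 1
    segment = device_fabric[changed_device]
    return sum(1 for dev, seg in device_fabric.items()
               if seg == segment and dev != changed_device)


# Hypothetical SAN: 1,000 devices split evenly across 4 routed segments.
devices = {f"dev{i}": i % 4 for i in range(1000)}
print(notified_devices(devices, "dev0", routed=False))  # 999
print(notified_devices(devices, "dev0", routed=True))   # 249
```

Even in this crude model, splitting one flat fabric into four segments shrinks the blast radius of a single event by roughly a factor of four.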
In addition, as more switches join a single fabric, more switch-to-switch communication is required to allocate unique address blocks (domain IDs), reconcile zoning information, add entries to simple name server (SNS) tables, and exchange routing information. In some cases, limited SNS capacity may restrict the number of devices a single fabric can support.
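The address-block allocation mentioned above works roughly as follows. A Fibre Channel port address is 24 bits, made up of a domain byte, an area byte and a port byte. Each switch must hold a unique domain ID, and during fabric build a single principal switch hands these IDs out. The class and method names below (PrincipalSwitch, request_domain, fc_address) are hypothetical. The logic is a simplified sketch of that allocation and not a protocol implementation.

```python
class PrincipalSwitch:
    """Hypothetical, simplified model of domain ID assignment."""

    MIN_DOMAIN, MAX_DOMAIN = 1, 239  # usable Fibre Channel domain IDs

    def __init__(self):
        self.assigned = {}  # switch name -> domain ID

    def request_domain(self, switch_name, preferred=None):
        if switch_name in self.assigned:
            return self.assigned[switch_name]
        used = set(self.assigned.values())
        if (preferred is not None
                and self.MIN_DOMAIN <= preferred <= self.MAX_DOMAIN
                and preferred not in used):
            domain = preferred
        else:
            # Fall back to the lowest free ID when the preferred one is taken.
            free = (d for d in range(self.MIN_DOMAIN, self.MAX_DOMAIN + 1)
                    if d not in used)
            domain = next(free, None)
            if domain is None:
                raise RuntimeError("fabric has run out of domain IDs")
        self.assigned[switch_name] = domain
        return domain


def fc_address(domain, area, port):
    """Pack domain, area and port bytes into a 24-bit FC address."""
    for name, value in (("domain", domain), ("area", area), ("port", port)):
        if not 0 <= value <= 0xFF:
            raise ValueError(f"{name} must fit in one byte, got {value}")
    return (domain << 16) | (area << 8) | port


principal = PrincipalSwitch()
d1 = principal.request_domain("switch-a", preferred=1)
d2 = principal.request_domain("switch-b", preferred=1)  # conflict: gets 2
print(f"{fc_address(d2, 0x04, 0x00):06X}")             # 020400
```

Every switch that joins triggers this kind of exchange, along with zoning and name server updates. This is one reason fabric-wide coordination traffic grows with switch count.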
As a fabric grows beyond 1,000 devices, the convergence time required to stabilize the network after a disruption can become quite lengthy. The switch-to-switch chatter required to build the fabric and register servers and storage devices grows in volume as more switches are added. If the SNS entry limit is inadvertently exceeded, the fabric may eventually stabilize, but not all devices will be recognized.
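The last failure mode, where the fabric settles but some devices are missing, follows directly from a fixed-size name server table. The toy model below assumes a hard entry limit and simply refuses registrations once the table is full. The class name NameServer, the 1,000-entry capacity and the WWPN strings are all hypothetical. Actual SNS limits and overflow behavior vary by vendor and model.

```python
class NameServer:
    """Toy simple name server (SNS) table with a hypothetical entry limit."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}  # WWPN -> FC address

    def register(self, wwpn, fc_addr):
        """Return True if the device is now visible in the fabric."""
        if wwpn in self.entries:          # re-registration updates in place
            self.entries[wwpn] = fc_addr
            return True
        if len(self.entries) >= self.capacity:
            return False                  # table full: device is not recorded
        self.entries[wwpn] = fc_addr
        return True


# Hypothetical limit of 1,000 entries, with 1,200 devices logging in.
sns = NameServer(capacity=1000)
results = [sns.register(f"wwpn-{i:04d}", i) for i in range(1200)]
print(sum(results), "registered,", results.count(False), "not recognized")
```

In this model the table reaches a stable state, but the 200 devices that logged in after it filled remain invisible to the rest of the fabric.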