An industry ruckus was recently stirred up by Spirent, a maker of network performance analysis and test products, which announced that it had run a new test on Fibre Channel fabric switches that none could pass. Spirent’s test simulated the industry-standard maximum of 239 switches in a single Fibre Channel fabric.
None of the switch vendors, it turned out, could actually support the standard, even though some had specified such support in their product data sheets. Incensed that Spirent could perform such an unfair test, the Fibre Channel Industry Association returned fire, stating that Spirent’s test had nothing to do with real world conditions and that Spirent was simply trying to sell more test software.
Hypocrites! declared Spirent, pointing out that the Fibre Channel emperors were in fact stark naked when it came to inflated claims.
Opportunists! responded the FCIA, for attempting to turn a non-issue into a means to sell test suites.
As it turns out, though, this heated exchange is really much ado about nothing, especially in light of new SAN routing technology on the market.
The magic number of 239 switches in a single fabric is the product of the address space for Fibre Channel fabric domains — 8 bits, or 256 possible addresses, minus 17 reserved addresses. That leaves 239 possible unique addresses that could be assigned to different fabric switches in a single, flat network.
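The arithmetic is simple enough to check in a few lines of Python. This is only an illustrative sketch using the figures quoted above; the constant names are made up for the example.

```python
# Figures taken from the text above; the names are illustrative only.
DOMAIN_ID_BITS = 8         # width of the Fibre Channel fabric domain address
RESERVED_DOMAIN_IDS = 17   # addresses reserved and unavailable to switches

total_domain_ids = 2 ** DOMAIN_ID_BITS
usable_domain_ids = total_domain_ids - RESERVED_DOMAIN_IDS

print(f"{total_domain_ids} total - {RESERVED_DOMAIN_IDS} reserved = {usable_domain_ids} usable")
```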
This theoretical maximum has little to do with day-to-day storage area networking, though, and in practice most vendors cannot reasonably support more than 20 or so switches in a single fabric. So while Spirent’s test suite has little practical value, as defensively stated by the FCIA, neither do vendor data sheet claims of supporting 239 switches. Egg on both your faces, an impartial judge would declare.
Of Honesty and Hypocrisy
In the early days of Fibre Channel technology (seven years or so ago), I received a call from a storage vendor about arbitrated loop technology. The standard for arbitrated loop specifies up to 126 end devices per loop, with a segment of up to 10 km per device.
The inquirer asked if we had actually tested this theoretical maximum, to which I replied, of course not, and to which the inquirer responded with shock and indignation. How could we possibly sell a product that had not been tested to the maximum standard? It was the sheerest hypocrisy on our part.
Obviously, in the real world, no one would run an arbitrated loop with a total circumference of 2,520 km (126 x 20 km, counting 10 km in each direction per device). If it worked at all, it would have had pathetic response time, and a test to demonstrate this maximum would have been pointless. To the inquirer’s credit, however, support for 126 devices at the maximum of 10 km per device, even if valid by standard, should not have appeared on vendor data sheets.
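As a rough check of the loop length, and of why response time would suffer, the sketch below computes the total circumference and the time a signal needs to travel around it once. The propagation delay of about 5 microseconds per kilometer of fiber is an assumed rule of thumb, not a figure from the standard, and the calculation ignores any latency added by the devices themselves.

```python
MAX_LOOP_DEVICES = 126        # end devices allowed per arbitrated loop
MAX_SEGMENT_KM = 10           # maximum segment length per device
FIBER_DELAY_US_PER_KM = 5.0   # ASSUMPTION: approximate delay in optical fiber

# Each device contributes one segment out and one segment back.
loop_length_km = MAX_LOOP_DEVICES * 2 * MAX_SEGMENT_KM

# Time for a signal to travel once around the loop (propagation only).
loop_delay_ms = loop_length_km * FIBER_DELAY_US_PER_KM / 1000

print(f"Loop length: {loop_length_km} km")
print(f"One trip around the loop: at least {loop_delay_ms:.1f} ms")
```

Under the assumed delay, every circuit of the loop costs more than 12 ms before any device latency is counted, which is already slow for a storage channel.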
The problem, though, is that the first vendor to be truly honest becomes easy prey for the conscience-challenged competition. For Fibre Channel fabric switches, for example, the data sheet should really just specify the number of switches actually tested and supported in a single fabric (e.g., 24).
But since there's no law preventing other vendors from listing the theoretical maximum, and because a ‘24’ simply doesn't look all that impressive compared to a ‘239,’ the gross exaggerations of one vendor force all the others into opportunist submission: we all support 239 switches in a fabric. Large asterisk. Read the best practices manual.
The Limits of Switch Scalability
Although no real world customer (unlike certain performance analysis vendors) has proven crazy enough to attempt to connect 239 switches together, large enterprise customers often need to scale beyond 20, 30, or 40 switches. This has proven problematic and often impossible for several reasons.
First, Fibre Channel is a flat, layer two network, much like a bridged LAN. It is deliberately flat, since a layer two (link layer) network offers optimum performance, and optimum performance is what channels are all about. But because a fabric is flat, everything sees everything, and more importantly, any device can affect any other device (or all other devices) in the network.
As the number of switches in a single fabric increases, all the switches must engage in fabric-building processes, principal switch selection, exchange of Simple Name Server (SNS) information, routing tables, and state change registrations. This quickly engenders excessive fabric convergence times, often lasting tens of minutes or more. In most vendor implementations, the memory assigned to SNS tables is simply insufficient to support more than a dozen heavily populated switches in a single environment.
Second, supposing 10-20 switches can be brought up and stabilized into a single working fabric, the fabric is always vulnerable to fabric reconfigurations or state change notification broadcast storms. Either can be highly disruptive to ongoing storage transactions and can incur SAN outages if the fabric has to undergo a complete rebuild.
SAN Routing Technology: Much Ado About Something
While some customers struggle to build very large fabrics with existing products, a more fundamental question is, why? What is the desired result that makes some customers feel that 239 switches in a single fabric is something they might want? Typically, large fabrics are not created so that any storage device or host can talk to any other storage device.
Networking provides any-to-any connectivity, but applications usually need only one-to-one or one-to-a-few connections (e.g., a server cluster to a single storage target). Customers who try to build large SAN fabrics are typically attempting to share a large and expensive storage asset, such as a tape library, among a large number of storage devices: a many-to-one solution.
With native Fibre Channel, this many-to-one result can only be achieved by connecting all fabric switches together, and paying the penalty in terms of convergence time, SNS limitation, and exposure to fabric-wide disruptions.
New SAN routing technology, by contrast, can achieve many-to-one connections by allowing multiple SAN switches to exist as separate fabrics, with only authorized connections between them for assigned storage assets. Pioneered by the company formerly known as Nishan (recently acquired by McDATA), this solution filters fabric-building protocols, eliminates SNS sizing issues, blocks fabric reconfigurations, and isolates faults. Additionally, it overcomes even the theoretical limit of 239 connected (in this case, routed) switches in a storage network.
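The idea of separate fabrics joined only by authorized connections can be pictured with a toy model. The sketch below is purely illustrative: the `Fabric` and `SanRouter` classes, their methods, and the device names are hypothetical and do not describe any vendor's product or protocol. It shows only the visibility rule, in which a host sees its own fabric plus whatever remote assets have been explicitly assigned to it.

```python
from dataclasses import dataclass, field


@dataclass
class Fabric:
    """A separate fabric with its own set of attached devices."""
    name: str
    devices: set = field(default_factory=set)


@dataclass
class SanRouter:
    """Hypothetical router joining fabrics through an explicit allow list."""
    fabrics: dict = field(default_factory=dict)
    authorized: set = field(default_factory=set)  # (host, remote asset) pairs

    def add_fabric(self, fabric):
        self.fabrics[fabric.name] = fabric

    def authorize(self, host, asset):
        self.authorized.add((host, asset))

    def fabric_of(self, device):
        for fabric in self.fabrics.values():
            if device in fabric.devices:
                return fabric
        raise KeyError(f"unknown device: {device}")

    def visible_to(self, host):
        # Local devices are always visible; remote ones only if authorized.
        local = self.fabric_of(host).devices - {host}
        remote = {asset for (h, asset) in self.authorized if h == host}
        return local | remote


router = SanRouter()
router.add_fabric(Fabric("A", {"server1", "server2", "disk_a"}))
router.add_fabric(Fabric("B", {"server3", "disk_b"}))
router.add_fabric(Fabric("C", {"tape_library"}))

# Many-to-one: every server is assigned the shared tape library.
for server in ("server1", "server2", "server3"):
    router.authorize(server, "tape_library")

print(sorted(router.visible_to("server1")))
print(sorted(router.visible_to("server3")))
```

In this model the three servers share the tape library, yet `server3` never sees `disk_a`, and nothing that happens inside fabric A is propagated into fabrics B or C.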
With this new capability, exaggerated claims on data sheets are no longer required, nor are expensive test programs to expose them. Now, that’s much ado about something.