A great deal has been written about Fibre Channel directors in a number of reputable publications, but how exactly is an FC director-class switch different from a standard switch? The answer is not as clear as you might think. The place I usually go for Fibre Channel definitions is:
The Storage Networking Industry Association (SNIA) is known as the place to go for definitions of common and esoteric terms for storage. These definitions are agreed upon by the SNIA membership, so in effect they are de facto terms the industry has defined and uses. Unfortunately, this was strike one, as SNIA at this time has no definition in place for a Fibre Channel director. So next I performed a Google search using:
The result was a number of hits — some useful, some useless; here is one site that had an actual definition:
I had actually never heard of this site, but here is how they define a director-class switch:
A fault-tolerant Fibre Channel switch that typically has a high port count and may serve as a central switch to other fabrics.
Of course, since this definition is about as clear as the Mississippi River in New Orleans after a spring flood, I thought it might be useful to review some of the issues and considerations with director-class switches. First, though, it's important to note that:
- The SNIA members do not want to commit to a definition for director-class switches at this time
- The TechWeb definition and a few others that I have found (many of them vendor-based) leave a huge amount of wiggle room for the vendors and are often contradictory
So What Exactly Is a Director?
Since the definitions I've found have been less than specific, I'm going to identify some of the characteristics I believe are important, given the cost difference per port, when you move from a garden-variety Fibre Channel switch to a Fibre Channel director.
- Hot-upgradeable firmware
  - For the switch (this might require a switch reboot at some future date)
  - For the control module
- Rezoning of the switch that is hot-changeable with no impact (note that this can be a RAID or HBA issue, as they must log in to the switch again)
- Dual everything
  - Power supplies
  - Control modules (hardware/software module that allows management and monitoring)
- Hot-pluggable everything (the backplane should be the only exception)
  - Power supplies
  - Control modules
- At least 64 ports and hopefully over 100 ports
- LAN FC to WAN IP blades
In addition, these director-class switches should be able to achieve the almost mythical 5 9s (99.999%) of uptime, or about 5 minutes of downtime per year. This brings to mind a few questions:
- Does every vendor define the term director-class switch the same way?
- Do the definitions used by the majority of vendors use the same criteria defined above?
Both answers are no, and in thinking about the issues, an additional set of questions arises. What are the performance requirements for director-class switches, and are the requirements I defined for director-class switches necessary in all cases?
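As a rough check on the 5 9s figure mentioned above, the downtime budget can be worked out directly. The sketch below is only an illustration: it assumes an average year of 365.25 days, and the function name is hypothetical.

```python
# Assumption: an average year of 365.25 days (leap days included).
MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_minutes_per_year(nines: int) -> float:
    """Downtime budget per year for an availability of `nines` nines.

    Five nines means 99.999% availability, i.e. an unavailability of 10^-5.
    """
    unavailability = 10.0 ** (-nines)
    return MINUTES_PER_YEAR * unavailability


for n in (3, 4, 5):
    print(f"{n} nines: {downtime_minutes_per_year(n):.1f} minutes of downtime per year")
```

For five nines this works out to a little over 5 minutes per year, which leaves essentially no room for a switch reboot during a firmware upgrade.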
All I can say about vendors is that there's often a lack of full disclosure of performance information when it comes to their switches. A Google search on a specific vendor, model, and performance will likely provide some interesting results for some vendors.
In the environment I work in, we require 2 Gb full duplex (200 MB/sec read and 200 MB/sec write) from any port to any port. As all director-class switches are composed of blades, we might connect from the host to a blade within the switch and then connect that blade to one RAID for write and to a different blade and another RAID for read. Each blade connection and the blade-to-backplane communication should support full duplex and consistent I/O performance, and this cross-blade connectivity must work alongside the other case, in which the blade itself is running at full duplex rates.
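One way to reason about the any-port-to-any-port requirement is to compare what a blade's ports can demand with what its backplane connection can carry. The following is a minimal sketch under stated assumptions: the 200 MB/sec per-direction port rate comes from the requirement above, while the blade names, port counts, and backplane figures are made up for illustration and do not describe any real product.

```python
PORT_RATE_MB_S = 200  # per direction, from the 2 Gb full duplex requirement


def oversubscription(ports_per_blade, blade_to_backplane_mb_s,
                     port_rate_mb_s=PORT_RATE_MB_S):
    """Ratio of per-direction port demand to blade-to-backplane capacity.

    A ratio above 1.0 means the blade cannot run every port at full rate
    when all of the traffic has to cross the backplane.
    """
    return ports_per_blade * port_rate_mb_s / blade_to_backplane_mb_s


# Hypothetical blades: (port count, backplane MB/sec per direction).
blades = {"blade_a": (16, 3200), "blade_b": (16, 1600)}

for name, (ports, backplane) in blades.items():
    ratio = oversubscription(ports, backplane)
    verdict = "full rate from any port" if ratio <= 1.0 else "oversubscribed"
    print(f"{name}: ratio {ratio:.1f} ({verdict})")
```

A vendor's answer to "what is the blade-to-backplane bandwidth per direction?" is the number to plug in here.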
Our working environment means we cannot afford a requirement to connect specific RAIDs and hosts to specific blades. Such a requirement would present a few problems for my work, as we require specific data rates for streams of I/O. These problems include:
- Over time you have failures of ports and need to be able to keep the same performance regardless of the port configuration
- Requiring specific port configuration adds complexity to operational systems, especially at 3 AM when a port inevitably decides to fail
So what is the performance requirement for your site?
Defining the amount of performance required means you need to understand your application load, the application I/O request sizes, and the hardware involved in your environment. For example, even if you have a 2 Gb mid-range RAID and your applications are writing large block streaming I/O, if you are using the RAID write cache mirror feature, your I/O performance is limited in most cases to about 140 MB/sec, depending on the vendor. To really understand your performance requirements you need to look at:
- The application I/O requirements and how the application performs I/O. A database, for example, is not likely to move data anywhere near the 2 Gb full duplex rates, given that the I/O request sizes are generally small and random for indexes, and table space access is still not large block streaming I/O
- The HBAs need to be high performance 2 Gb and tuned for high performance I/O (see http://www.enterprisestorageforum.com/technology/features/article.php/1569961)
- The RAIDs need to support and be tuned for high performance streaming I/O
So basically, from the performance point of view, unless you have an HPC application and environment, performance is not likely to be an issue for most director-class switches, even though some of them do not run at full rate.
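The point about the RAID write cache mirror can be put in simple terms: a streaming I/O path runs no faster than its slowest component. The sketch below assumes a three-component path. The 200 and 140 MB/sec figures are the approximate numbers used in this article, and the component names are hypothetical labels.

```python
def bottleneck(component_rates_mb_s):
    """Return (name, rate) of the slowest component in an I/O path."""
    name = min(component_rates_mb_s, key=component_rates_mb_s.get)
    return name, component_rates_mb_s[name]


# Per-direction streaming rates in MB/sec (approximate, from the text).
write_path = {
    "hba": 200,                           # 2 Gb HBA
    "switch_port": 200,                   # 2 Gb switch port
    "raid_write_with_cache_mirror": 140,  # mid-range RAID, mirrored write cache
}

name, rate = bottleneck(write_path)
print(f"Sustained write rate is capped at about {rate} MB/sec by {name}")
```

In this example the switch is not the limiting factor, so a director that falls somewhat short of full rate on every port would go unnoticed.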
Is a Director-class Switch Needed?
Most sites using director-class switches are doing so because they have both very high port count and very high reliability requirements. The high port count of director switches allows for simplified management, and the reliability compared with non-director-class switches speaks for itself. Of course, the cost per port is greater with director-class switches, but downtime and the personnel time needed to manage many smaller switches are expensive as well.
The other major reason director-class switches are used is in large configurations of servers, HBAs, and RAID devices that require high performance. Given that enterprise RAIDs from EMC, HDS, and IBM support as many as 64 connections, it's not necessary to use director-class switches unless the configurations are large or involve mid-range RAIDs that have only a few ports and have failover requirements. Consider the following example:
This is a common HA (High Availability) configuration used with mid-range RAID controllers that support active/passive failover. Of course, this is a very small configuration and could be dramatically expanded. Keep in mind that an HBA's reliability based on Bellcore numbers (a standard reliability calculation) is usually between 250K and 400K hours between failures. So with 32 HBAs and using the 250K-hour figure, you can expect about 1 failure per year (32 HBAs x 8,760 hours per year / 250,000 hours is roughly 1.1). This means that unless you have hot-swap PCI/PCI-X slots, you will need to take the server down to replace a failed HBA.
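The HBA failure estimate is easy to reproduce for other configurations. This is a minimal sketch that assumes a constant failure rate, every HBA powered on all year, and a 365-day year; the function name is hypothetical.

```python
HOURS_PER_YEAR = 24 * 365  # assumption: continuous operation, non-leap year


def expected_failures_per_year(units, hours_between_failures):
    """Expected failures per year across `units` identical devices.

    Assumes a constant failure rate, so the fleet accumulates
    units * HOURS_PER_YEAR device-hours every year.
    """
    return units * HOURS_PER_YEAR / hours_between_failures


for hours in (250_000, 400_000):
    failures = expected_failures_per_year(32, hours)
    print(f"32 HBAs at {hours:,} hours: {failures:.2f} expected failures per year")
```

The same arithmetic scales linearly, so doubling the number of HBAs doubles the number of server outages to plan for if the slots are not hot-swappable.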
Taking all of this into account, I believe it is a good idea to consider a director-class switch if:
- You need very high reliability
- Your downtime requirements for upgrades are very narrow and the director-class switches allow hot upgrades
- You have a large port count requirement
- You have limited personnel who can manage and upgrade many small switches
If you need very high availability access to the storage, the above configuration using two director-class switches connected to the storage system (whether it is mid-range RAID or enterprise RAID) should exceed the 5 9s of uptime and data availability.
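The effect of a second director can be estimated with the usual formula for redundant components. The sketch below covers the switch layer only and assumes the two directors fail independently. The 0.9999 single-director availability is a made-up input for illustration, not a vendor figure.

```python
def redundant_availability(single_availability, copies=2):
    """Availability of `copies` redundant units, assuming independent failures.

    The layer is down only when every unit is down at the same time.
    """
    return 1 - (1 - single_availability) ** copies


single = 0.9999  # hypothetical: one director at four nines
pair = redundant_availability(single)
print(f"One director:  {single:.6f}")
print(f"Two directors: {pair:.8f}")
```

Failures that hit both paths at once, such as a bad firmware load applied to both directors or a site power loss, break the independence assumption, so this should be treated as an upper bound.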
Enterprise-class Fibre Channel directors should be compared with the core IP routers and switches at any site. Access to the Internet and the other machines in your network is useless unless you can access your data. From what I have seen, many managers do not look at things this way. They focus on the cost difference between the two switch types without thinking about the data access issues and without closely looking at the feature list comparison.
Part of the problem is the lack of a definition for these switches, which helps neither the consumer nor, in my opinion, the vendors, as they have to sell against each other without even a baseline set of features and functions. Perhaps it's because the concept of a Fibre Channel director is far newer than the concept of an Internet core switch or router. In any event, this leads to problems for everyone involved.
When it comes to performance for director-class switches, you must clearly define your application performance. Two-gigabit full duplex performance from any port to any port is not always available in these switches, but for all but a few applications, is this even necessary?
Two important areas to consider are tape drives and backup or HSM applications. Most of the time, from what I have seen, tapes are configured with one HBA trying to write/read the data at the full data rate. Take, for example, the STK T9940B tape drive, with a rated uncompressed speed of 30 MB/sec and a rated compressed speed of about 70 MB/sec. Even though the database system might never have 2 Gb performance issues for the database activity, performance could become an issue for the backups. Or if you have high data compression and have bound 3 tape drives to an HBA using different parts of the switch, performance could become a factor in this case as well.
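The tape example comes down to adding up drive rates against what one HBA can carry. This is a rough sketch using the rated T9940B figures quoted above and the 200 MB/sec per-direction figure for a 2 Gb HBA; the function name is hypothetical, and real drives will rarely hold their rated speed.

```python
HBA_RATE_MB_S = 200  # 2 Gb Fibre Channel, per direction (figure from the text)


def hba_headroom_mb_s(drive_rates_mb_s, hba_rate_mb_s=HBA_RATE_MB_S):
    """Spare HBA bandwidth when all drives stream at once.

    A negative result means the HBA, not the tape drives, is the limit.
    """
    return hba_rate_mb_s - sum(drive_rates_mb_s)


uncompressed = [30, 30, 30]  # three drives at the rated uncompressed speed
compressed = [70, 70, 70]    # three drives at the rated compressed speed

print("Headroom, uncompressed:", hba_headroom_mb_s(uncompressed), "MB/sec")
print("Headroom, compressed:  ", hba_headroom_mb_s(compressed), "MB/sec")
```

With highly compressible data, three drives ask for slightly more than one 2 Gb HBA can deliver, which is exactly the case where the switch's real port-to-port rate starts to matter.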
Asking vendors hard questions about the features and functions that you want and need is your responsibility, as there really are no standards in place at this point. In other words, Caveat Emptor – Let the Buyer Beware.