Interoperability: The Trials and Tribulations of Heterogeneous Installations Page 2
The number of FC switch vendors with large port counts is relatively small. These switches are often called director-class switches. Brocade, Inrange, McDATA, and now Cisco are the only vendors I am aware of that currently support over 64 ports. Most of these switches support most of the HBAs and most of the RAID devices.
Tapes are an issue for some switches, but the key word of caution is the word “most.” You will need to look at the switch vendor’s interoperability matrix for the switch, HBA, RAID, and tape devices to find out what works with what. Then comes the really fun issue of ensuring compatibility with driver and firmware releases for the HBAs, tapes, and RAIDs. Just because it worked with driver and firmware release XYZ does not mean the switch vendor will support a different driver and firmware release that might be required by a RAID or tape vendor.
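One way to keep track of this is to record each vendor's supported combinations and look for the ones that every vendor agrees on. The following is a minimal sketch of that idea, not a vendor tool. The `Combo` fields, the version strings, and both matrices are hypothetical placeholders; real entries would have to be copied from each vendor's published interoperability matrix.

```python
from typing import NamedTuple


class Combo(NamedTuple):
    """One tested combination of releases (field names are illustrative)."""
    switch_fw: str
    hba_driver: str
    hba_fw: str
    device_fw: str


# Hypothetical entries; real ones come from the vendors' published matrices.
SWITCH_VENDOR_MATRIX = {
    Combo("sw-4.1", "hba-drv-5.2", "hba-fw-3.8", "raid-fw-7.0"),
    Combo("sw-4.1", "hba-drv-5.3", "hba-fw-3.9", "raid-fw-7.1"),
}
RAID_VENDOR_MATRIX = {
    Combo("sw-4.1", "hba-drv-5.3", "hba-fw-3.9", "raid-fw-7.1"),
    Combo("sw-4.2", "hba-drv-5.4", "hba-fw-3.9", "raid-fw-7.1"),
}


def supported_by_all(*matrices):
    """Return the combinations that appear in every vendor's matrix."""
    common = set(matrices[0])
    for matrix in matrices[1:]:
        common &= set(matrix)
    return common


def vendors_not_supporting(combo, named_matrices):
    """Return the names of vendors whose matrix does not list this combo."""
    return sorted(name for name, matrix in named_matrices.items()
                  if combo not in matrix)


if __name__ == "__main__":
    for combo in supported_by_all(SWITCH_VENDOR_MATRIX, RAID_VENDOR_MATRIX):
        print("Supported end to end:", combo)
```

Running `vendors_not_supporting` on the configuration you actually plan to install tells you which vendor to call before you deploy, rather than after something breaks.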
Tracking switch errors is even more fun than tracking HBA errors. In most cases, to get the level of detail required for debugging, you need to connect to the switch via a vendor-supplied GUI. Some vendors provide SNMP export, but to get the really detailed information it is necessary to log in to the switch with the GUI. Issues such as CRC errors and low-level Fibre Channel errors must be monitored.
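However the counters are collected, what matters for monitoring is usually how much they grew between two readings, not their absolute values. The sketch below assumes you have already captured per-port counters into plain dictionaries, whether by SNMP, a vendor export, or by hand from the GUI. The port names and counter names (`crc_errors`, `link_failures`) are made up for illustration and do not correspond to any particular vendor's output.

```python
def counter_deltas(previous, current):
    """Return per-port counter increases between two snapshots.

    Each snapshot maps a port name to a dict of counter name -> value.
    Counters that did not increase (or that went backwards, for example
    after a switch reboot reset them) are left out of the result.
    """
    deltas = {}
    for port, counters in current.items():
        before = previous.get(port, {})
        for name, value in counters.items():
            increase = value - before.get(name, 0)
            if increase > 0:
                deltas.setdefault(port, {})[name] = increase
    return deltas


def ports_over_threshold(deltas, threshold):
    """Return the sorted ports where any counter grew by at least threshold."""
    return sorted(port for port, counters in deltas.items()
                  if any(v >= threshold for v in counters.values()))


if __name__ == "__main__":
    # Hypothetical snapshots taken some interval apart.
    earlier = {"port0": {"crc_errors": 10, "link_failures": 0},
               "port1": {"crc_errors": 5}}
    later = {"port0": {"crc_errors": 14, "link_failures": 0},
             "port1": {"crc_errors": 5}}
    print(counter_deltas(earlier, later))
```

A port whose CRC count keeps climbing between snapshots is a good candidate for a cable or optics check, even if nothing has failed outright yet.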
The number of RAID vendors continues to grow, and with that come interoperability issues. Some vendors have a huge investment in large interoperability labs that test operating systems, servers, HBAs, and switches. Most of the major vendors support a large matrix of the above hardware. Of course, Linux has become an issue for many of these vendors, and the flavor of Linux supported for both the RAID and the RAID interface software (GUI) is in some cases an important consideration for both your site and the vendor testing.
RAID errors usually fall into three categories:
- Fibre Channel and/or SCSI errors between the RAID controller and switch
- RAID hard drive errors that might result in write reconstruct
- RAID backend Fibre Channel errors
The RAID errors between the switch and RAID controller are generally passed back to the server via error control within the HBA, but that doesn't mean that you always get everything you want. Whatever type of error you are receiving, more than likely you will need to view these errors via the management console or GUI provided by the RAID vendor. Server vendors that also sell storage sometimes pass the errors back to the system log, as they can integrate everything given that they control the OS.
Everything said above about RAID interoperability goes double or triple for tape. Making sure that everything works end-to-end is often very difficult. Deciding which tape firmware, HBA firmware, and HBA settings to use is a huge issue. Setting up these tests is also very time-consuming, and error injection is just plain hard. Buying HBAs from the tape vendor is a good idea, both to have a single finger to point and to make sure that you install the driver and firmware versions that they support.
As with RAID, multiple types of errors can occur, including:
- Tape drive errors at the SCSI or Fibre Channel layer
- Media errors within the tape itself
In most cases, media errors are passed back to the application using the tape drive and written to the system log file. As with RAID, tape drive errors are usually also passed back to the server side and written to the system log. Most drive vendors have SCSI pass-through commands that can be issued to get drive statistics and error conditions. These pass-through commands can retrieve information from the tape drive or configure settings within it; they are “passed through” the system to the drive itself without writing any data to the tape.
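As an illustration of what such a command looks like, the sketch below builds the 10-byte command descriptor block (CDB) for the standard SCSI LOG SENSE command, which reads a log page (for example error counters or TapeAlert flags) from the drive. This is only the CDB construction. Actually sending it requires an OS-specific pass-through interface, which is not shown here, and vendor-specific statistics pages are documented by each drive vendor. The function name and default allocation length are my own choices.

```python
import struct

LOG_SENSE_OPCODE = 0x4D  # SCSI LOG SENSE, 10-byte CDB

# Log page codes defined by the SCSI standards.
PAGE_WRITE_ERROR_COUNTERS = 0x02
PAGE_READ_ERROR_COUNTERS = 0x03
PAGE_TAPE_ALERT = 0x2E

PC_CUMULATIVE = 0x01  # page control: return cumulative values


def build_log_sense_cdb(page_code, alloc_len=512, page_control=PC_CUMULATIVE):
    """Return the 10-byte LOG SENSE CDB for the given log page."""
    if not 0 <= page_code <= 0x3F:
        raise ValueError("page_code must fit in 6 bits")
    if not 0 <= page_control <= 0x03:
        raise ValueError("page_control must fit in 2 bits")
    if not 0 <= alloc_len <= 0xFFFF:
        raise ValueError("alloc_len must fit in 16 bits")
    return struct.pack(
        ">BBBBBHHB",
        LOG_SENSE_OPCODE,
        0x00,                             # SP = 0, PPC = 0
        (page_control << 6) | page_code,  # PC in bits 7-6, page in bits 5-0
        0x00,                             # subpage code
        0x00,                             # reserved
        0x0000,                           # parameter pointer
        alloc_len,                        # allocation length, big-endian
        0x00,                             # control byte
    )


if __name__ == "__main__":
    cdb = build_log_sense_cdb(PAGE_TAPE_ALERT)
    print(cdb.hex())  # 4d006e00000000020000
```

Because LOG SENSE only transfers data from the drive to the host, issuing it does not touch the data on the tape, which is what makes this style of command suitable for gathering statistics on a drive that is in use.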