Interoperability: The Trials and Tribulations of Heterogeneous Installations
This month I'm going to review some personal experiences with the installation, configuration, debugging, and operation of heterogeneous hardware and software. Of course, because of non-disclosure agreements (NDAs) in place with various organizations, the company names will remain anonymous to protect both the guilty and innocent...and me.
Over the last few years I have been involved with multiple shared file system designs and installations involving Fibre Channel, HBAs, RAID, tapes, and switches. You would think that with all the interoperability testing going on today, everything would be "plug and play," as with Windows XP. In most cases, it works about as well as Windows 2000 plug and play and, in some cases, only as well as Windows 98 (sort of, kind of, with a number of gotchas).
Some people start off with an idea for an architecture based on a few PowerPoint slides from a vendor or two and add some of their own ideas based on internal requirements. Based on this, an architecture is defined, often with software and hardware from different vendors. PowerPoint engineering, as this type of engineering is often called, sometimes works and sometimes does not work very well. It then becomes someone's job to make the grandiose ideas work.
This is not to say I haven't built systems this way myself. I call it the "Chinese menu" method, where you pick one or more choices per column. The difference is that I never expect these systems to work without a great deal of hard work, and we always plan extensive integration time to ensure we can get them to operate as expected. And sometimes even the best architectures cannot be made to work without changes from the hardware or software vendor.
Step One: Hardware
In complex architectures, picking hardware that works together is not always as easy as it seems. Servers, HBAs, FC switches, RAIDs, and tapes each come with a vendor matrix listing which other vendors' products they are certified to interoperate with.
You might have a server that works with a certain HBA, and that HBA is certified with a specific FC switch and RAID, but your tape drives are certified with a different FC switch and/or different HBA. Fibre Channel tapes generally add more complexity to choosing hardware, given that tape error recovery is different and more complicated than RAID or disk error recovery. SCSI-connected tapes add additional issues if they are connected via a Fibre Channel-to-SCSI converter.
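To make the matrix problem concrete, here is a minimal sketch in Python that checks each connection in a proposed design against a set of certified pairings. The component names and the certification data are hypothetical placeholders, not entries from any real vendor matrix; in practice you would populate the set from the vendors' published matrices.

```python
# Hypothetical certification data: each pair is a combination that some
# vendor's matrix lists as tested together. All names are placeholders.
CERTIFIED_PAIRS = {
    frozenset({"server-A", "hba-X"}),
    frozenset({"hba-X", "switch-1"}),
    frozenset({"hba-X", "raid-R"}),
    frozenset({"hba-Y", "tape-T"}),
    frozenset({"switch-2", "tape-T"}),
}

def uncertified_links(links, certified=CERTIFIED_PAIRS):
    """Return the links in a design that no matrix lists as certified."""
    return [link for link in links if frozenset(link) not in certified]

# The connections the proposed design depends on (also hypothetical).
design_links = [
    ("server-A", "hba-X"),
    ("hba-X", "switch-1"),
    ("hba-X", "raid-R"),
    ("hba-X", "tape-T"),
    ("switch-1", "tape-T"),
]

gaps = uncertified_links(design_links)
for a, b in gaps:
    print(f"not certified together: {a} + {b}")
```

In this made-up design the disk path is fully covered, but the tape drive is certified only with a different HBA and a different switch, which is exactly the situation described above. Each gap is a combination you should plan to test yourself or raise with the vendors.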
Tracking down errors involving HBAs is never any fun. The meanings of the errors that are tracked are often not well defined. Some error messages are sent to system logs, but others are tracked only within the HBA driver. Sometimes a GUI can be used to browse these error conditions. I have seen driver configuration files used to set the level of error tracking within the HBA driver and to direct the output to either the system log or the HBA GUI.
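For the messages that do reach the system log, a small script can at least show which ports are complaining and how often. The sketch below is only an illustration: the driver tag "examplehba", the log line layout, and the message texts are all invented for the example, so the pattern would need to be adapted to whatever your driver actually writes.

```python
import re
from collections import Counter

# Hypothetical log lines. "examplehba" is a made-up driver tag and the
# messages are placeholders, not output from any real HBA driver.
SAMPLE_LOG = """\
Jan 10 02:14:07 host1 kernel: examplehba0: link down
Jan 10 02:14:09 host1 kernel: examplehba0: link up
Jan 10 02:15:30 host1 sshd[411]: session opened
Jan 10 03:01:12 host1 kernel: examplehba1: command timeout
Jan 10 03:01:15 host1 kernel: examplehba0: link down
"""

# Capture the port name and the message text from driver lines only.
LINE_RE = re.compile(r"kernel: (examplehba\d+): (.+)$")

def tally_hba_messages(log_text):
    """Count how many times each (port, message) pair appears in the log."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts

counts = tally_hba_messages(SAMPLE_LOG)
for (port, message), n in sorted(counts.items()):
    print(f"{port}: {message} (seen {n} times)")
```

A tally like this covers only what the driver chooses to log. Errors tracked solely inside the driver will not appear unless the driver's own configuration is set to send them to the system log.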