Getting Failover Right



Until recently, we had only two choices for host bus adapter (HBA) and switch failover, and those were generally available via the volume manager or a special loadable driver.

Recently, some of the major HBA vendors have released drivers for some operating systems that provide failover within the device driver. Combine this with the fact that some RAID devices add another level of complexity to failover, and you now have more ways than ever to build high-availability (HA) configurations with HBA and switch failover.

To help sort out those choices, let's start by looking at the hardware and how it works. The RAID is likely the most important component to understand because it has the most options and issues for failover. The critical issue is how hosts view and access the LUNs that the RAID presents to them.

RAID controllers generally have two different characteristics for access to the LUNs:

  1. Active/Active
  2. Active/Passive
Higher-end and enterprise controllers are always Active/Active. Mid-range and lower-end controllers can be either. How the controller manages internal failover, together with your server-side software and hardware, will have a great deal to do with your choices for accomplishing HBA and switch failover. Before developing a failover or multipathing architecture, you need to fully understand the issues with the RAID controller.

With Active/Active controllers, all LUNs are seen and can be written to by any controller within the RAID. Generally, with these types of RAID controllers, failover is not a problem, since the host can read or write over any path. Because access to every LUN is equal on every path, both load balancing of I/O requests and reaching the LUNs after a switch or HBA failure are simple. All you have to do is write to the LUN from a different HBA path.
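As a rough illustration of why the Active/Active case is simple, here is a minimal sketch of round-robin path selection that skips failed paths. The `Path` class, the `RoundRobinSelector` name, and the path labels are hypothetical and do not correspond to any vendor's driver; the sketch only assumes that every healthy path is equally valid.

```python
from dataclasses import dataclass


@dataclass
class Path:
    """One host-to-LUN route (hypothetical labels, not a real driver API)."""
    hba: str
    switch: str
    controller: str
    healthy: bool = True


class RoundRobinSelector:
    """Spread I/O across all healthy paths; any path is valid on Active/Active."""

    def __init__(self, paths):
        self.paths = paths
        self._next = 0

    def pick(self):
        # Only paths that are currently healthy are candidates.
        candidates = [p for p in self.paths if p.healthy]
        if not candidates:
            raise RuntimeError("no healthy path to the LUN")
        path = candidates[self._next % len(candidates)]
        self._next += 1
        return path


paths = [Path("hba0", "sw0", "A"), Path("hba1", "sw1", "B")]
selector = RoundRobinSelector(paths)
first, second = selector.pick(), selector.pick()  # alternates hba0, hba1

paths[0].healthy = False      # simulate an HBA or switch failure on path 0
after_failure = selector.pick()  # I/O simply continues on the surviving path
```

No ownership bookkeeping is needed here: when a path fails, the selector just stops offering it.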

Active/Passive Increases Complexity
If your RAID controller is active/passive, the complexity for systems that require HBA failover can increase greatly. With active/passive controllers, the RAID system is generally arranged as a controller pair in which both controllers see all of the LUNs, but each LUN has a primary path and a secondary path for access. If the LUN is accessed via the secondary path, the ownership of the LUN changes from the primary path to the secondary path.

This is not a problem if the controller has failed. It is a problem if only the path to the controller has failed (either the HBA or the switch) while other hosts are still accessing that LUN via its primary path. Each time one of the other hosts accesses the LUN on the primary path, the LUN moves from ownership on the secondary path back to ownership on the primary path. Then when the LUN is again accessed on the secondary path, the LUN fails over again to the secondary path. This ping-pong effect will cause the performance of the LUN to drop dramatically.
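The ping-pong effect can be made concrete with a toy model. The sketch below assumes, as described above, that any access through the non-owning controller moves ownership of the LUN; the class and function names are hypothetical, and the model counts ownership changes only, not the real cost of each one.

```python
class ActivePassiveLun:
    """Toy model of a LUN whose ownership follows the controller last used."""

    def __init__(self, owner="A"):
        self.owner = owner
        self.ownership_changes = 0

    def access(self, controller):
        # Access through the non-owning controller forces an ownership change.
        if controller != self.owner:
            self.owner = controller
            self.ownership_changes += 1


def count_ownership_changes(access_sequence, initial_owner="A"):
    lun = ActivePassiveLun(initial_owner)
    for controller in access_sequence:
        lun.access(controller)
    return lun.ownership_changes


# One host has lost its path to controller A and uses B, while another
# host keeps using A: every single I/O moves the LUN.
alternating = ["B", "A"] * 4
print(count_ownership_changes(alternating))  # 8

# All hosts stay on the owning controller: no ownership changes at all.
steady = ["A"] * 8
print(count_ownership_changes(steady))  # 0
```

In this model every one of the eight alternating accesses triggers a change of ownership, which is the pattern that drags LUN performance down.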

Host-Side Failover Options
On the host side, there are three options for HBA and switch failover. In some cases, depending on the vendor, they also provide load balancing of I/O requests across the HBAs. Here they are in order of hierarchy in the operating system:

  1. Volume manager and/or file system failover
  2. Failover and/or load-balancing driver
  3. HBA driver failover
Each of these has advantages and disadvantages; what they are depends on your situation and on the hardware and software you have in the configuration.

The drawings below show a mid-range, dual-port RAID controller connected in an HA configuration with dual switches and HBAs, first as Active/Active and then as Active/Passive.

Active/Active RAID controller Example
Figure 1: An example of an active/active RAID controller.

With an Active/Active RAID controller configuration, the failover software knows the path to each of the LUNs and ensures that it will be able to get to the LUN through the appropriate path. With this Active/Active configuration, you could access any of the LUNs via any of the HBAs with no impact on the host or another host, and both controllers can equally access any LUN.

HA Active/Passive RAID controller Example
Figure 2: An example of an HA active/passive RAID controller.

If this were an Active/Passive RAID controller, it would be critical to keep accessing LUNs 0, 2 and 4 via primary controller A if a switch or HBA failed. You would only want to access LUNs 0, 2 and 4 from controller B if controller A failed. If a port on controller A failed, you would want to access the LUNs via the other switch and port and not via controller B. If you did access them via controller B, and another host accessed the LUNs via controller A, the ownership of the LUNs would ping-pong and the performance would plummet.
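The rule in the paragraph above can be written down as a small path-selection function. This is a sketch under stated assumptions: the `Path` fields, the component labels (`hba0`, `sw0`, port `A0`, and so on), and the four-path topology are hypothetical stand-ins for the configuration in Figure 2, and real failover software learns about failures in its own way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    """One route from the host to a controller port (hypothetical labels)."""
    hba: str
    switch: str
    controller: str
    port: str


def choose_path(paths, failed, primary):
    """Pick a path for a LUN owned by `primary`, given failed component names.

    Any surviving path to the primary controller is preferred. The secondary
    controller is used only when the primary controller itself has failed.
    """
    failed = set(failed)

    def usable(p):
        return not ({p.hba, p.switch, p.controller, p.port} & failed)

    on_primary = [p for p in paths if p.controller == primary and usable(p)]
    if on_primary:
        return on_primary[0]
    if primary in failed:
        on_secondary = [p for p in paths if p.controller != primary and usable(p)]
        if on_secondary:
            return on_secondary[0]
    # No safe path: leave the decision to the caller rather than risk
    # moving LUN ownership while other hosts still use the primary.
    return None


# Assumed topology: two HBAs, two switches, two ports per controller.
paths = [
    Path("hba0", "sw0", "A", "A0"),
    Path("hba1", "sw1", "A", "A1"),
    Path("hba0", "sw0", "B", "B0"),
    Path("hba1", "sw1", "B", "B1"),
]

switch_down = choose_path(paths, {"sw0"}, primary="A")    # stays on A via sw1
port_down = choose_path(paths, {"A0"}, primary="A")       # stays on A via A1
controller_down = choose_path(paths, {"A"}, primary="A")  # only now uses B
```

Returning `None` when every path to a still-working primary controller is gone is a deliberately conservative choice for this sketch; a production policy might instead fail over to controller B as a last resort.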

