Book Excerpt: Building SANs with Brocade Fabric Switches

By Josh Judd, Chris Beauchamp, Alex Neefus and Benjamin F. Kuo


Solutions in this chapter:

  • Looking at the Overall Lifecycle of a SAN
  • Conducting Data Collection
  • Analyzing the Collected Data
  • Summary
  • Solutions Fast Track
  • Frequently Asked Questions

Introduction

We intend this book to allow you to effectively
design, implement, and maintain storage networks. Doing so requires an
understanding of the processes in each of the seven phases of a SAN’s
lifecycle, and their relationships with each other. Without taking a moment to
review the process from the highest level, it is easy to get lost in the
details of SAN hardware.

In this chapter, we provide that high-level view. We
show how the SAN design process is really an ongoing lifecycle. We take you
through the process from the moment the decision is made to deploy a SAN,
through releasing the SAN to production. Then we explain the extent to which
the process should be repeated when upgrades and architectural changes are
needed. We also provide detail on the first two parts of the lifecycle.

The processes presented here are derived from other
areas of Information Technology (IT) and they are normal parts of any
large-scale IT project. For example, when implementing a SAN, you should
interview people who will have a key interest in the finished product; the same
is true when putting in a Local Area Network (LAN) or Wide Area Network (WAN).
Much of this material should be second nature to any IT network architect,
Database Administrator (DBA), or senior systems administrator. For the more
advanced users to whom these techniques are well understood in general, this
chapter will serve as reference material showing how these processes are
applied to SANs in particular. We have attempted in this book to provide
material that will allow both the beginner and the expert alike to successfully
design a SAN.

It is true that more attention must be paid to SAN
design than to most other networking technologies. This is because SANs
typically have more stringent availability and performance requirements than
other networks. A SAN is similar to a traditional network in its requirements,
but is also somewhat like a channel (for example, a CPU/RAM interconnect
mechanism, or a PCI bus). Channels require very high performance, and are
almost assumed to be 100 percent reliable. This is in stark contrast to the
traditional Ethernet LAN, where things like five-nines uptime for all node
connections, in-order packet delivery, and tuned approaches to bandwidth
management are rare indeed.

Fortunately, SANs provide the tools necessary to
achieve these performance and availability goals. For example, it is
commonplace in a Fibre Channel SAN to use a dual-fabric approach to SAN
architecture. This means having two completely separate networks for data to
travel over, and potentially using both networks as active paths. While it is
certainly possible to do this sort of thing using IP/Ethernet networks, it is
substantially more difficult, since Fibre Channel was designed with this in
mind, and Ethernet was not. The SAN designer must provide for higher
availability and spend some time thinking about performance, but will know
going into the process that these goals are entirely achievable.

We should also note here that the process outlined in
this chapter is designed to make a complex SAN design successful. With less
complex designs (that is, the majority of SAN deployments to date), it is
perfectly acceptable to skip over much of the process. For example, if you are
deploying a SAN with only three servers and two storage arrays, spending much
time on architectural analysis is unnecessary. The complexity is presented here
so that users with complex requirements will have it available to them; users
with simpler scenarios can use their judgment about which bits to incorporate
into their design process.

The seven phases of the lifecycle of a SAN at the
very highest level can be broken down into three broad categories: design,
implementation, and maintenance. The first of these, designing the SAN,
includes the collection and the analysis of data, which defines the
requirements of the network. We will go into detail on these first two phases
of the design process in this chapter. These phases provide a solid launch pad
for your journey through the remainder of the SAN’s lifecycle.

The third and fourth phases
of the SAN lifecycle (architecture development and prototype testing) complete
the design process. Implementing the SAN encompasses the transition phase and
the release to production phase, the fifth and sixth phases of the lifecycle.
These phases are discussed in Chapters 6 and 7 of this book. Chapters 8 and 9
cover troubleshooting, maintenance, and management, which make up the final
phase of the lifecycle model.

When you are finished
reading this chapter, you should have a solid understanding of the design
processes, and have a valuable reference tool to enable project planning on any
future SAN deployments.

Looking at the Overall Lifecycle of a SAN


Any SAN will go through certain phases over the
course of its life. Depending on the size and complexity of the SAN, some
phases might take months to complete, and some might be only glanced over. For
example, a single-switch SAN does not require much in the way of network
design. However, if the solution involves hundreds of devices, including
storage components from many different vendors that were not already pretested
and determined to be interoperable, it could require extensive testing or
validation.

When an existing SAN must undergo a fundamental
change, be it at the architectural level or simply the introduction of a new
type of storage array, you should cycle back through the phases of SAN
development. This will ensure that the critical applications running on the SAN
are not unexpectedly disrupted by changes. However, when the change is
fundamental but small (like adding a new type of storage array) it is possible
to take a fast track through this process.

The SAN’s lifecycle, which
can be described at a high level as design, implementation, and maintenance,
translates directly into action-oriented phases on the part of the SAN
designer: data collection, data analysis, architecture development, prototype
and testing, transition, release to production, and maintenance. See Figure 5.1
for a flowchart of these phases and their relationships to each other.

Figure 5.1 An Overview of the Lifecycle of a SAN

Data Collection

You must define the requirements of the SAN before
building it. What business problem is being solved by the SAN? What are the
overall goals of the project? To determine the requirements, you should
interview all affected parties, to find out what they all hope to achieve (in
other words, their goals and objectives), and develop both a detailed technical
requirements document and a timeline for the project.

Data Analysis

Once you have gathered input from all parties, you
must analyze it and put it into a meaningful format. The first two phases
together will allow you to start with the business goals that are driving the
project, and determine at a high level the necessary technical properties
required of the SAN. Once this phase is completed, all business requirements
should be translated into technical requirements. The technical requirements
document will be created during the collection phase, and completed during the analysis
phase. You will also have created a working document for a Return On Investment
(ROI) proposition to justify the expense of the project.

Architecture Development

Now that you have a list of technical requirements,
you will develop a SAN architecture that meets those requirements. This process
will involve balancing many factors. For example, there might be a tradeoff
between performance considerations and cost. It might be necessary for you to
cycle back to the data collection and analysis phases to gather more input to
make compromises with input from all affected parties. When finished, you will
have a detailed architecture of the SAN that you intend to build. A SAN
architecture includes the fabric topologies of all related fabrics, the storage
vendors involved, the SAN-enabled applications being used, and other
considerations that affect the overall SAN solution. This step is the most
likely to be skipped over quickly when the SAN requirements are small.

Prototype and Testing

SANs deal directly with the mission-critical data of
today’s enterprises. When building any mission-critical solution, you must test
it before releasing it to production. In this phase, you will build a prototype
of the SAN solution and test it to ensure that it will function properly when
released. This should be done using nonproduction systems. It might be
necessary to cycle back to the architecture development phase if problems are
found.

Wherever possible, build a test bed identical to the
solution you are implementing. This will provide the greatest assurance of
success in production. However, budgetary concerns, limits on time and space,
and other factors will usually prevent this from being practical. Imagine a
200-port SAN. Now imagine 200 hosts and storage arrays plugged into it. Now
imagine asking the CFO to buy another 200 devices to test with, and to provide
administrators, space, power, and cooling for all of it.

Because of this, the test phase will be a balance of
conducting your own testing, and leveraging other organizations’ test results.
Finding a document that says “vendor X already tested or certified this
configuration” might be as good as or better than testing it yourself. Even if the
components of a solution have been tested by you and/or others to your satisfaction,
you must test all aspects of the complete system prior to releasing it to
production. This is due to the fundamental nature of a large networked system
where interactions, timing, and other factors can produce different results
from devices tested individually. The actual final test will occur during the
release to production phase, but creation of the test plan should occur in this
phase. At the end of this phase, all parties with an interest in the outcome of
the project will approve it, and the transition to production will begin.

Authors

Josh Judd is a Senior SAN Architect with Brocade Communications Systems, Inc. In addition to writing technical literature, he provides senior-level strategic support for major OEMs and end-users of Brocade storage network products worldwide. Chris Beauchamp (Brocade CFP, CSD) is a Senior SAN Architect for Brocade Communications Systems, Inc. Chris focuses on SAN design and architecture, with an emphasis on scalability and troubleshooting. Alex Neefus is the Lead Interoperability Test Engineer at Lamprey Networks, Inc. Alex has worked on developing testing tools for the SANmark program hosted by the FCIA. Benjamin F. Kuo is a Software Development Manager at TROIKA Networks. Headquartered in Westlake Village, CA, TROIKA Networks is a developer of Fibre Channel Host Bus Adapters, dynamic multipathing, and management software for Storage Area Networks.


Transition

Now that you have a working prototype, and all
interested parties have signed off on it, you will begin to transition your
existing hardware onto the new SAN. If a SAN is already in place, this phase
might be as simple as adding a new node to the SAN, or changing the
Inter-Switch Link (ISL) architecture. If the SAN is completely new, it might
involve a long migration process consisting of moving one production system at
a time. In any case, there might be a need to cycle between this phase and the
release-to-production phase repeatedly. Once a component has completed the
transition onto the SAN, release to production can occur for that component.

Release to Production

Once a component has been transitioned onto the new
SAN, it must be tested again and then approved before becoming a part of the
enterprise’s production environment. Since there might be many components that
must be transitioned and released, it might be necessary to cycle between the
transition and release-to-production phases repeatedly until all components
have entered production. After this phase is complete, the SAN will enter the
maintenance phase.

Maintenance

This is the useful life of the SAN. All of the
benefits that prompted the SAN designer to implement the SAN in the first place
are found in this phase. It is therefore desirable to have a SAN spend as much
time as possible in this phase, and as little as possible in the other phases.
The goal of this phase is to keep the SAN running as well as possible for as
much of the time as possible, and to expand its capabilities only according to
the original, tested, and approved parameters. This phase includes adding,
changing, or removing components, as well as managing, monitoring, and troubleshooting
existing components.

During the maintenance phase, no changes should be
made to the SAN that fall outside of the original blueprint that was
established in the previous phases. Any such change necessitates a repetition
of the entire lifecycle. For example, if the SAN were originally built using
vendor X storage arrays, an additional vendor X array could be added as part of
maintenance, but an array from vendor Y would require thought and testing
before its introduction. It might not require much thought and testing, but it
must, in any case, be looked into.

Note: Any fundamental change to the SAN requires a
repetition of the entire lifecycle.

In summary, the seven phases of the SAN design
lifecycle are:

1. Data Collection
2. Data Analysis
3. Architecture Development
4. Prototype and Testing
5. Transition
6. Release to Production
7. Maintenance
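
The phases and the cycle-backs described above can be pictured as a small state machine. The sketch below is illustrative only: `Phase`, `ALLOWED`, and `can_move` are hypothetical names, and the transition table is one reading of the phase descriptions, not a reproduction of Figure 5.1.

```python
from enum import Enum


class Phase(Enum):
    DATA_COLLECTION = 1
    DATA_ANALYSIS = 2
    ARCHITECTURE_DEVELOPMENT = 3
    PROTOTYPE_AND_TESTING = 4
    TRANSITION = 5
    RELEASE_TO_PRODUCTION = 6
    MAINTENANCE = 7


# Allowed moves between phases, as read from the text (an assumption).
ALLOWED = {
    Phase.DATA_COLLECTION: {Phase.DATA_ANALYSIS},
    Phase.DATA_ANALYSIS: {Phase.ARCHITECTURE_DEVELOPMENT},
    # Architecture work may cycle back to gather more input.
    Phase.ARCHITECTURE_DEVELOPMENT: {
        Phase.PROTOTYPE_AND_TESTING,
        Phase.DATA_COLLECTION,
        Phase.DATA_ANALYSIS,
    },
    # Problems found in testing send the design back to architecture.
    Phase.PROTOTYPE_AND_TESTING: {
        Phase.TRANSITION,
        Phase.ARCHITECTURE_DEVELOPMENT,
    },
    Phase.TRANSITION: {Phase.RELEASE_TO_PRODUCTION},
    # Components are transitioned and released one at a time.
    Phase.RELEASE_TO_PRODUCTION: {Phase.TRANSITION, Phase.MAINTENANCE},
    # A fundamental change restarts the lifecycle.
    Phase.MAINTENANCE: {Phase.DATA_COLLECTION},
}


def can_move(current: Phase, target: Phase) -> bool:
    """Return True if the lifecycle allows moving from current to target."""
    return target in ALLOWED[current]
```

A planning tool could use `can_move` to reject, for example, a jump from data collection straight to maintenance.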

Conducting Data Collection

The data collection phase of SAN design is the
foundation upon which the SAN will be built. It is vital that the information
collected in this phase be both complete and accurate. If the SAN requirements
are poorly defined, it is guaranteed that the resulting SAN will meet business
objectives poorly. You should therefore take your time with this phase.

Some of the information you will collect is generic
to any major IT project. If you already have an established data collection
process in your company, integrate the SAN-specific material from this section
into that process.

Data collection consists of determining which people
you will need to interview, interviewing them, and conducting a physical
assessment of existing equipment and facilities. When this process is complete,
you will have a technical requirements document consisting of a list of the
business problems that the SAN will solve, the business requirements for the
SAN, characteristics of all devices that will be attached to it, and detailed
information about all relevant facilities. You will also have a timeline for
implementation.

Creating an Interview Plan

Who has a stake in the SAN solution? Well, you could
argue that every person who uses a system attached to the SAN has a stake in
it. While true, this is not useful for creating an interview list, because
there would be too many people involved. Similarly, you could argue that only
the person who initiated and “owns” the project should be consulted. Again,
this is not useful, because it leaves out people who have a strong interest in
the project, and might have knowledge that is critical to its success.

A balanced approach to creating an interview list is
critical. You can view the people on this list as a SAN solution “core team.”
Think about having all of these people together in a room, and trying to solve
the SAN solution problem together. Try to include everyone needed to solve the
problem, but nobody else. Typically, a core team might include:

  • At least one systems
    administrator
  • At least one storage
    administrator
  • A network administrator
  • A DBA, if a database
    server will be involved
  • At least one application
    specialist associated with each application that will run on the SAN
  • At least one manager
    who can act as an overall “owner” of the project

    It is probable that you will be one of these people,
    in addition to being the SAN designer. Unless you are an external consultant,
    this is typically the case.

    Once you have a list of the desired members of the
    core team, you must contact them and ask them to take time to help with the
    project. Ensure that each team member has allocated the necessary time and that
    their management appreciates the demands of participating in this team. As the
    SAN design goal of the team might require a long-term process, getting this
    buy-in initially will minimize disruption to the team later. Often in the past,
    SAN design teams did not include network administrators, as the focus was on
    the storage side. Experience has shown that SANs are networks, and should be
    coordinated with the traditional IP network groups to ensure that proper
    networking experience is at hand.

    Whenever possible, schedule an interview as a
    face-to-face, one-on-one meeting. This format will allow you to communicate the
    questions and understand the answers most effectively. You should also have a
    group meeting with the entire core team after conducting individual interviews.
    This will allow you to resolve any differences before analyzing the data, and
    review the analysis as a team.

    Conducting the Interviews

    Now that you know who to interview and have a
    schedule of when you will interview them, you need to know what questions to
    ask, and what format to put the collected data into. This section contains a
    suggested set of questions that you should ask, and some detail on what each
    question is about. It is followed by a summary that could be used to create
    an interview form.

    Note: Not every person you interview will be able to answer
    every question. Between the members of the core team, the expertise necessary
    to answer all of these questions should be completely represented. Some members
    might provide conflicting answers. You will be in a key position to resolve
    these differences, and achieve a compromise. It is vital that all affected
    parties agree with the deployment strategy before implementation begins.

    What Overall Business Problem Are You Trying to
    Solve?

    A business problem that would initiate a SAN design
    might be something like:

  • “We need to keep our
    business running in case of a disaster like an earthquake or fire.”
  • “Our backups take so long
    to finish that they are impacting our ability to process customer orders.”
  • “We need to save money
    on storage by utilizing free space more efficiently.”

    Chapter 6 discusses some of the more common business
    problems that SANs can solve. Brocade maintains a series of documents that
    detail specific SAN solutions. These documents are known as Brocade
    SOLUTIONware configuration guidelines and are available on the Brocade Web site
    at www.brocade.com/SAN.

    Note: A SAN might be intended to solve multiple business
    problems. In this case, you should separate each business problem into a
    different set of questions and answers. You will correlate these during the
    analysis phase.

    What Are the Business Requirements of the Solution?

    Once you know the business problem that you need to
    solve, it should be easy to figure out what the business requirements of the
    solution must be. This is simply a matter of rephrasing the previous answers,
    with more specific criteria:

  • “The SAN must allow all
    functionality of all business-critical servers at site X to resume within Y
    minutes at site Z.”
  • “The SAN must allow the
    following list of servers to complete backups within X minutes: “
  • “The SAN must allow the
    following list of servers access to the corresponding list of storage arrays: “

    This is useful because it acts as an intermediate step
    toward turning the business problem into a matching technical solution.
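
A requirement of the form “complete backups within X minutes” can be sanity-checked with simple arithmetic once the data volume and the aggregate tape drive throughput are known. The following is a minimal sketch, assuming sustained streaming throughput and ignoring compression, cartridge changes, and contention on the fabric. The function names and all figures are hypothetical.

```python
def backup_minutes(data_gb: float, drives: int, mb_per_sec_per_drive: float) -> float:
    """Estimate the minutes needed to stream data_gb across several tape drives."""
    if drives <= 0 or mb_per_sec_per_drive <= 0:
        raise ValueError("drives and per-drive throughput must be positive")
    total_mb = data_gb * 1024                       # 1 GB taken as 1024 MB
    aggregate_mb_per_sec = drives * mb_per_sec_per_drive
    return total_mb / aggregate_mb_per_sec / 60


def meets_window(data_gb: float, drives: int,
                 mb_per_sec_per_drive: float, window_minutes: float) -> bool:
    """Return True if the estimated backup time fits inside the window."""
    return backup_minutes(data_gb, drives, mb_per_sec_per_drive) <= window_minutes


# Hypothetical example: 600 GB, 10 MB/s per drive, 240-minute window.
four_drives_ok = meets_window(600, 4, 10, 240)
five_drives_ok = meets_window(600, 5, 10, 240)
```

An estimate like this only bounds the problem; the real figure has to come from the testing phase.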

    Moving from Business Requirements to Technical
    Requirements

    You should not deploy a SAN simply for the sake of
    adopting the “hot new technology.” SANs are hot because they solve important
    business problems and allow companies to make more money. This could be fairly
    direct: for example, a matter of saving more money on IT than the project cost,
    since SANs are very efficient at providing a clear ROI. ROI is often achieved
    by management efficiencies, resource efficiencies, or better utilization of
    resources. On the other hand, it could be indirect, by making IT systems more
    efficient, thus increasing users’ productivity.

    The first key to a successful SAN deployment is the
    accurate and complete statement of what business problem(s) you intend for the
    SAN to solve. Unfortunately, you cannot turn a business problem into a
    technical solution without work. There is no silver bullet to make your backups
    run faster so that your users will not have to work on a slow system. However,
    there are tape libraries that run fast, and can be shared by many devices.
    This, when combined with an appropriate Fibre Channel fabric, and a SAN-enabled
    backup application, could amount to the same thing as the silver bullet.

    In order to know which hardware and software will
    solve your business problem, you have to define in a technical way what you
    need to accomplish. This is a necessary intermediate step between the business
    problem and the purchase of specific technical solutions.

    It is fairly straightforward to change a sentence
    like, “We need to keep our business running in case of a disaster like an
    earthquake or fire” into a sentence like, “The SAN must allow all functionality
    of all business-critical servers at site X to resume within Y minutes at site
    Z.” Once you have done this, you will have the business requirements of the
    solution. You know that you have a business requirements statement when you
    could phrase it like this, and still have it make sense: “Our business will run
    better if we have a SAN that can allow all functionality of all
    business-critical servers at site X to resume within Y minutes at site Z.” The
    components of the business requirements statement are “our business will run
    better” (or something to that effect) followed by a reasonably specific
    statement about what the SAN must do to make that happen.

    However, you will still not have the technical
    requirements detailed. This is not something that you, the SAN designer, can
    simply ask in an interview. This is a large part of what you will bring to the
    table as the SAN designer once you have gathered
    the data and then analyzed it in the next phase. A technical requirements
    document set should list, in detail:

  • All of the devices that
    are to be attached to the SAN
  • Their locations
  • The communication
    patterns between them (random I/O, streaming access such as video,
    I/O-intensive database access)
  • Their performance characteristics
    (reads, writes, max/min/typical throughputs)
  • What software will run
    on them relative to the SAN (for example, a LAN-free backup application, or
    anything SAN-specific)
  • How all of this is
    expected to change over time (storage growth, server growth)

    The technical requirement statement would be, “The
    SAN needed to meet the business requirements outlined must have the following
    characteristics:” This would be followed by the body of the technical
    requirements document. The rest of the questions to ask in the interview
    process will provide you with the body of this document.
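
The items listed above map naturally onto a structured record. The sketch below is one hypothetical layout, not a format prescribed by the text: the class and field names are invented for illustration, and the compound annual growth model in `projected_storage_gb` is an assumption about how “change over time” might be estimated.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Device:
    name: str
    location: str
    access_pattern: str              # e.g. "random I/O", "streaming", "database"
    typical_mb_per_sec: float
    max_mb_per_sec: float
    san_software: List[str] = field(default_factory=list)
    storage_gb: float = 0.0
    annual_growth: float = 0.0       # 0.25 means 25 percent per year (assumed model)


@dataclass
class TechnicalRequirements:
    business_requirement: str
    devices: List[Device] = field(default_factory=list)

    def projected_storage_gb(self, years: int) -> float:
        """Total storage after `years`, compounding each device's growth rate."""
        return sum(d.storage_gb * (1 + d.annual_growth) ** years
                   for d in self.devices)


# Hypothetical example entry.
reqs = TechnicalRequirements(
    business_requirement="Complete nightly backups within the agreed window",
    devices=[Device("db01", "Room 2, rack 14", "database", 20.0, 60.0,
                    ["LAN-free backup agent"], storage_gb=1000.0,
                    annual_growth=0.5)],
)
```

Keeping the business requirement in the same record as the device list makes it easier to trace each technical item back to the reason it exists.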

    What Is Known about the Nodes that Will Attach to the
    SAN?

    You should try to get a list of all information
    possible about every node attached to the SAN. The relevant questions
    cover each host, each storage device, the facilities where hosts and
    storage will be located, and the SAN itself.
    Questions about each host could include the following:

  • What operating system
    is installed? What patch or service pack level?
  • Are fabric
    HBA/controller drivers available? Are they well tested?
  • What type of connection
    is supported (private loop, public loop, or fabric)?
  • Which applications will
    run on this host (databases, e-mail, data replication, file sharing)?
  • How much storage does
    it require?
  • How will its storage
    requirements change over time?
  • Physically, what are
    its dimensions? How heavy is it?
  • Does it rack mount?
    Does it have a rack kit? Will it sit on a shelf?
  • If there is a
    management console, what type is it? (Is it a traditional keyboard/video/mouse
    combo [KVM], or is it a serial connection, like a TTY?) Does it need to be
    permanently attached? (For example, a Sun SPARC server could have a keyboard,
    mouse, and monitor permanently attached, or it could be managed through a
    serial port attached to a modem.)
  • How many HBAs will it
    have?
  • If it has more than one
    HBA, what software will be used to provide failover or performance enhancements
    of multiple paths?
  • Do these interfaces
    exist, or do they need to be purchased? (You should keep track of every piece
    of equipment that you need to buy for the project, for budgeting and ROI
    analysis.)
  • If they exist, what are
    the make, model, and version information?
  • If not, what kind will
    be purchased to meet the objective?
  • How many Ethernet
    interfaces will it have?
  • In what temperature
    range will it operate?
  • Will it need a
    telephone line for management?
  • Where will the node be
    physically located?

    These questions could be used to create an interview
    form for each host.
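
One hypothetical way to capture such a per-host form is as a small record type in which every question starts out unanswered. The sketch below covers only a subset of the questions listed above; the class name, field names, and the `unanswered` helper are illustrative assumptions.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class HostInterviewForm:
    hostname: str
    operating_system: Optional[str] = None
    patch_level: Optional[str] = None
    fabric_drivers_available: Optional[bool] = None
    connection_type: Optional[str] = None    # "private loop", "public loop", "fabric"
    applications: List[str] = field(default_factory=list)
    storage_required_gb: Optional[float] = None
    hba_count: Optional[int] = None
    multipath_software: Optional[str] = None
    hbas_to_purchase: Optional[int] = None   # feeds budgeting and ROI analysis
    ethernet_interfaces: Optional[int] = None
    location: Optional[str] = None

    def unanswered(self) -> List[str]:
        """Return the names of questions that still have no answer."""
        return [f.name for f in fields(self)
                if getattr(self, f.name) in (None, [])]


# Hypothetical partially completed form.
form = HostInterviewForm(hostname="db01", operating_system="Solaris 8",
                         hba_count=2)
```

Listing the unanswered fields after each interview shows which questions still need to go to another member of the core team.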

    Questions about each storage device could include the
    following:

  • What are the make,
    model, and version information?
  • What type of connection
    is supported (private loop, public loop, fabric, SCSI, SSA)?
  • How many hosts can this
    device serve?
  • If it is a multiport
    device, does it have limits on how many hosts can access it through each port?
  • Physically, what are
    its dimensions? How heavy is it?
  • What is its capacity in
    gigabytes?
  • Does it rack mount?
    Does it have a rack kit? Will it sit on a shelf?
  • If there is a
    management console, what type is it? Does it need to be permanently attached?
  • How many Fibre Channel
    interfaces will it have?
  • Do these interfaces
    exist, or do they need to be purchased?
  • If they exist, what are
    the make and model? If not, what kind will be purchased?
  • How many Ethernet
    interfaces will it have?
  • In what temperature
    range will it operate?

    Note: Obviously, some of these questions do not relate
    directly to the SAN deployment. However, they are generally relevant whenever
    making a large architectural change in a data center. For example, it is
    necessary to know what temperature a server can operate at in case the server
    is in a location where temperature control is an issue. In this case, adding a
    large number of switches might increase the room temperature beyond operating
    levels. As always, use your judgement about which questions to include in your
    interview, and which to skip over.

  • Will it need a
    telephone line for management?
  • Where will the node be
    physically located?
  • What is the firmware
    level?
  • For tape libraries,
    what is the capacity of each cartridge, number of cartridges the library can
    hold, number and speed of drives, and number of transports?
  • SCSI or Fibre Channel
    interface? What type of SCSI (wide/narrow, differential/single ended)?

    Note: While it is possible to manage an entire fabric
    through a single Ethernet connection, this is not the method that Brocade
    currently recommends. You should plan on one Ethernet connection per Brocade
    switch, in addition to planning connections for hosts and other SAN devices. It
    is also advisable for the highest availability plan to balance switches across
    multiple electrical circuits, even if an Uninterruptible Power Supply (UPS)
    protects them.
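
The planning advice in this note reduces to a port count and a placement rule. The sketch below is a simplified illustration under stated assumptions: the function names are hypothetical, and round-robin placement is just one naive way to spread switches across circuits. A real plan would also keep the switches of each redundant fabric on different circuits.

```python
from typing import Dict, List


def management_ports_needed(switches: int, other_device_ports: int) -> int:
    """One Ethernet connection per switch, plus ports for hosts and other devices."""
    return switches + other_device_ports


def balance_across_circuits(switch_names: List[str],
                            circuits: List[str]) -> Dict[str, List[str]]:
    """Assign switches to circuits round-robin so no circuit carries them all."""
    if not circuits:
        raise ValueError("at least one electrical circuit is required")
    plan: Dict[str, List[str]] = {c: [] for c in circuits}
    for i, name in enumerate(switch_names):
        plan[circuits[i % len(circuits)]].append(name)
    return plan


# Hypothetical example: three switches, two circuits.
plan = balance_across_circuits(["sw1", "sw2", "sw3"], ["circuit-A", "circuit-B"])
```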

    Questions about facilities where hosts and storage
    will be located could include the following:

  • Who is responsible for
    this facility?
  • Are there any existing
    optical cables, and what type?
  • Is there sufficient
    electrical power?
  • What about cooling?
  • Is there enough rack
    space?
  • What is the network
    infrastructure?
  • Physical access? If the
    location is on an upper floor, is there a freight elevator?

    Answers to questions about the SAN itself must be
    considered preliminary. They will indicate preconceptions that members of the
    core team have, but all members should be prepared to be flexible on these
    preconceptions as the SAN design process progresses. Questions about the SAN
    itself could include the following:

  • Are there any distance considerations?
    (For example, long cable runs between floors of a building, campuswide
    networks, or MAN/WAN connections.)
  • How many hosts will
    attach to the SAN?
  • How many storage
    devices will attach to the SAN?
  • If known at this point,
    do they require any-to-any connectivity? Alternately, are there groups of
    devices that need to communicate only among themselves?
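
The answer to the connectivity question can be recorded as named groups of devices. The sketch below is a hypothetical representation, with invented group and device names: any-to-any connectivity is simply one group containing every device, while restricted designs list several smaller groups.

```python
from typing import Dict, Set


def may_communicate(groups: Dict[str, Set[str]], a: str, b: str) -> bool:
    """Return True if devices a and b share at least one connectivity group."""
    return any(a in members and b in members for members in groups.values())


# Hypothetical example: two groups that overlap on host1.
groups = {
    "backup": {"host1", "host2", "tape1"},
    "oltp": {"host1", "array1"},
}
```

Here `host1` can reach both `tape1` and `array1`, while `host2` has no path to `array1`. That is the kind of detail worth confirming with the core team before the architecture phase.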
  • Click here to buy book

    Building SANs with Brocade Fabric Switches

    Authors

    Josh Judd is a Senior SAN Architect with Brocade Communications Systems, Inc. In addition to writing technical literature, he provides senior-level strategic support for major OEMs and end-users of Brocade storage network products worldwide. Chris Beauchamp (Brocade CFP, CSD) is a Senior SAN Architect for Brocade Communication Systems, Inc. Chris focuses on SAN design and architecture, with an emphasis on scalability and troubleshooting. Alex Neefus is the Lead Interoperability Test Engineer at Lamprey Networks, Inc. Alex has worked on developing testing tools for the SANmark program hosted by the FCIA. Benjamin F. Kuo is a Software Development Manager at TROIKA Networks. Headquartered in Westlake Village, CA, TROIKA Networks is a developer of Fibre Channel Host Bus Adapters, dynamic multipathing, and management software for Storage Area Networks.


    Moving from Business Requirements to Technical
    Requirements

    Which SAN-Enabled Applications Do You Have in Mind?

    Will the SAN use a serverless backup application? How
    about clustering software? How about volume management? This category of
    software requires special attention because of its close ties to the SAN
    hardware you choose to build the solution. For example, if you plan to use
    vendor X serverless backup software, you must make sure that your backup
    hardware (tape libraries, Fibre Channel/SCSI gateways, etc.) is supported.

    Which Components of the Solution Already Exist?

    Any hardware or software that is already in place and
    that must be included in the solution will create points for you to build
    around. You must find out as many details as possible about everything in this
    category. When you have finished the interviews and are conducting the physical
    assessment, you should personally inspect every piece of hardware. This will
    prevent surprises later in the process. Make sure that you find out exactly
    where all hardware is located, and how to access it.

    You must pay special attention to devices that
    already exist and already have Fibre Channel interfaces. Find out which kinds
    of HBAs are installed in hosts, and which driver revisions are installed on
    them. Find out code levels for RAID arrays and Fibre Channel tape libraries.
    Find out if upgrades to driver/code levels are planned or at least allowed.

    Note: You must know if each device is public loop, private
    loop, or full fabric. Some devices might even be SCSI and require additional
    hardware to bridge between SCSI and Fibre Channel.

    If possible, you should not use private loop drivers
    on initiators unless the device does not support fabric drivers or is not easy
    to upgrade. Private loop hosts require special licenses, typically Brocade
    QuickLoop and Zoning. Find out if the existing devices are configured as
    full-fabric devices. If not, find out if their drivers support full fabric, or
    if they can be upgraded to full fabric. This is not intended to discourage
    incorporation of private loop devices into a fabric: QuickLoop and Fabric
    Assist exist specifically to enable this to occur. However, if a device can
    support full fabric, then integration into the SAN will be easier if it does
    so.

    Which Components Are Already in Production?

    Components that are in production require special
    attention in two areas:

  • Duplicate equipment
    might be desired for testing.
  • The transition phase is
    more complex.

    It is vital to know as much as possible about
    production systems that are going to transition onto the SAN. Therefore,
    somebody intimately familiar with and responsible for every such system should
    be included on the core team.

    Which Elements of the Solution Need to Be Prototyped
    and Tested?

    For relatively simple solutions that involve only
    components already certified to work together, you might not have
    to do any testing at all. For example, if you are implementing a SAN
    solution based on a Brocade SOLUTIONware document, you might feel that you need
    to do only minimal validation. By contrast, a solution for which no documentation
    or testing information exists generally requires extensive validation.

    For more complex solutions involving a large number
    of devices that might be from many different vendors, you might feel that every
    single element needs to be tested in combination before release to production
    can occur. You should get input on this from every member of the core team. If
    any team member feels that you should conduct in-house testing on a component,
    you should strongly consider doing so.

    What Equipment Will Be Available for Testing?

    Any existing equipment that is not in production, and
    any equipment that is going to be purchased specifically for this project might
    be good material with which to build a test bed. Existing equipment that is in
    production is not good to test with. If existing equipment already in
    production will be transitioned onto the SAN, it might be beneficial to budget
    for a representative sample of duplicate, nonproduction systems with which to
    prototype the solution. It is generally a good idea to have such systems
    available for testing in any case. It may also be possible to borrow systems to
    test with. In any case, it’s probably worth asking your vendors for such loans.

    Whether or not test equipment is available, you
    should research what testing third-party vendors or third-party organizations
    have already done. In this way, you will avoid duplicating their efforts. If
    you cannot get representative test equipment for an element that needs to be
    prototyped, it might be acceptable (and necessary) to rely entirely upon the work
    done by others to validate the solution.

    Again, with many solutions, this is a perfectly
    acceptable way to go. If you do not feel that in-house testing is warranted,
    then you can save time and money by skipping the prototype and test phase. Just
    make sure that you have documentation certifying the solution before you make
    this decision.

    How and When Are Backups to Be Done?

    You need to get a list of everything that relates to
    the system’s backups:

  • What backup hardware
    will be used?
  • What backup software
    will be used for each host?
  • Which storage arrays will
    be backed up by which tape libraries?
  • When will these backups
    occur?
  • How long can they take?
  • How much data needs to
    be backed up?
  • Will snapshots be used?
    How do they work?
  • Will split mirrors be
    used? How do they work?

    What Will Be the Traffic Patterns in the Solution?

    You should produce a matrix showing every
    initiator-to-target communication expected in the SAN. This is necessary to
    determine performance characteristics, and to set up zoning on the fabric:

  • Which hosts will use a
    specific storage array?
  • Which hosts in a
    cluster will talk directly to each other over the SAN?
  • Which backup devices
    will be performing serverless backups?
  • Which arrays will they
    be backing up?

    Create a table listing every device on the SAN that
    can act as an initiator in one column. This will include every host, every
    storage virtualization product, and every serverless backup server. It might
    include storage arrays, if they have data replication capabilities. Then put a
    second column next to it with all of the targets that each initiator will
    communicate with (Table 5.1).

    Table 5.1
    Initiator-to-Target Mapping

    SAN Traffic Patterns

    Initiators    Targets
    ----------    ---------------------
    host1         array3
    host2         array1, array2, tape1
    host3         array1
    host4         array1, array2
    tape1         array1, array3
    array3        array4
    array4        array3

    Note: Some devices on a SAN can act as both an
    initiator and a target. If so, they will appear in both columns. See array3 and
    array4 in Table 5.1. This is how you would indicate that array3 and array4
    perform data replication between them.

    You will not necessarily be able to build this table
    by interviewing one person; it will likely be developed over the course of the
    interview process, changed as the implementation takes place, and maintained
    for the life of the SAN.
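
    Because this table must be maintained for the life of the SAN, it can help to
    keep it in a machine-readable form. The following is only a minimal sketch in
    Python, assuming the device names and pairings from the chapter's example; the
    dictionary layout and the function names are our own illustration, not part of
    any Brocade tool.

```python
# Initiator-to-target mapping in the spirit of Table 5.1.
# Device names come from the chapter's example; the structure is illustrative.
traffic_patterns = {
    "host1": ["array3"],
    "host2": ["array1", "array2", "tape1"],
    "host3": ["array1"],
    "host4": ["array1", "array2"],
    "tape1": ["array1", "array3"],
    "array3": ["array4"],
    "array4": ["array3"],
}


def dual_role_devices(patterns):
    """Return devices that appear as both an initiator and a target."""
    targets = {t for target_list in patterns.values() for t in target_list}
    return sorted(set(patterns) & targets)


def initiator_target_pairs(patterns):
    """Flatten the mapping into (initiator, target) pairs, e.g. as zoning input."""
    return [(i, t) for i, targets in patterns.items() for t in targets]


print(dual_role_devices(traffic_patterns))
print(len(initiator_target_pairs(traffic_patterns)))
```

    Listing the pairs explicitly gives you a starting point both for the
    performance columns added later and for zoning decisions.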

    What Do We Know about Current Performance
    Characteristics?

    Any devices that currently exist, and will be
    transitioned onto the SAN, are candidates for empirical performance testing.

    Create a second set of columns next to the traffic
    pattern columns, as shown in Table 5.2. You will need entries for peak
    utilization and sustained utilization. Obviously, you will only be able to
    enter current data for initiators that already exist, and already communicate
    with the same targets they will talk to after the SAN is complete.

    Table 5.2
    Current Traffic

    SAN Traffic Patterns                   Current Peak    Current Sustained
    Initiators    Targets                  MB/sec          MB/sec
    ----------    ---------------------    ------------    -----------------
    host1         array3                   10              5
    host2         array1, array2, tape1
    host3         array1                   50              10
    host4         array1, array2
    tape1         array1, array3
    array3        array4
    array4        array3

    In this example, host1 and host3 already exist, and
    are already talking to array3 and array1, respectively. Each of the other
    devices either has yet to be added, is not yet talking to the targets it will
    use once the SAN is up, or simply has no performance data available.

    If the owner of a system has already done this kind
    of analysis, you will simply transfer the numbers to your table. If not, you
    should work with the owner to get the performance information, as this might
    have a substantial impact on your SAN design.
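
    If the owner can supply raw throughput samples rather than finished numbers,
    the two table entries can be derived from them. The sketch below is one
    possible approach, not a prescribed method: it assumes that "sustained" means
    the average over the sampling period, and the sample values are hypothetical.

```python
from statistics import mean


def summarize_throughput(samples_mb_s):
    """Return (peak, sustained) in MB/sec for one initiator-to-target pair.

    Assumption: 'sustained' is taken as the average over the sampling period.
    """
    if not samples_mb_s:
        raise ValueError("need at least one sample")
    return max(samples_mb_s), mean(samples_mb_s)


# Hypothetical samples for host1 -> array3, one reading per interval.
host1_array3 = [4, 5, 3, 10, 6, 2, 5]
peak, sustained = summarize_throughput(host1_array3)
print(peak, sustained)
```

    Sample over a period long enough to include the busiest part of the business
    cycle, or the peak figure will be understated.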

    Gathering Performance Data

    On almost any kind of system, some facility exists for
    measuring performance. More often than not, there will be multiple options for
    gathering disk I/O performance information.

    For example, on a Windows NT system, you might use
    the diskmon feature. You have to install this from the Windows NT Resource Kit.
    If you do not install diskmon, standard Windows perfmon will not have a disk
    monitoring tool. Alternately, you could install a package like Intel’s Iometer,
    and use that to generate a simulated load and measure performance. This tool is
    presently available as a free download from Intel’s Web site.

    Under Sun’s Solaris operating system, performance can
    be measured using the iostat utility, the GUI utility perfmeter, or one of a
    number of third-party utilities like Extreme SCSI. There are similar tools in
    every UNIX variant. We are providing examples for Solaris only, since the
    details of these commands will vary between every flavor of UNIX, and providing
    examples for every variant is impractical. Refer to the man pages for your particular
    version of UNIX for the exact syntax. There are also a number of options for
    generating loads under Solaris, ranging from the dd command to, again, a utility
    like Extreme SCSI.

    Note: Tools like Iometer, dd, and Extreme SCSI should be
    used with care. It is tempting to use them to generate maximum load. A more
    useful test to run is to generate a representative load. Try to determine what
    your application will actually be doing in terms of read/write ratio, and total
    bandwidth consumption, and use these tools to generate that kind of load on the
    system.

    In cases where performance data cannot be collected
    empirically, such as when the system in question does not exist yet, there is
    still hope. Most hosts are not capable of generating sustained load at full
    wire speed. They are generally going to be limited by other factors. These
    could include:

  • CPU speed: Although Fibre Channel has much lower
    overhead than the TCP/IP stack, it still takes a fast processor to get near to
    full performance on a 1 Gbit/sec Fibre Channel link, simply because the
    processor will be busy running whatever task is actually generating the
    I/O. While almost all hosts now shipping have sufficiently fast CPUs, you
    also need to estimate how much of that CPU resource is taken up by other tasks
    the host is performing that do not result in disk I/O (such as running a TCP/IP
    stack). Moreover, many data centers have servers with older CPUs that might not be
    capable of running at 1 Gbit/sec even without taking these tasks into
    consideration.
  • PCI bus speed: Fibre Channel full duplex is 200 MB/sec. A
    32-bit 33 MHz PCI bus can only sustain about 120 MB/sec. A 64-bit 33 MHz or
    32-bit 66 MHz PCI bus can handle about 240 MB/sec, and a 64-bit 66 MHz bus can
    handle about 480 MB/sec. Even on the higher rate buses, you must bear in mind
    that it is a shared bus. If you put two Fibre Channel HBAs onto a bus that can
    handle 240 MB/sec, that will be the total possible full-duplex speed for both
    HBAs. Therefore, you would on average get 120 MB/sec out of each interface. For
    example, this could, in a balanced read/write environment, mean that you get only
    60 MB/sec of read performance out of each card. Also bear in mind that there
    may be other cards on the bus taking up some of that bandwidth.
  • HBA speed: Although designed to work on a 1 Gbit/sec
    SAN, many HBAs cannot achieve, or at least cannot sustain, full 1 Gbit/sec
    transfers. Newer HBAs typically have better performance. Older HBAs might only
    be able to achieve 60 MB/sec, regardless of the other possible issues.
  • RAID controller speed: Many RAID controllers cannot
    sustain 100 MB/sec per interface on all interfaces simultaneously. Some barely
    operate at 30 MB/sec per interface, which is more than acceptable for many
    applications! Finding out the limits of your RAID array should be as simple as
    calling the vendor’s support channel. Of course, you might also check
    third-party testing results such as those done by many industry magazines for
    an unbiased opinion.
  • RAM quantity and speed: If your system is short on RAM,
    it might spend a lot of time paging. If it does, performance will be
    substantially degraded.
  • Disk seek time: If your application does a lot of random
    I/O, the disk heads will have to jump all over the platters. Since disk seek
    time is an order of magnitude or more slower than a Fibre Channel link, you
    might have to allocate substantially less bandwidth for random I/O applications
    like a file server than for sequential I/O applications like a video server or
    decision support system.
  • Application overhead: This ties into the CPU-limit
    factor. How much CPU do you have, and how much of it is free for handling I/O?
  • Write speed of tape device: Most tape drives cannot come
    anywhere near 100 MB/sec. It is usually sufficient to ask a vendor for
    performance data in the case of tape drives, although optimistic compression
    ratios can inflate the performance numbers they provide.

    In addition, if anything is known about the
    application that is running on the host, you might be able to make a good guess
    about how much load it will even try to place on the disk subsystem. For
    example, if you know that the host is an intranet Web server, and that it
    receives only 500 hits a day, you can safely guess that its I/O requirements
    will be minimal.

    Once you have collected your best empirical or estimated
    numbers for each factor, use the lowest common denominator approach to estimate
    the maximum bandwidth that the system could need: take the smallest of the
    limits you found. The overall system cannot outperform its weakest link.
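
    The weakest-link estimate is simple arithmetic, and a small script makes it
    repeatable across many hosts. The following is a sketch only: the function
    names are ours, and although the individual figures echo numbers quoted in the
    list above, the host that combines them is hypothetical.

```python
def per_hba_share(bus_mb_s, hba_count):
    """Split a shared bus's usable bandwidth evenly across the HBAs on it."""
    return bus_mb_s / hba_count


def weakest_link(limits_mb_s):
    """Return (factor, limit) for the lowest limit; the system cannot exceed it."""
    factor = min(limits_mb_s, key=limits_mb_s.get)
    return factor, limits_mb_s[factor]


# Hypothetical host: each figure echoes a number from the text, but the
# combination is invented for illustration.
limits = {
    "fc_link_full_duplex": 200,
    "pci_bus_share": per_hba_share(240, 2),  # two HBAs on a 240 MB/sec bus
    "hba": 60,                               # an older HBA
    "raid_controller": 30,                   # a slow RAID interface
}
print(weakest_link(limits))
```

    Here the RAID controller, not the Fibre Channel link, sets the ceiling, so
    that is the figure to carry into the traffic table.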

    Also note that on systems with multiple HBAs, I/O
    load might be distributed across these HBAs. Achieving active-active
    distribution across HBAs might require third-party applications like the
    VERITAS Dynamic Multipathing software, Troika’s HBA driver, or one of the
    storage vendors’ dual-path products. If this is the case, you might estimate
    that each HBA will usually have a fraction of the total load. In a dual-fabric,
    active/active HBA architecture, each HBA normally has 50 percent of the total
    load. If a system is capable of sustaining 70 MB/sec, then each HBA will
    sustain 35 MB/sec. Note that this might change during system maintenance if you
    shut down one path, and the remaining path could then take on the full 70
    MB/sec, so the design should incorporate the worst-case scenario. It is usually
    also good practice to add some padding to the top of this estimate (perhaps 10
    percent) to allow for the unexpected.
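
    The per-HBA figures in this paragraph can be worked through as follows. This
    is a minimal sketch under stated assumptions: load is spread evenly across
    active paths, the worst case is a single surviving path, and the 10 percent
    padding is the example value from the text rather than a fixed rule. The
    function name is our own.

```python
def hba_design_load(total_mb_s, hba_count, padding=0.10):
    """Return (normal, worst_case, design) load per HBA in MB/sec.

    normal:     load spread evenly over all active paths
    worst_case: a single surviving path carries everything
    design:     worst case plus a safety margin
    """
    if hba_count < 1:
        raise ValueError("a host needs at least one HBA")
    normal = total_mb_s / hba_count
    worst_case = total_mb_s
    design = worst_case * (1 + padding)
    return normal, worst_case, design


normal, worst_case, design = hba_design_load(70, 2)
print(normal, worst_case, round(design, 1))
```

    For the 70 MB/sec host in the text, each HBA normally carries 35 MB/sec, but
    each path should be designed for about 77 MB/sec.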

    Note: Unlike physical-disk counter data, logical-disk
    counter data is not collected by the NT operating system by default. To obtain
    performance counter data for logical drives or storage volumes, you must type
    diskperf -yv at the command prompt. This will cause the disk performance
    statistics driver used for collecting disk performance data to report data for
    logical drives or storage volumes. By default, the NT operating system uses the
    diskperf -yd command to obtain only physical drive data. For more information
    about using the diskperf command, type diskperf -? at the command prompt.

    What Do We Know about Future Performance
    Characteristics?

    Performance numbers change over time. Consider a
    customer database for a catalog retail company. Perhaps you will install the
    SAN in February, because this is your slow month of the year, and you can get
    the necessary downtime. You might know that the database host will start
    talking to its storage array(s) at a sustained rate of 5 MB/sec during the
    business day, with a peak of only 10 MB/sec. However, when the Christmas season
    comes along and your business picks up, you might move to a 50 MB/sec sustained
    rate, peaking at 70 MB/sec. Because of the potential for substantial changes in
    performance requirements over time, it is essential to plan for both current
    and projected performance. Most of this might be educated guesswork, since many
    of the systems you are going to deploy might not yet exist.

    Again, you will need to come up with numbers for both
    sustained traffic and peak traffic for each communication. Also try to
    determine what days/times peak performance will occur. This will be added to
    your table (Table 5.3).

    Table 5.3
    Adding Traffic Projections

    SAN Traffic Patterns    SAN Peak Performance    SAN Sustained Performance    Peak Times
                            (MB/sec)                (MB/sec)
    Initiators  Targets     Initial   Expected      Initial   Expected           Initial       Expected
    ----------  -------     -------   --------      -------   --------           ---------     -----------
    host1       array3      10        10            5         5                  M-F 8a-5p     same
    host2       array1      0         70            0         50
                array2      0         70            0         50
                tape1       20        20            0         0
    host3       array1      50        50            10        20                 M-F 8a-5p     + Sa 10a-4p
    host4       array1      0         90            0         50
                array2      0         90            0         50
    tape1       array1      0         20            0         0                  Sa 5p-9p      same
                array3      0         20            0         0                  Sa 9p-11p     same
    array3      array4      10        30            5         5
    array4      array3      5         5             0         0

    Again, you can only enter data for systems about
    which you can make an educated guess. If you know about what the peak traffic
    could be based only on the limitations of a system, you might not have any way
    of guessing when this would occur. You should also enter projected data for systems
    that you know that you will add later.
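
    Once projections are in the table, it is easy to total them per target to see
    which devices attract the most traffic. The sketch below uses the peak
    figures as we read them from Table 5.3; the function name is ours, and
    summing peaks assumes that they all coincide, which is a deliberately
    pessimistic upper bound rather than a prediction.

```python
from collections import defaultdict

# (initiator, target, initial peak, expected peak) in MB/sec, as read from Table 5.3.
rows = [
    ("host1", "array3", 10, 10),
    ("host2", "array1", 0, 70),
    ("host2", "array2", 0, 70),
    ("host2", "tape1", 20, 20),
    ("host3", "array1", 50, 50),
    ("host4", "array1", 0, 90),
    ("host4", "array2", 0, 90),
    ("tape1", "array1", 0, 20),
    ("tape1", "array3", 0, 20),
    ("array3", "array4", 10, 30),
    ("array4", "array3", 5, 5),
]


def expected_peak_per_target(rows):
    """Sum the expected peak traffic arriving at each target (upper bound)."""
    totals = defaultdict(int)
    for _initiator, target, _initial, expected in rows:
        totals[target] += expected
    return dict(totals)


print(expected_peak_per_target(rows))
```

    A target whose total approaches or exceeds what its interfaces can carry is a
    candidate for additional ports or for a closer look at when its peaks occur.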

    In Table 5.3, host2 and the application it is running
    might not exist yet, so every piece of data about that system is pure
    guesswork. Let us say that host2 is a Return Merchandise Authorization (RMA)
    system, and your rapidly growing company has never had an RMA system before.
    You might not be able to reliably guess when customers are going to call in
    with RMA requests most often, or even how many RMAs you are going to get in a
    given day. The best you can do is determine what performance the hardware and
    software you are installing could reasonably run at, and design the SAN to
    support that rate for as long as the system could be in use. While this
    approach might result in over-engineering your network, this is better than
    the alternative. During future design phases, you can adjust or scale back the
    SAN design accordingly, as well as incorporate other additions and changes.

    For backup devices, peak usage will always correspond
    with your backup schedule. This will usually not correspond with peak usage of
    the rest of the system. This is particularly useful knowledge when planning an
    ISL architecture, because you can often count on ISL utilization that is not
    related to backups being low during backup windows. An obvious exception to this is a
    SAN that is used solely for performing LAN-free backups.
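
    Whether backup windows really avoid the other peak times is worth checking
    rather than assuming. The following sketch compares time windows for overlap;
    the tuple representation and the function name are our own, and the windows
    shown are the Saturday entries as we read them from Table 5.3.

```python
def windows_overlap(a, b):
    """Each window is (day, start_hour, end_hour) on a 24-hour clock."""
    day_a, start_a, end_a = a
    day_b, start_b, end_b = b
    return day_a == day_b and start_a < end_b and start_b < end_a


backup_windows = [("Sa", 17, 21), ("Sa", 21, 23)]  # tape1: Sa 5p-9p, Sa 9p-11p
business_peaks = [("Sa", 10, 16)]                  # host3: Sa 10a-4p

conflicts = [
    (backup, peak)
    for backup in backup_windows
    for peak in business_peaks
    if windows_overlap(backup, peak)
]
print(conflicts)
```

    An empty result means that the backup traffic and the business peak do not
    compete for the same ISLs at the same time.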

    How Much Downtime Is Acceptable to Production
    Components During Implementation?

    It will likely be necessary to shut down some
    existing production devices during implementation, to ensure a safe transition
    onto the SAN. For example, you might have to shut down a host to install an
    HBA. Determine how much downtime is acceptable for each host, and at what times
    this can occur. Generally, you should try to schedule more downtime than you
    think you need to ensure that any unforeseen issues that arise during the
    implementation can be handled within the downtime window.

    How Much Downtime Is Acceptable for Routine
    Maintenance? How Much Downtime Is Acceptable for Upgrades and Architectural
    Changes?

    These two questions are intimately related,
    because, to an end user, there is really no difference between downtime to a
    production system for maintenance, and downtime for an upgrade. Once systems
    are in production, you will want to keep them running as much as possible.

    Many upgrades can be accomplished with zero downtime
    by using a double- or triple-redundant fabric architecture. However, no matter
    how well you plan the upgrade and maintenance processes beforehand, you will
    need to shut down specific hosts on occasion. For example, you might want to
    upgrade an HBA driver, which would typically require a reboot.

    Note: Wherever possible, a redundant fabric architecture
    should be used. This will ensure the best performance and reliability, and will
    simplify maintenance tasks. In a redundant fabric architecture, every host has
    at least two paths to every storage device it connects to, and these paths
    traverse two completely unconnected fabrics. While it might appear on the surface
    to be more expensive, if hosts are to be dual-attached anyway, it is actually
    less expensive to attach them to two separate fabrics than to use one larger
    fabric, or a director-class switch. This does not even include the downtime ROI
    calculation, which, in high-availability environments, will usually overshadow
    the entire cost of the SAN. More details about redundant and resilient fabrics
    are provided in Chapter 7.

    You should therefore determine in advance when you
    will be able to schedule downtime for every host and storage array, and for the
    fabric itself. You might not have to use every scheduled outage, but having
    them available to you when you do need them is essential.

    One way to do this is to make a list of applications
    and services provided by the hosts on the SAN, and determine an owner for each.
    Take your list of SAN devices and map these devices to the applications and
    services they affect. This will provide a mapping of application/service
    owners, who are typically responsible for scheduling downtime, to devices that
    typically require downtime. Have each owner approve the downtime calendar for
    each device that affects his or her service.

    The mapping of owners to devices should be kept up to
    date as changes in personnel, applications, and/or SAN infrastructure occur.
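
    The owner-to-device mapping described above can be kept as a small data
    structure and inverted on demand. This is only an illustrative sketch: the
    service names, owner names, and device assignments are hypothetical, and the
    function name is our own.

```python
# Hypothetical applications, owners, and device assignments, for illustration only.
services = {
    "customer database": {"owner": "dba-team", "devices": ["host1", "array3"]},
    "RMA system": {"owner": "support-apps", "devices": ["host2", "array1", "array2"]},
    "backup": {"owner": "storage-ops", "devices": ["tape1", "array1", "array3"]},
}


def approvers_by_device(services):
    """Invert the service list: for each device, who must approve its downtime?"""
    approvers = {}
    for info in services.values():
        for device in info["devices"]:
            approvers.setdefault(device, set()).add(info["owner"])
    return approvers


for device, owners in sorted(approvers_by_device(services).items()):
    print(device, sorted(owners))
```

    Shared devices such as a consolidated storage array show up with several
    approvers, which is exactly where downtime scheduling tends to be hardest.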

    When Do You Need Each Piece of the Solution to Be
    Complete?

    Once you have a table detailing which of the
    initiators communicate with which targets, you can begin to create a timeline
    for the project. Other members of the core team will tell you something like,
    “the customer database application must be online by mid-June.” It is your task
    to define which SAN components you need to accomplish this, and to develop a
    timeline for adding these components that meets their requirements.

    This is a high-level list of some of the questions
    that should appear on a SAN design interview form:

  • What overall business
    problem are you trying to solve?
  • What are the business
    requirements of the solution?
  • What is known about the
    nodes that will attach to the SAN?
  • Which SAN-enabled
    application do you have in mind?
  • Which components of the
    solution already exist?
  • Which components are
    already in production?
  • Which elements of the
    solution need to be prototyped and tested?
  • What equipment will be
    available for testing?
  • How and when are
    backups to be done?
  • What will the traffic
    patterns in the solution be?
  • What do we know about
    current performance characteristics?
  • What do we know about
    future performance characteristics?
  • How much downtime is
    acceptable to production components during implementation?
  • How much downtime is
    acceptable for routine maintenance?
  • How much downtime is
    acceptable for upgrades and architectural changes?
  • When do you need each
    piece of the solution to be complete?

    Conduct a Physical Assessment

    You should now have the location of every piece of
    hardware that currently exists. In addition, you should know where each piece
    of hardware in the eventual SAN will be located.

    Look at each piece of hardware. Make sure that it
    does exist, and has all necessary pieces to function. This could include things
    like power cords, keyboard, mouse, monitor, Ethernet card, Ethernet cable,
    HBAs, and Fibre Channel cables. Note the physical dimensions of the hardware,
    and its power/cooling requirements. Does it rack mount? Does it have a network
    interface? How many Fibre Channel interfaces does it have? How much does it
    weigh? You should already have this information from the interview process, but
    you should verify that the information you were given is correct.

    Go to each location where SAN equipment or nodes will
    be installed, and again check to see that your information was correct. Notice
    how the equipment will fit into the space available. Notice how the equipment
    will enter the building. You should also have a meeting with the person in
    charge of the facility to discuss power, cooling, and equipment locations.

    Analyzing the Collected Data

    Now that you have collected information from all key
    stakeholders in the project, and verified the accuracy of this information, you
    will analyze it to determine the characteristics of the required solution. When
    you have completed this process, you will have a list of technical
    requirements, and an ROI analysis to justify the project.

    Processing What You Have Collected

    You have a matrix detailing communication between
    nodes. Attempt to group the nodes by communication patterns. The purpose of
    this is to determine the amount of known locality in the SAN. Locality of
    reference is a concept prevalent in many areas of computer science, from disk
    drive construction to LAN design. Locality is important in SAN design because
    if you can localize traffic into specific areas of a SAN, you directly improve
    the SAN’s performance and reliability. This will allow a more cost-effective
    SAN design as well, preventing over-designing the network to handle nonexistent
    cross traffic. Locality is discussed in greater detail in Chapter 7.

    A SAN with a great deal of known locality might be
    constructed out of many separate fabrics, with no ISLs whatsoever. A SAN with
    little or no known locality might require a high-performance ISL architecture
    (Table 5.4).

    Table 5.4
    Initiator-to-Target Mapping for Locality Example

    SAN Traffic Patterns

    Initiators    Targets
    ----------    ---------------------
    host1         array3
    host2         array1, array2, tape1
    host3         array1
    host4         array1, array2
    tape1         array1, array3
    array3        array4
    array4        array3

    In Table 5.4, array3 would be grouped with host1,
    tape1, and array4. None of those devices will need to communicate with any of
    the other devices. They could be grouped onto a single switch, or even put onto
    a totally separate fabric. You might find it helpful to do the grouping in a diagram.
    For another example, look at Figure 5.2.

    Figure 5.2 SAN
    Diagram without Grouping

    Nothing is known about the communication patterns in
    this SAN. Consequently, there is no way to optimize ISLs for performance. After
    grouping the initiators with their targets, the SAN diagram could look
    something like Figure 5.3. If you look carefully, you will notice that there
    are only 12 connections into this SAN. If there are fewer connections than
    there are ports in your switches, you do not really need to go through the
    grouping exercise because localization of traffic will happen automatically. It
    is only useful if you will be using ISLs; however, as most systems scale well
    past the size of the largest switches available, it will be a frequent
    exercise. For the purposes of making the examples more readable, we will just
    assume that they are all dealing with a subset of the devices that the SAN will
    support.

    Figure 5.3 SAN
    Diagram with Simple Grouping

    Making a diagram such as this will allow you to see
    at a glance what the communication patterns for your SAN are.
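
    The grouping exercise can also be done programmatically once the traffic matrix exists. The sketch below treats each initiator/target pair as a link and collects devices that are connected, directly or indirectly, into one group. The function name and the device names are hypothetical and are not taken from Table 5.4.

```python
from collections import defaultdict


def performance_groups(pairs):
    """Return groups of devices linked, directly or indirectly, by traffic."""
    neighbors = defaultdict(set)
    for initiator, target in pairs:
        neighbors[initiator].add(target)
        neighbors[target].add(initiator)

    groups, seen = [], set()
    for device in sorted(neighbors):
        if device in seen:
            continue
        # Walk outward from this device to collect everything it can reach.
        group, stack = set(), [device]
        while stack:
            current = stack.pop()
            if current in group:
                continue
            group.add(current)
            stack.extend(neighbors[current] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


# Hypothetical initiator/target pairs, for illustration only.
traffic = [
    ("hostA", "arrayX"),
    ("hostB", "arrayX"),
    ("hostC", "arrayY"),
    ("hostC", "tapeZ"),
]
groups = performance_groups(traffic)
for group in groups:
    print(group)
```

    Each resulting group is a candidate for a single switch or a separate fabric. If the function returns one very large group, that suggests the SAN has little known locality.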

    This example is simplistic, and in large SANs, there
    will likely be conflicts. When you cannot effectively group all of the
    communication patterns, you should focus on grouping the faster-performing devices.
    For example, if you find that the bulk of traffic will be between host1,
    array3, and array4, these could be grouped separately from tape1 and host2 if
    necessary. This could happen if you find that there are so many
    interrelationships that most of your devices end up in only a few very
    large groups. The grouping technique does not help performance if you
    have only one big group. It could also happen if you have a few devices that are
    shared by a great many others, such as a large RAID array in a storage
    consolidation solution.

    Another way to combat this “group growth” problem is
    to account for multiple interfaces on storage arrays. Let us say that you have
    a redundant fabric architecture. Your RAID array has eight interfaces, and each
    host will access only two of them, one interface on each fabric. List each
    interface on the array separately in your traffic pattern table. Then, you
    associate servers or groups of servers with specific interfaces. With the array
    listed as a single entity, a diagram of the communication could look something
    like Figure 5.4.

    Figure 5.4 SAN
    Grouping Diagram with Single-Entity Arrays

    If, however, you separate the interfaces, your
    diagram could look more like Figure 5.5.

    Figure 5.5 SAN
    Grouping Diagram with Separated Interfaces

    You can indicate that a device crosses groups but
    does not need much in the way of performance by varying the line color, weight,
    or pattern. Figure 5.6 shows that the tape robot crosses all groups, but does not
    need much bandwidth.

    Figure 5.6 SAN
    Grouping Diagram with Tape Robot Addition

    If you are able to make relatively small performance
    groups, your SAN will benefit greatly from applying the principle of locality.
    For now, you simply need to be able to determine the category of architecture
    you will require: one that has lots of known locality (has well-defined
    performance groups), or one that does not. This will affect how many switch
    ports you need to allot for ISLs. If traffic is localized within an area of the
    SAN, it will obviously not need to make use of ISLs leaving that area. In this
    case, you will be able to get superior performance even with far fewer ISLs,
    resulting in more ports available for servers and storage.
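
    One rough way to judge how much known locality a proposed placement has is to measure the share of traffic that stays on a single switch. The sketch below is illustrative only: the placement, the flows, and the relative traffic weights are hypothetical, and the function name is an assumption rather than part of any Brocade tool.

```python
def local_traffic_share(placement, flows):
    """Fraction of total traffic whose two endpoints sit on the same switch."""
    total = sum(weight for _, _, weight in flows)
    if total == 0:
        return 1.0  # No traffic at all, so nothing crosses an ISL.
    local = sum(
        weight
        for initiator, target, weight in flows
        if placement[initiator] == placement[target]
    )
    return local / total


# Hypothetical device-to-switch placement and relative traffic weights.
placement = {
    "hostA": "sw1",
    "arrayX": "sw1",
    "hostB": "sw2",
    "arrayY": "sw2",
    "tapeZ": "sw2",
}
flows = [
    ("hostA", "arrayX", 80),
    ("hostB", "arrayY", 60),
    ("hostA", "tapeZ", 10),  # This flow must cross an ISL.
]
share = local_traffic_share(placement, flows)
print(f"{share:.0%} of traffic stays local")
```

    A share close to 100 percent points to the high-locality category of architecture, where few switch ports need to be set aside for ISLs.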

    Authors

    Josh Judd is a Senior SAN Architect with Brocade Communications Systems, Inc. In addition to writing technical literature, he provides senior-level strategic support for major OEMs and end-users of Brocade storage network products worldwide. Chris Beauchamp (Brocade CFP, CSD) is a Senior SAN Architect for Brocade Communication Systems, Inc. Chris focuses on SAN design and architecture, with an emphasis on scalability and troubleshooting. Alex Neefus is the Lead Interoperability Test Engineer at Lamprey Networks, Inc. Alex has worked on developing testing tools for the SANmark program hosted by the FCIA. Benjamin F. Kuo is a Software Development Manager at TROIKA Networks. Headquartered in Westlake Village, CA, TROIKA Networks is a developer of Fibre Channel Host Bus Adapters, dynamic multipathing, and management software for Storage Area Networks.


    Establishing Port Requirements

    Now you will determine how many switch ports you will
    need to purchase. (This is a general estimate for calculating ROI; it might be
    a bit more or less than your final estimate.)

    Take the ports you found out about during the
    interview process. Make sure that you account for all ports on each node. Some
    RAID arrays have many ports, and many hosts have at least two HBAs. Add up
    these ports to get the total number of exposed ports your SAN will require. You
    will then divide this by the number of different fabrics you will be using. If
    you have dual-redundant fabrics, you will divide by two. If you have
    triple-redundant fabrics, divide by three, and so on. This will give you the
    number of required exposed ports per fabric. The number of “overhead” ports you
    must allocate for ISLs and for unused ports will depend on several factors:

  • The total number of
    required ports per fabric.
  • The amount of known
    locality.
  • Your need to manage all
    switches as a single entity.
  • The physical layout of
    your SAN: any MAN/WAN connections, intra-building campus connections, or
    intra-floor building connections might dictate use of additional ISLs and less
    than perfect utilization of the ports on each switch.
  • Your applications’
    expected performance characteristics.
  • The rate of expected growth
    in port count of the fabric.
  • Your maintenance
    policies regarding port usages on network devices. For example, you might
    require that a certain number of ports be left available for expansion or
    troubleshooting during the course of normal operation.

    Simple Case

    If the number of required exposed ports is less than
    the number of ports on a single switch, you will generally need zero ports for
    ISLs. In this case, you will require one switch per fabric. However, as larger
    switches utilize more hardware internally to connect the higher number of user
    ports, a decision might need to be made between using a larger switch versus
    utilizing a network of smaller ones. The appropriate decision will depend on
    performance requirements, budget, and design factors. In addition, if you have
    made small performance groups that have no components in common, you might be
    able to localize traffic 100 percent, and require no ISLs. You would have many
    small, unconnected SAN islands if you follow this approach. One reason not to
    use isolated islands is that requirements change. Someday you might need access
    between islands at a moment’s notice. A robust architecture can achieve your
    immediate connectivity requirements, and give you the flexibility to handle
    change as well.

    If this is not the case, or if you wish to design flexibility
    into your configuration, each fabric will need to be a network of switches,
    and you will have to reserve ports for the ISLs. Simple case requirements include
    the following:

  • Fewer ports required
    than exist on a single switch, or
  • Each performance group
    is well defined and smaller than the number of ports on a single switch.
  • Future requirements for
    growth and change are minimal.

    Assume that you have two 16-port arrays (32 storage
    ports total), 10 dual-HBA servers (20 ports), and two single-port tape
    libraries (two ports). Your total port count is 54. However, assume further
    that you are using a dual-redundant SAN architecture. Your port count per
    fabric is 27. You are building the fabric out of 16-port switches. It is
    possible that some ISLs are required. You will need to determine how many are
    needed.
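
    The arithmetic in this example can be sketched as a short calculation. The function name is hypothetical, and rounding up when the total does not divide evenly across fabrics is an assumption added here; the text only says to divide.

```python
import math


def exposed_ports_per_fabric(port_counts, fabrics):
    """Return (total exposed ports, exposed ports per fabric)."""
    total = sum(port_counts.values())
    # Assumption: round up if the total does not split evenly across fabrics.
    return total, math.ceil(total / fabrics)


# Port counts from the example above.
ports = {
    "arrays": 2 * 16,         # two 16-port arrays
    "servers": 10 * 2,        # ten dual-HBA servers
    "tape libraries": 2 * 1,  # two single-port tape libraries
}
total, per_fabric = exposed_ports_per_fabric(ports, fabrics=2)
print(total, per_fabric)
```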

    Variant A

    With a relatively small fabric like this and
    relatively high locality, you can assume that you will have about 14 free ports
    per switch. Two switches with two ISLs between them will yield 28 ports per
    fabric. You are using a dual-redundant architecture, so there will be two
    fabrics, for a total of four switches. Your grouping diagram will look like Figure
    5.7.


    Figure 5.7 Determining ISL Requirements for Variant A

    This grouping would result in an actual
    implementation resembling Figure 5.8.

    Figure 5.8 Variant A Implementation

    Variant B

    If you decide that you cannot guarantee the
    localization of traffic for some reason, grouping will not help. Assuming also
    that you have a requirement for high performance between the switches, you
    would add two ISLs per switch to the estimate, for a total of about four ISLs
    per switch. Your architecture might look like Figure 5.9.

    Figure 5.9 Adding ISLs for High Performance in Variant B

    The same technique can be applied to any SAN, no
    matter how complex. In fact, the larger the SAN, the greater the benefits will
    be from grouping traffic.

    Moderate Case

    If the required exposed port count is about double or
    triple the per-switch port count, and some locality is known, you will be able
    to use very few ISLs. In this case, estimate two ISLs per switch. Let us say
    that you need 26 ports, and you are using 16-port switches. Two ISLs per switch
    means that you actually get 14 ports per switch. Two switches will give you 28
    ports, so you would budget for two switches per fabric, or four switches total.

    Moderate case requirements include the following:

  • No more than three
    times as many ports are required as are present on a single switch.
  • Performance groups are
    reasonably well defined. Some locality is known.
  • Future requirements for
    growth and change are minimal.

    Note: The low port count/high locality/low ISL count
    configurations work well for either two or three switches. Two switches would
    be cascaded together with two ISLs, with 16-port switches yielding 28 ports.
    Three switches would be connected in a ring, supporting about 40 devices. If you
    are over that limit, a four-switch full mesh can support about 50 devices. The
    full-mesh architecture does not scale well beyond that point, and none of these
    work well if you have performance groups with more than 13 or 14 members. It is
    feasible to build ring or partial-mesh topology fabrics with higher port
    counts, but it is generally better to use a core/edge topology for higher port
    count solutions. These topologies are explained in detail in Chapter 7.
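
    The device counts quoted in this note can be approximated with simple arithmetic. The sketch below assumes 16-port switches, two ISLs between the cascaded pair as stated above, and a single ISL per link in the ring and the full mesh. That last figure is an assumption made for illustration; the note itself gives only rounded device counts.

```python
def device_ports(switches, ports_per_switch, isl_ports_per_switch):
    """Ports left for servers and storage after ISL ports are set aside."""
    return switches * (ports_per_switch - isl_ports_per_switch)


# Two cascaded switches joined by two ISLs: each switch gives up two ports.
cascade = device_ports(2, 16, 2)

# Three-switch ring, one ISL per link (assumed): two ISL ports per switch.
ring = device_ports(3, 16, 2)

# Four-switch full mesh, one ISL per link (assumed): three ISL ports per switch.
mesh = device_ports(4, 16, 3)

print(cascade, ring, mesh)
```

    Under these assumptions the results come out at 28, 42, and 52 ports, which is in line with the figures of 28, about 40, and about 50 given in the note.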

    Complex Case

    If you need more ports than one of these
    configurations will handle, you will need to allocate about four ISLs per
    switch. You might use fewer than four ISLs on some switches, and perhaps
    nothing but ISLs will be present on other switches. In the complex case for
    port count estimates, the intent is to average the ISL requirements.

    Until a detailed architecture is developed, you will
    have to make general estimates for a few things. If you have any distance
    requirements, add two ISLs per switch. If you have very high-performance
    requirements, and very little known locality, add two ISLs per switch.
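
    These rules of thumb can be written down as a small helper. This is only a sketch of the adjustments described above; the function and parameter names are hypothetical, and the base value is whatever the case at hand suggests (for example, about two in the moderate case and about four in the complex case).

```python
def estimated_isls_per_switch(base, distance=False, high_perf_low_locality=False):
    """Apply the per-switch ISL adjustments to a base estimate."""
    isls = base
    if distance:
        isls += 2  # Any distance requirement adds two ISLs per switch.
    if high_perf_low_locality:
        isls += 2  # High performance with little known locality adds two more.
    return isls


print(estimated_isls_per_switch(4))
print(estimated_isls_per_switch(4, distance=True))
print(estimated_isls_per_switch(2, high_perf_low_locality=True))
```

    The last call mirrors Variant B earlier in the chapter, where two ISLs per switch were added to a base of two for a total of about four.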

    Take the estimated number of ISLs per switch ($I$) and
    subtract it from the number of ports per switch ($P_S$). Divide the total required ports per fabric ($P$) by
    this number and round up. This is the estimated number of switches ($S$) that you
    need to budget for. For estimating complex SAN switch counts,
    $S = \lceil P / (P_S - I) \rceil$.

    For example, if you have a need for 30 ports per
    fabric ($P = 30$), are using 16-port switches ($P_S = 16$), and each switch will use about two ISLs ($I = 2$),
    then the number of switches you estimate needing per fabric is $30 / (16 - 2)$. This
    is 2.14, which rounds up to 3. If you have a single fabric, this is the number
    of switches you should budget for. If you have a dual-fabric SAN, you should
    budget for six switches. Complex case requirements include the following:

  • Any number of exposed
    ports might be required.
  • Performance groups
    might or might not be defined.
  • Future requirements for
    growth and change are significant.
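
    The switch-count estimate described above (required ports per fabric divided by the non-ISL ports on each switch, rounded up) can be sketched as follows. The function and parameter names are hypothetical, and the result is only a budgeting estimate, not a detailed design.

```python
import math


def estimated_switches(ports_per_fabric, ports_per_switch, isls_per_switch, fabrics=1):
    """Estimate the number of switches to budget for across all fabrics."""
    usable = ports_per_switch - isls_per_switch
    if usable <= 0:
        raise ValueError("ISLs consume every port; no ports remain for devices")
    per_fabric = math.ceil(ports_per_fabric / usable)
    return per_fabric * fabrics


# The worked example from the text: 30 ports, 16-port switches, two ISLs each.
print(estimated_switches(30, 16, 2))             # single fabric
print(estimated_switches(30, 16, 2, fabrics=2))  # dual-fabric SAN

# The moderate-case example: 26 ports per fabric, two fabrics.
print(estimated_switches(26, 16, 2, fabrics=2))
```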

    Preparing an ROI Analysis

    In any business transaction, it is important to
    understand the economic benefits or the Return On Investment (ROI) that your
    company will receive. Preparing an ROI analysis for your SAN project will show
    how your company will not only recoup the capital investment, but also save
    additional money through savings in time, management, and other efficiencies.

    During the interview process, you made a list of all
    of the equipment that you would need to purchase. To begin the ROI analysis of
    your SAN, determine which components are specific to the SAN project. For
    example, if your company will need to buy additional storage arrays whether or
    not a SAN is used, these would not be included on the expense side of the
    analysis. If the SAN is expected to prevent you from having to buy an array,
    this cost savings would go onto the benefit side of the analysis. You should
    include any hardware you intend to buy for testing that will not be used
    elsewhere.

    When accounting for staff time spent on the project,
    make sure that you only charge the project for time spent beyond what would be
    spent by not building the SAN. If you are expected to save staff time in the
    long run, apply this to the benefit side. Your ROI analysis will be a living
    document, and will be updated as the SAN project develops.

    The Return On Investment Proposition

    Technical justifications for SAN infrastructure
    deployments can often be made more credible by adding an ROI analysis for the
    proposed implementation. Follow the guide in the following sections to produce
    an ROI analysis based on SAN solutions to particular problems.

    Step One: Pick a Theme or Scenario

    Most implementations have a purpose. That purpose
    could be a server or storage consolidation to improve infrastructure usage and
    gain economies of scale, ensuring storage and server resources are utilized in
    the most cost-effective manner. High-availability clustering can improve the
    availability of mission-critical applications, thus ensuring business
    continuance and the cost saving associated with it. SAN-based backup deployments
    improve data integrity by performing backups and restores more efficiently and
    quickly, again saving in business continuance time and effort.

    Step Two: Identify the Affected Infrastructure
    Components

    Most SAN deployments will focus on affected servers.
    Servers can be grouped according to the applications they run or the functional
    areas they support. Examples of application groupings include Web servers, file
    and print servers, messaging servers, database servers, and application
    servers. Functional support servers might include financial and personnel
    systems or engineering applications. Once the server groups are known, get the
    characteristics of servers in each group. For example, if your solution fits
    into a storage consolidation theme, you should consider factors such as:

  • Amount of attached disk
    storage
  • Storage growth rates
  • Storage space reserved
    for growth (headroom)
  • Availability
    requirements
  • Server downtime and an
    associated downtime cost
  • Server hardware and
    software costs
  • Maintenance costs
  • The administration
    effort required to keep the servers up and running

    Step Three: Identify the SAN-Enabled Benefits

    The scenario approach allows you to focus more
    closely on the benefits. Server and storage consolidation, for example, will
    concentrate on benefits accrued from more efficient use of server and storage
    resources, improved staff productivity, lower platform costs, and better use of
    the infrastructure. Simply take the list of characteristics you developed in
    step two, and show how a SAN can provide benefits in those areas. Establishing
    specific cost savings is one of the two key elements in the ROI process, so be
    sure to look hard for every area of benefit.

    Step Four: Identify the SAN-Related Costs

    Determining the costs associated with the scenario
    involves identifying the new components specifically required to build and
    maintain the SAN. These can include software licenses, switches, Fibre Channel
    HBAs, optical cables, and any service costs associated with the deployment. Be
    careful to include only those items that relate directly to the SAN
    implementation. This is the second key element in the ROI process: if you do
    not correctly estimate expenses, the ROI might be substantially better or worse
    than your estimate.

    Step Five: Calculate the ROI

    There are several standard ROI calculations in common
    use, such as net present value (in dollars), internal rate of return (as a
    percentage), and payback period (in months). Briefly, these can be defined as:

  • Net Present Value (NPV): A method used in evaluating
    investments where the net present value of all cash flows is calculated using a
    given discount rate.
  • Internal Rate of Return (IRR): The discount rate at which the
    present value of the future cash flows of an investment equals the cost of the
    investment.
  • Payback Period: The length of time needed to recoup the cost
    of a capital investment on a nondiscounted basis.

    Detailed explanations of these techniques and how to
    use them can be found in most accounting textbooks. It is likely that your
    company has a preferred method for calculating ROI. You should determine which
    method this is, and if there are standard forms for presenting your analysis.
    Asking your accounting department might be a good first step.
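
    To make the three definitions concrete, here is a minimal sketch of each calculation. The cash flows are hypothetical (an initial outlay followed by four equal annual net benefits), the function names are illustrative, and the IRR is found by a simple bisection search that assumes the NPV changes sign exactly once between the search bounds. Your accounting department's preferred method might differ in its details.

```python
def npv(rate, cash_flows):
    """Net present value; cash_flows[t] occurs t periods from now."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


def irr(cash_flows, low=-0.99, high=10.0, tolerance=1e-9):
    """Discount rate at which NPV is zero, found by bisection."""
    f_low, f_high = npv(low, cash_flows), npv(high, cash_flows)
    if f_low * f_high > 0:
        raise ValueError("NPV does not change sign between the search bounds")
    while high - low > tolerance:
        mid = (low + high) / 2
        f_mid = npv(mid, cash_flows)
        if f_low * f_mid <= 0:
            high = mid  # The root lies in the lower half.
        else:
            low, f_low = mid, f_mid  # The root lies in the upper half.
    return (low + high) / 2


def payback_period(cash_flows):
    """First period in which cumulative undiscounted cash flow is non-negative."""
    cumulative = 0.0
    for period, cf in enumerate(cash_flows):
        cumulative += cf
        if cumulative >= 0:
            return period
    return None  # The investment is never recouped.


# Hypothetical annual figures: a 100,000 outlay, then 40,000 per year.
flows = [-100_000, 40_000, 40_000, 40_000, 40_000]
print(round(npv(0.10, flows), 2))
print(round(irr(flows), 4))
print(payback_period(flows))
```

    The payback function counts whole periods, so cash flows would need to be supplied monthly to express the result in months.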

    This approach to calculating ROI allows you to focus
    on a particular project or infrastructure-based problem. It allows you to
    reduce deployment risk by deploying SANs in phases by scenario. Deploying by
    scenario will keep investments limited to the solution at hand and create an
    investment base for future deployments. The initial investment will improve the
    ROI on other scenarios by reducing some of the investment required to deploy
    them.

    The Rest of the Process and the Repetition of the
    Cycle

    Now you have the following documents:

  • Detailed results from
    the interview process, which define what the SAN project needs to accomplish.
    This includes:
      – A technical requirements document
      – A timeline for accomplishing the tasks associated
        with implementing the SAN
      – A list of everything that you will need to buy to make
        the project work

  • A rough idea of how the
    SAN will be designed.
  • An ROI analysis to
    justify continuing with the project.

    These will be used and maintained throughout the life
    of the SAN. The timeline will be the framework in which all activities in the SAN’s
    lifecycle will reside. In later chapters, you will enter the architecture
    development phase and use these documents to develop a detailed
    architecture for your SAN. This will in turn be used to develop a test plan.
    These documents will be used in the approval process for implementation, and
    will be kept up to date during the maintenance phase as part of the SAN’s
    documentation set. If any major changes to the SAN are needed, the lifecycle
    will be repeated and another set of documentation will be produced.

    Summary

    The SAN design process consists of seven phases,
    which are cycled through as needed throughout the life of your SAN. Data
    collection and analysis together define the requirements of your SAN. These
    requirements feed into the architecture development process to produce a SAN
    design blueprint. After you have a plan in place for your SAN, you must test
    certain components to ensure that the design works the way you expected,
    before you can begin to transition and release it into production. Once the SAN
    has entered production, it falls into an ongoing maintenance phase, and
    continues in that phase until a change occurs that causes the cycle to repeat.

    The first two phases (data collection and analysis)
    are critical to the health of the SAN. Simply put, if the information on which
    the design is based is incomplete and/or inaccurate, the design will be
    incorrect.

    Data collection consists of a series of interviews,
    collecting the answers into a meaningful format (a technical requirements
    document), and verifying the accuracy of the collected data. It is imperative
    that all key stakeholders in the SAN project be included on the interview list.

    While listed as a separate phase, data analysis
    actually coincides with data collection. The objective of the analysis phase is
    to turn the raw data, which is generally in the form of business requirements,
    into a more technical format: the technical requirements document. Some of this
    occurs “on the fly” during the interview process. However, certain tasks are done
    after the interviews are complete. For example, detailed port count and
    performance requirements are generated “on the fly,” and an ROI proposition is
    created after the fact. Once the requirements of the SAN are well defined, the
    remaining phases can take place. These phases are covered in subsequent
    chapters.

    Solutions Fast Track

    Looking at the Overall Lifecycle of a SAN

  • The SAN design process
    is a cycle.
  • This process consists
    of seven phases:

    1. Data Collection
    2. Data Analysis
    3. Architecture Development
    4. Prototype and Test
    5. Transition
    6. Release to Production
    7. Maintenance

  • Whenever there is a
    fundamental change to the SAN, the cycle should repeat.

    Conducting Data Collection

  • Data collection is the
    foundation on which a SAN is built.
  • You should interview
    everybody who has an interest in the project.
  • During the interview
    process, create a technical requirements document.

    Analyzing the Collected Data

  • There are several
    things that you need to get out of data analysis:
      – The number of different
        fabrics that will make up the SAN solution
      – The port count and
        performance characteristics of each fabric
      – An estimate of the hardware
        required to meet these requirements
  • You might be able to
    localize traffic for better performance if you can create well-defined groups.
  • Prepare an ROI
    proposition to justify your SAN project.

    Frequently Asked Questions

    Q: Once I have designed my
    SAN, shouldn’t it be done? I don’t want to have to keep reinventing the wheel!

    A: Yes and no. After a SAN
    enters production, it is “done” until you want to change it in a fundamental
    way. As long as you are happy with leaving your SAN the way it is, there is no
    reason why you would have to repeat the design cycle. Simply adding a new
    storage array does not require a repetition of the cycle. Moreover, events that
    do cause the cycle to repeat might cause it to repeat relatively quickly. For
    example, if you decide to go through the design process because you are adding
    a new type of storage array to the SAN, and want to validate that doing so
    won’t break anything, you will be able to take a fast track through most of the
    process. After all, adding this device will not by any stretch of the
    imagination require that you change your fabric topology, or affect much of
    your SAN architecture.

    Q: Every end user in my
    company is a stakeholder in the SAN. Do I need to interview everybody?

    A: It is true that
    everybody who uses a system is a stakeholder in that system. However, we mean
    something a little less broad. When we refer to a stakeholder, we mean somebody
    whose job revolves around taking care of one or more of the systems that will
    attach to the SAN. This can include systems, database, and storage
    administrators, as well as other technical people. It can also include people
    responsible for the data that resides on these systems. For example, a manager
    responsible for a call center at a phone-in catalog company might be a key
    stakeholder in the SAN, because he or she is responsible for the data entered
    into that company’s business system, which is attached to the SAN. Why is this
    person a key stakeholder? Because he or she might have something to say about
    the availability and performance requirements of the system. When in doubt, try
    to include anybody on the team who wants to be there. It is usually better to
    have more data than you need, rather than less.

    Q: Do I need to wait until
    data collection is complete before beginning data analysis?

    A: Actually, the data
    collection and analysis phases are most effective if there is some degree of overlap.
    If you have analyzed data from the first interview when you go into the second,
    you will be able to better understand the answers, and might also be able to
    direct the line of questioning along more useful lines. Be careful not to
    develop firm convictions too early on, though. Always approach SAN design
    scientifically. Never start an interview with a firm preconception of the
    outcome! Collection and analysis are divided into two phases because some of
    the analysis naturally occurs after all data collection is complete. For
    example, you can’t prepare an ROI proposition until you have a fairly complete
    picture of what the SAN will need to accomplish, and some idea of the technical
    infrastructure that will be involved.
