By Josh Judd, Chris Beauchamp, Alex Neefus and Benjamin F. Kuo
Solutions in this chapter:
- Looking at the Overall Lifecycle of a SAN
- Conducting Data Collection
- Analyzing the Collected Data
- Summary
- Solutions Fast Track
- Frequently Asked Questions
Introduction
We intend this book to allow you to effectively
design, implement, and maintain storage networks. Doing so requires an
understanding of the processes in each of the seven phases of a SAN’s
lifecycle, and their relationships with each other. Without taking a moment to
review the process from the highest level, it is easy to get lost in the
details of SAN hardware.
In this chapter, we provide that high-level view. We
show how the SAN design process is really an ongoing lifecycle. We take you
through the process from the moment the decision is made to deploy a SAN,
through releasing the SAN to production. Then we explain the extent to which
the process should be repeated when upgrades and architectural changes are
needed. We also cover the first two phases of the lifecycle in detail.
The processes presented here are derived from other
areas of Information Technology (IT) and they are normal parts of any
large-scale IT project. For example, when implementing a SAN, you should
interview people who will have a key interest in the finished product; the same
is true when putting in a Local Area Network (LAN) or Wide Area Network (WAN).
Much of this material should be second nature to any IT network architect,
Database Administrator (DBA), or senior systems administrator. For the more
advanced users to whom these techniques are well understood in general, this
chapter will serve as reference material showing how these processes are
applied to SANs in particular. We have attempted in this book to provide
material that will allow both the beginner and the expert alike to successfully
design a SAN.
It is true that more attention must be paid to SAN
design than to most other networking technologies. This is because SANs
typically have more stringent availability and performance requirements than
other networks. A SAN is similar to a traditional network in its requirements,
but is also somewhat like a channel (for example, a CPU/RAM interconnect
mechanism, or a PCI bus). Channels require very high performance, and are
almost assumed to be 100 percent reliable. This is in stark contrast to the
traditional Ethernet LAN, where things like five-nines uptime for all node
connections, in-order packet delivery, and tuned approaches to bandwidth
management are rare indeed.
Fortunately, SANs provide the tools necessary to
achieve these performance and availability goals. For example, it is
commonplace in a Fibre Channel SAN to use a dual-fabric approach to SAN
architecture. This means having two completely separate networks for data to
travel over, and potentially using both networks as active paths. While it is
certainly possible to do this sort of thing using IP/Ethernet networks, it is
substantially more difficult, since Fibre Channel was designed with this in
mind, and Ethernet was not. The SAN designer must provide for higher
availability and spend some time thinking about performance, but will know
going into the process that these goals are entirely achievable.
We should also note here that the process outlined in
this chapter is designed to make a complex SAN design successful. With less
complex designs (that is, the majority of SAN deployments to date), it is
perfectly acceptable to skip over much of the process. For example, if you are
deploying a SAN with only three servers and two storage arrays, spending much
time on architectural analysis is unnecessary. The complexity is presented here
so that users with complex requirements will have it available to them; users
with simpler scenarios can use their judgment about which bits to incorporate
into their design process.
The seven phases of the lifecycle of a SAN at the
very highest level can be broken down into three broad categories: design,
implementation, and maintenance. The first of these, designing the SAN,
includes the collection and the analysis of data, which defines the
requirements of the network. We will go into detail on these first two phases
of the design process in this chapter. These phases provide a solid launch pad
for your journey through the remainder of the SAN’s lifecycle.
The third and fourth phases
of the SAN lifecycle, architecture development and prototype testing, complete
the design process. Implementing the SAN encompasses the transition phase and
the release to production phase, the fifth and sixth phases of the lifecycle.
These phases are discussed in Chapters 6 and 7 of this book. Chapters 8 and 9
cover troubleshooting, maintenance, and management: the final phase of the
lifecycle model.
When you are finished
reading this chapter, you should have a solid understanding of the design
processes, and have a valuable reference tool to enable project planning on any
future SAN deployments.
Looking at the Overall Lifecycle of a SAN
Any SAN will go through certain phases over the
course of its life. Depending on the size and complexity of the SAN, some
phases might take months to complete, and some might be only glanced over. For
example, a single-switch SAN does not require much in the way of network
design. However, if the solution involves hundreds of devices, including
storage components from many different vendors that were not already pretested
and determined to be interoperable, it could require extensive testing or
validation.
When an existing SAN must undergo a fundamental
change, be it at the architectural level or simply the introduction of a new
type of storage array, you should cycle back through the phases of SAN
development. This will ensure that the critical applications running on the SAN
are not unexpectedly disrupted by changes. However, when the change is
fundamental but small (like adding a new type of storage array) it is possible
to take a fast track through this process.
The SAN’s lifecycle, which
can be described at a high level as design, implementation, and maintenance,
translates directly into action-oriented phases on the part of the SAN
designer: data collection, data analysis, architecture development, prototype
and testing, transition, release to production, and maintenance. See Figure 5.1
for a flowchart of these phases and their relationships to each other.
Figure 5.1 An Overview of the Lifecycle of a SAN
Data Collection
You must define the requirements of the SAN before
building it. What business problem is being solved by the SAN? What are the
overall goals of the project? To determine the requirements, you should
interview all affected parties, to find out what they all hope to achieve (in
other words, their goals and objectives), and develop both a detailed technical
requirements document and a timeline for the project.
Data Analysis
Once you have gathered input from all parties, you
must analyze it and put it into a meaningful format. The first two phases
together will allow you to start with the business goals that are driving the
project, and determine at a high level the necessary technical properties
required of the SAN. Once this phase is completed, all business requirements
should be translated into technical requirements. The technical requirements
document will be created during the collection phase, and completed during the analysis
phase. You will also have created a working document for a Return On Investment
(ROI) proposition to justify the expense of the project.
Architecture Development
Now that you have a list of technical requirements,
you will develop a SAN architecture that meets those requirements. This process
will involve balancing many factors. For example, there might be a tradeoff
between performance considerations and cost. It might be necessary for you to
cycle back to the data collection and analysis phases to gather more input and
to negotiate compromises with all affected parties. When finished, you will
have a detailed architecture of the SAN that you intend to build. A SAN
architecture includes the fabric topologies of all related fabrics, the storage
vendors involved, the SAN-enabled applications being used, and other
considerations that affect the overall SAN solution. This step is the most
likely to be skipped over quickly when the SAN requirements are small.
Prototype and Testing
SANs deal directly with the mission-critical data of
today’s enterprises. When building any mission-critical solution, you must test
it before releasing it to production. In this phase, you will build a prototype
of the SAN solution and test it to ensure that it will function properly when
released. This should be done using nonproduction systems. It might be
necessary to cycle back to the architecture development phase if problems are
found.
Wherever possible, build a test bed identical to the
solution you are implementing. This will provide the greatest assurance of
success in production. However, budgetary concerns, limits on time and space,
and other factors will usually prevent this from being practical. Imagine a
200-port SAN. Now imagine 200 hosts and storage arrays plugged into it. Now
imagine asking the CFO to buy another 200 devices to test with, and to provide
administrators, space, power, and cooling for all of it.
Because of this, the test phase will be a balance of
conducting your own testing, and leveraging other organizations’ test results.
Finding a document that says “vendor X already tested or certified this
configuration” might be as good as, or better than, testing it yourself. Even if the
components of a solution have been tested by you and/or others to your satisfaction,
you must test all aspects of the complete system prior to releasing it to
production. This is due to the fundamental nature of a large networked system
where interactions, timing, and other factors can produce different results
from devices tested individually. The actual final test will occur during the
release to production phase, but creation of the test plan should occur in this
phase. At the end of this phase, all parties with an interest in the outcome of
the project will approve it, and the transition to production will begin.
Authors
Josh Judd is a Senior SAN Architect with Brocade Communications Systems, Inc. In addition to writing technical literature, he provides senior-level strategic support for major OEMs and end-users of Brocade storage network products worldwide. Chris Beauchamp (Brocade CFP, CSD) is a Senior SAN Architect for Brocade Communication Systems, Inc. Chris focuses on SAN design and architecture, with an emphasis on scalability and troubleshooting. Alex Neefus is the Lead Interoperability Test Engineer at Lamprey Networks, Inc. Alex has worked on developing testing tools for the SANmark program hosted by the FCIA. Benjamin F. Kuo is a Software Development Manager at TROIKA Networks. Headquartered in Westlake Village, CA, TROIKA Networks is a developer of Fibre Channel Host Bus Adapters, dynamic multipathing, and management software for Storage Area Networks.
Transition
Now that you have a working prototype, and all
interested parties have signed off on it, you will begin to transition your
existing hardware onto the new SAN. If a SAN is already in place, this phase
might be as simple as adding a new node to the SAN, or changing the
Inter-Switch Link (ISL) architecture. If the SAN is completely new, it might
involve a long migration process consisting of moving one production system at
a time. In any case, there might be a need to cycle between this phase and the
release-to-production phase repeatedly. Once a component has completed the
transition onto the SAN, release to production can occur for that component.
Release to Production
Once a component has been transitioned onto the new
SAN, it must be tested again and then approved before becoming a part of the
enterprise’s production environment. Since there might be many components that
must be transitioned and released, it might be necessary to cycle between the
transition and release-to-production phases repeatedly until all components
have entered production. After this phase is complete, the SAN will enter the
maintenance phase.
Maintenance
This is the useful life of the SAN. All of the
benefits that prompted the SAN designer to implement the SAN in the first place
are found in this phase. It is therefore desirable to have a SAN spend as much
time as possible in this phase, and as little as possible in the other phases.
The goal of this phase is to keep the SAN running as well as possible for as
much of the time as possible, and to expand its capabilities only according to
the original, tested, and approved parameters. This phase includes adding,
changing, or removing components, as well as managing, monitoring, and troubleshooting
existing components.
During the maintenance phase, no changes should be
made to the SAN that fall outside of the original blueprint that was
established in the previous phases. Any such change necessitates a repetition
of the entire lifecycle. For example, if the SAN were originally built using
vendor X storage arrays, an additional vendor X array could be added as part of
maintenance, but an array from vendor Y would require thought and testing
before its introduction. It might not require much thought and testing, but it
must, in any case, be looked into.
Note: Any fundamental change to the SAN requires a
repetition of the entire lifecycle.
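As a hedged illustration of the blueprint rule, the sketch below treats the approved design as a set of (device kind, vendor) entries and routes any proposed addition either to routine maintenance or to a new pass through the lifecycle. The `Blueprint` class and `classify_change` function are hypothetical; a real blueprint would also record models, firmware and driver levels, topology, and applications.

```python
from dataclasses import dataclass, field


@dataclass
class Blueprint:
    """Hypothetical record of what was designed, tested, and approved."""
    # Each entry is (device kind, vendor), e.g. ("storage array", "vendor X").
    approved: set[tuple[str, str]] = field(default_factory=set)

    def covers(self, kind: str, vendor: str) -> bool:
        return (kind, vendor) in self.approved


def classify_change(bp: Blueprint, kind: str, vendor: str) -> str:
    """Route a proposed addition: routine maintenance or a new lifecycle pass."""
    if bp.covers(kind, vendor):
        return "maintenance"
    # Outside the blueprint: repeat the lifecycle, possibly on a fast track.
    return "repeat lifecycle"


bp = Blueprint({("storage array", "vendor X")})
print(classify_change(bp, "storage array", "vendor X"))  # maintenance
print(classify_change(bp, "storage array", "vendor Y"))  # repeat lifecycle
```

Returning "repeat lifecycle" does not imply a lengthy effort; as the text notes, a small but fundamental change can take a fast track through the phases.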
In summary, the seven phases of the SAN design
lifecycle are:
1. Data Collection
2. Data Analysis
3. Architecture Development
4. Prototype and Test
5. Transition
6. Release to Production
7. Maintenance
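The relationships between the seven phases, including the points where the text says you might cycle back, can be captured as a small transition table. This is a hypothetical Python sketch, not a tool from this book: the `Phase`, `ALLOWED`, `can_move`, and `is_valid_path` names are illustrative, and the table encodes only the cycles described in the text, so Figure 5.1 may show relationships it omits.

```python
from enum import Enum


class Phase(Enum):
    DATA_COLLECTION = 1
    DATA_ANALYSIS = 2
    ARCHITECTURE_DEVELOPMENT = 3
    PROTOTYPE_AND_TEST = 4
    TRANSITION = 5
    RELEASE_TO_PRODUCTION = 6
    MAINTENANCE = 7


# Hypothetical transition table inferred from the phase descriptions:
# each phase may advance to the next one, and some may cycle back.
ALLOWED = {
    Phase.DATA_COLLECTION: {Phase.DATA_ANALYSIS},
    Phase.DATA_ANALYSIS: {Phase.ARCHITECTURE_DEVELOPMENT},
    # Architecture work may need more input from the core team.
    Phase.ARCHITECTURE_DEVELOPMENT: {
        Phase.PROTOTYPE_AND_TEST, Phase.DATA_COLLECTION, Phase.DATA_ANALYSIS,
    },
    # Problems found in testing send the design back to architecture.
    Phase.PROTOTYPE_AND_TEST: {Phase.TRANSITION, Phase.ARCHITECTURE_DEVELOPMENT},
    Phase.TRANSITION: {Phase.RELEASE_TO_PRODUCTION},
    # Transition and release alternate until every component is in production.
    Phase.RELEASE_TO_PRODUCTION: {Phase.TRANSITION, Phase.MAINTENANCE},
    # A fundamental change during maintenance restarts the whole lifecycle.
    Phase.MAINTENANCE: {Phase.DATA_COLLECTION},
}


def can_move(current: Phase, target: Phase) -> bool:
    """Return True if this model permits moving from `current` to `target`."""
    return target in ALLOWED[current]


def is_valid_path(path: list[Phase]) -> bool:
    """Check that every consecutive pair of phases is a permitted move."""
    return all(can_move(a, b) for a, b in zip(path, path[1:]))
```

For example, a project whose first prototype fails and whose components are transitioned and released in two batches would follow the path collection, analysis, architecture, test, architecture, test, transition, release, transition, release, maintenance, which `is_valid_path` accepts.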
Conducting Data Collection
The data collection phase of SAN design is the
foundation upon which the SAN will be built. It is vital that the information
collected in this phase be both complete and accurate. If the SAN requirements
are poorly defined, it is guaranteed that the resulting SAN will meet business
objectives poorly. You should therefore take your time with this phase.
Some of the information you will collect is generic
to any major IT project. If you already have an established data collection
process in your company, integrate the SAN-specific material from this section
into that process.
Data collection consists of determining which people
you will need to interview, interviewing them, and conducting a physical
assessment of existing equipment and facilities. When this process is complete,
you will have a technical requirements document consisting of a list of the
business problems that the SAN will solve, the business requirements for the
SAN, characteristics of all devices that will be attached to it, and detailed
information about all relevant facilities. You will also have a timeline for
implementation.
Creating an Interview Plan
Who has a stake in the SAN solution? Well, you could
argue that every person who uses a system attached to the SAN has a stake in
it. While true, this is not useful for creating an interview list, because
there would be too many people involved. Similarly, you could argue that only
the person who initiated and “owns” the project should be consulted. Again,
this is not useful, because it leaves out people who have a strong interest in
the project, and might have knowledge that is critical to its success.
A balanced approach to creating an interview list is
critical. You can view the people on this list as a SAN solution “core team.”
Think about having all of these people together in a room, and trying to solve
the SAN solution problem together. Try to include everyone needed to solve the
problem, but nobody else. Typically, a core team might include:
- …administrator
- …administrator
- …server will be involved
- …specialist associated with each application that will run on the SAN
- …who can act as an overall “owner” of the project
It is probable that you will be one of these people,
in addition to being the SAN designer. Unless you are an external consultant,
this is typically the case.
Once you have a list of the desired members of the
core team, you must contact them and ask them to take time to help with the
project. Ensure that each team member has allocated the necessary time and that
their management appreciates the demands of participating in this team. Because
reaching the team’s SAN design goal might be a long-term process, getting this
buy-in up front will minimize disruption to the team later. Often in the past,
SAN design teams did not include network administrators, as the focus was on
the storage side. Experience has shown that SANs are networks, and should be
coordinated with the traditional IP network groups to ensure that proper
networking experience is at hand.
Whenever possible, schedule an interview as a
face-to-face, one-on-one meeting. This format will allow you to communicate the
questions and understand the answers most effectively. You should also have a
group meeting with the entire core team after conducting individual interviews.
This will allow you to resolve any differences before analyzing the data, and
review the analysis as a team.
Conducting the Interviews
Now that you know who to interview and have a
schedule of when you will interview them, you need to know what questions to
ask, and what format to put the collected data into. This section contains a
suggested set of questions that you should ask, and some detail on what each
question is about. It is followed by a summary that could be used to create
an interview form.
Note: Not every person you interview will be able to answer
every question. Between the members of the core team, the expertise necessary
to answer all of these questions should be completely represented. Some members
might provide conflicting answers. You will be in a key position to resolve
these differences, and achieve a compromise. It is vital that all affected
parties agree with the deployment strategy before implementation begins.
What Overall Business Problem Are You Trying to Solve?
A business problem that would initiate a SAN design
might be something like:
- “We need to keep our business running in case of a disaster like an earthquake or fire.”
- “…to finish that they are impacting our ability to process customer orders.”
- “…on storage by utilizing free space more efficiently.”
Chapter 6 discusses some of the more common business
problems that SANs can solve. Brocade maintains a series of documents that
detail specific SAN solutions. These documents are known as Brocade
SOLUTIONware configuration guidelines and are available on the Brocade Web site
at www.brocade.com/SAN.
Note: A SAN might be intended to solve multiple business
problems. In this case, you should separate each business problem into a
different set of questions and answers. You will correlate these during the
analysis phase.
What Are the Business Requirements of the Solution?
Once you know the business problem that you need to
solve, it should be easy to figure out what the business requirements of the
solution must be. This is simply a matter of rephrasing the previous answers,
with more specific criteria:
- “The SAN must allow all functionality of all business-critical servers at site X to resume within Y minutes at site Z.”
- “…following list of servers to complete backups within X minutes: …”
- “…following list of servers access to the corresponding list of storage arrays: …”
This is useful because it acts as an intermediate step
toward turning the business problem into a matching technical solution.
Moving from Business Requirements to Technical Requirements
You should not deploy a SAN simply for the sake of
adopting the “hot new technology.” SANs are hot because they solve important
business problems and allow companies to make more money. This could be fairly
direct: for example, saving more money on IT than the project cost,
since SANs are very efficient at providing a clear ROI. ROI is often achieved
through management efficiencies, resource efficiencies, or better utilization of
resources. On the other hand, it could be indirect: making IT systems more
efficient, thereby increasing users’ productivity.
The first key to a successful SAN deployment is the
accurate and complete statement of what business problem(s) you intend for the
SAN to solve. Unfortunately, you cannot turn a business problem into a
technical solution without work. There is no silver bullet to make your backups
run faster so that your users will not have to work on a slow system. However,
there are tape libraries that run fast, and can be shared by many devices.
This, when combined with an appropriate Fibre Channel fabric, and a SAN-enabled
backup application, could amount to the same thing as the silver bullet.
In order to know which hardware and software will
solve your business problem, you have to define in a technical way what you
need to accomplish. This is a necessary intermediate step between the business
problem and the purchase of specific technical solutions.
It is fairly straightforward to change a sentence
like, “We need to keep our business running in case of a disaster like an
earthquake or fire” into a sentence like, “The SAN must allow all functionality
of all business-critical servers at site X to resume within Y minutes at site
Z.” Once you have done this, you will have the business requirements of the
solution. You know that you have a business requirements statement when you
could phrase it like this, and still have it make sense: “Our business will run
better if we have a SAN that can allow all functionality of all
business-critical servers at site X to resume within Y minutes at site Z.” The
components of the business requirements statement are “our business will run
better” (or something to that effect) followed by a reasonably specific
statement about what the SAN must do to make that happen.
However, you will still not have the technical
requirements detailed. This is not something that you, the SAN designer, can
simply ask in an interview. This is a large part of what you will bring to the
table as the SAN designer once you have gathered
the data and then analyzed it in the next phase. A technical requirements
document set should list, in detail:
- …are to be attached to the SAN
- …patterns between them (random I/O, streaming access such as video, I/O-intensive database access)
- …(reads, writes, max/min/typical throughputs)
- …on them relative to the SAN (for example, a LAN-free backup application, or anything SAN-specific)
- …expected to change over time (storage growth, server growth)
The technical requirement statement would be, “The
SAN needed to meet the business requirements outlined must have the following
characteristics: ” This would be followed by the body of the technical
requirements document. The rest of the questions to ask in the interview
process will provide you with the body of this document.
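One way to keep the body of the technical requirements document consistent is to record each device as a structured entry and render the statement from those entries. The Python sketch below is a hypothetical illustration: `DeviceRequirement`, `render_requirements`, the MB/s units, and the compound-growth projection are assumptions chosen to mirror the list above (devices, access patterns, throughputs, SAN applications, and change over time), not a format the book prescribes.

```python
from dataclasses import dataclass, field


@dataclass
class DeviceRequirement:
    """One hypothetical entry in the body of a technical requirements document."""
    name: str
    role: str                  # "host" or "storage"
    access_pattern: str        # e.g. "random I/O", "streaming video", "database"
    peak_read_mbps: float      # MB/s
    peak_write_mbps: float     # MB/s
    san_applications: list[str] = field(default_factory=list)
    annual_growth: float = 0.0  # fractional growth per year; 0.5 means 50%

    def projected_peak_mbps(self, years: int) -> float:
        """Combined read+write peak after `years` of compound growth."""
        base = self.peak_read_mbps + self.peak_write_mbps
        return base * (1 + self.annual_growth) ** years


def render_requirements(devices: list[DeviceRequirement]) -> str:
    """Prefix the device entries with the technical requirement statement."""
    lines = ["The SAN needed to meet the business requirements outlined "
             "must have the following characteristics:"]
    for d in devices:
        apps = ", ".join(d.san_applications) or "none"
        lines.append(
            f"- {d.name} ({d.role}): {d.access_pattern}; peak "
            f"{d.peak_read_mbps:g} MB/s read, {d.peak_write_mbps:g} MB/s write; "
            f"SAN applications: {apps}"
        )
    return "\n".join(lines)


db = DeviceRequirement("db01", "host", "I/O-intensive database", 60, 40,
                       ["LAN-free backup"], annual_growth=0.5)
print(db.projected_peak_mbps(2))  # 225.0
print(render_requirements([db]))
```

The growth projection is deliberately simple; the interviews may reveal step changes (a new application, a merger) that a constant annual rate cannot capture.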
What Is Known about the Nodes that Will Attach to the SAN?
You should try to get a list of all information
possible about every node attached to the SAN. For each node, the relevant
information can include questions about each host, storage device, facilities
where hosts and storage will be located, and questions about the SAN itself.
Questions about each host could include the following:
- …is installed? What patch or service pack level?
- …HBA/controller drivers available? Are they well tested?
- …is supported (private loop, public loop, or fabric)?
- …run on this host (databases, e-mail, data replication, file sharing)?
- …it require?
- …requirements change over time?
- …its dimensions? How heavy is it?
- Does it have a rack kit? Will it sit on a shelf?
- …management console, what type is it? (Is it a traditional keyboard/video/mouse combo [KVM], or is it a serial connection, like a TTY?) Does it need to be permanently attached? (For example, a Sun SPARC server could have a keyboard, mouse, and monitor permanently attached, or it could be managed through a serial port attached to a modem.)
- …have?
- …HBA, what software will be used to provide failover or performance enhancements of multiple paths?
- …exist, or do they need to be purchased? (You should keep track of every piece of equipment that you need to buy for the project, for budgeting and ROI analysis.)
- …the make, model, and version information?
- …be purchased to meet the objective?
- …interfaces will it have?
- …range will it operate?
- …telephone line for management?
- …physically located?
These questions could be used to create an interview
form for each host.
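As a hedged sketch, such a host form could be kept as a structured record so that unanswered questions are easy to track between interviews. The `HostInterviewForm` class, its field names, and the `unanswered` helper are hypothetical and cover only a subset of the questions above; the multipath rule reflects the question that applies only to hosts with more than one HBA.

```python
from dataclasses import dataclass, field, fields
from typing import Optional

TOPOLOGIES = {"private loop", "public loop", "fabric"}


@dataclass
class HostInterviewForm:
    """Hypothetical per-host form; None or [] means 'not yet answered'."""
    hostname: str
    operating_system: Optional[str] = None
    patch_level: Optional[str] = None
    hba_driver: Optional[str] = None
    fc_topology: Optional[str] = None  # private loop, public loop, or fabric
    applications: list[str] = field(default_factory=list)
    hba_count: Optional[int] = None
    multipath_software: Optional[str] = None
    location: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject answers outside the three topologies the form asks about.
        if self.fc_topology is not None and self.fc_topology not in TOPOLOGIES:
            raise ValueError(f"unknown topology: {self.fc_topology!r}")

    def unanswered(self) -> list[str]:
        """Questions still open, in form order, for follow-up interviews."""
        open_items = [f.name for f in fields(self)
                      if getattr(self, f.name) in (None, [])]
        # Multipath software only matters for hosts with more than one HBA.
        if self.hba_count is not None and self.hba_count <= 1:
            open_items = [n for n in open_items if n != "multipath_software"]
        return open_items


form = HostInterviewForm("srv01", operating_system="Solaris 8",
                         fc_topology="fabric", hba_count=1)
print(form.unanswered())  # ['patch_level', 'hba_driver', 'applications', 'location']
```

A similar record could be built for storage devices and facilities from the lists that follow.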
Questions about each storage device could include the
following:
- …model, and version information?
- …is supported (private loop, public loop, fabric, SCSI, SSA)?
- …device serve?
- …device, does it have limits on how many hosts can access it through each port?
- …its dimensions? How heavy is it?
- …gigabytes?
- Does it have a rack kit? Will it sit on a shelf?
- …management console, what type is it? Does it need to be permanently attached?
- …interfaces will it have?
- …exist, or do they need to be purchased?
- …the make and model? If not, what kind will be purchased?
- …interfaces will it have?
- …range will it operate?
- …telephone line for management?
- …physically located?
- …level?
- …what is the capacity of each cartridge, number of cartridges the library can hold, number and speed of drives, and number of transports?
- …interface? What type of SCSI (wide/narrow, differential/single ended)?
Note: Obviously, some of these questions do not relate
directly to the SAN deployment. However, they are generally relevant whenever
making a large architectural change in a data center. For example, it is
necessary to know what temperature a server can operate at in case the server
is in a location where temperature control is an issue. In this case, adding a
large number of switches might increase the room temperature beyond operating
levels. As always, use your judgment about which questions to include in your
interview, and which to skip over.
Note: While it is possible to manage an entire fabric
through a single Ethernet connection, this is not the method that Brocade
currently recommends. You should plan on one Ethernet connection per Brocade
switch, in addition to planning connections for hosts and other SAN devices. It
is also advisable for the highest availability plan to balance switches across
multiple electrical circuits, even if an Uninterruptible Power Supply (UPS)
protects them.
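The note’s guidance lends itself to a quick planning calculation: count one Ethernet management connection per switch plus one per other managed device, and spread the switches across the available circuits. The Python sketch below is hypothetical; `management_ports_needed`, `assign_circuits`, and the round-robin policy are assumptions, since the note does not prescribe a particular assignment method.

```python
from collections import defaultdict


def management_ports_needed(switches: int, other_managed_devices: int) -> int:
    """One Ethernet connection per switch plus one per other managed device."""
    return switches + other_managed_devices


def assign_circuits(switch_names: list[str], circuits: list[str]) -> dict[str, list[str]]:
    """Spread switches round-robin across electrical circuits."""
    if not circuits:
        raise ValueError("need at least one circuit")
    plan = defaultdict(list)
    for i, sw in enumerate(switch_names):
        plan[circuits[i % len(circuits)]].append(sw)
    return dict(plan)


print(management_ports_needed(4, 6))  # 10
print(assign_circuits(["sw1", "sw2", "sw3", "sw4"], ["A", "B"]))
```

The resulting port count feeds the facilities questions below about network infrastructure, and the circuit plan feeds the questions about electrical power.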
Questions about facilities where hosts and storage
will be located could include the following:
- …this facility?
- …optical cables, and what type?
- …electrical power?
- …space?
- …infrastructure?
- …location is on an upper floor, is there a freight elevator?
Answers to questions about the SAN itself must be
considered preliminary. They will indicate preconceptions that members of the
core team have, but all members should be prepared to be flexible on these
preconceptions as the SAN design process progresses. Questions about the SAN
itself could include the following:
- …(For example, long cable runs between floors of a building, campuswide networks, or MAN/WAN connections.)
- …attach to the SAN?
- …devices will attach to the SAN?
- …do they require any-to-any connectivity? Alternatively, are there groups of devices that need to communicate only among themselves?
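When the answer to the last question is that some devices need to communicate only among themselves, the interview answers can be reduced to candidate device groups, which might later inform zoning. The Python sketch below uses union-find over the device pairs reported in interviews; `connectivity_groups` and its pair-list input are hypothetical, and the transitive grouping is an assumption of this sketch: two hosts that share an array land in the same group even if they never talk to each other, so finer-grained grouping may be appropriate.

```python
def connectivity_groups(devices: list[str], pairs: list[tuple[str, str]]) -> list[set[str]]:
    """Group devices that must communicate, directly or transitively (union-find)."""
    parent = {d: d for d in devices}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra  # merge the two groups

    groups: dict[str, set[str]] = {}
    for d in devices:
        groups.setdefault(find(d), set()).add(d)
    return list(groups.values())


devs = ["hostA", "hostB", "array1", "hostC", "tape1"]
pairs = [("hostA", "array1"), ("hostB", "array1"), ("hostC", "tape1")]
print(connectivity_groups(devs, pairs))
```

If every device must reach every other device (any-to-any), the result collapses to a single group.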
Which SAN-Enabled Applications Do You Have in Mind?
Will the SAN use a serverless backup application? How
about clustering software? How about volume management? This category of
software requires special attention because of its close ties to the SAN
hardware you choose to build the solution. For example, if you plan to use
vendor X serverless backup software, you must make sure that your backup
hardware (tape libraries, Fibre Channel/SCSI gateways, etc.) is supported.
Which Components of the Solution Already Exist?
Any hardware or software that is already in place and
that must be included in the solution will create points for you to build
around. You must find out as many details as possible about everything in this
category. When you have finished the interviews and are conducting the physical
assessment, you should personally inspect every piece of hardware. This will
prevent surprises later in the process. Make sure that you find out exactly
where all hardware is located, and how to access it.
You must pay special attention to devices that
already exist and already have Fibre Channel interfaces. Find out which kinds
of HBAs are installed in hosts, and which driver revisions are installed on
them. Find out code levels for RAID arrays and Fibre Channel tape libraries.
Find out if upgrades to driver/code levels are planned or at least allowed.
Note: You must know if each device is public loop, private
loop, or full fabric. Some devices might even be SCSI and require additional
hardware to bridge between SCSI and Fibre Channel.
If possible, you should not use private loop drivers
on initiators unless the device does not support fabric drivers or is not easy
to upgrade. Private loop hosts require special licenses, typically Brocade
QuickLoop and Zoning. Find out if the existing devices are configured as
full-fabric devices. If not, find out if their drivers support full fabric, or
if they can be upgraded to full fabric. This is not intended to discourage
incorporation of private loop devices into a fabric: QuickLoop and Fabric
Assist exist specifically to enable this to occur. However, if a device can
support full fabric, then integration into the SAN will be easier if it does
so.
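The guidance above can be summarized as a per-device decision. The Python sketch below is a hypothetical helper, not a Brocade tool: `attachment_plan` and its return strings are illustrative, and treating public loop devices as able to attach without QuickLoop (because they log in to the fabric) is an assumption of this sketch rather than a statement from the text.

```python
def attachment_plan(mode: str, fabric_driver_available: bool = False,
                    upgrade_allowed: bool = True) -> str:
    """Hypothetical helper encoding the guidance above for one device.

    mode is one of "fabric", "public loop", "private loop", or "scsi".
    """
    if mode in ("fabric", "public loop"):
        # Fabric-aware devices log in to the fabric; no loop translation needed.
        return "attach directly"
    if mode == "scsi":
        return "needs SCSI-to-Fibre Channel bridge"
    if mode == "private loop":
        # Prefer upgrading to a fabric driver when one exists and is permitted.
        if fabric_driver_available and upgrade_allowed:
            return "upgrade to fabric driver"
        return "use QuickLoop or Fabric Assist"
    raise ValueError(f"unknown attachment mode: {mode!r}")


print(attachment_plan("private loop", fabric_driver_available=True))
print(attachment_plan("private loop", fabric_driver_available=True, upgrade_allowed=False))
```

The `upgrade_allowed` flag reflects the earlier question of whether driver and code upgrades are planned or at least permitted on existing systems.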
Which Components Are Already in Production?
Components that are in production require special
attention in two areas:
- …might be desired for testing.
- …more complex.
It is vital to know as much as possible about
production systems that are going to transition onto the SAN. Therefore,
somebody intimately familiar with and responsible for every such system should
be included on the core team.
Which Elements of the Solution Need to Be Prototyped and Tested?
For relatively simple solutions that involve only components already certified to work together, you might not have to do any testing at all. For example, if you are implementing a SAN solution described in a Brocade SOLUTIONware document, you might need to do only minimal validation. By contrast, a solution for which no documentation or testing information exists generally requires extensive validation.
For more complex solutions involving a large number
of devices that might be from many different vendors, you might feel that every
single element needs to be tested in combination before release to production
can occur. You should get input on this from every member of the core team. If any team member feels that you should conduct in-house testing on a component, you should strongly consider doing so.
What Equipment Will Be Available for Testing?
Any existing equipment that is not in production, and
any equipment that is going to be purchased specifically for this project might
be good material with which to build a test bed. Existing equipment that is in
production is not good to test with. If existing equipment already in
production will be transitioned onto the SAN, it might be beneficial to budget
for a representative sample of duplicate, nonproduction systems with which to
prototype the solution. It is generally a good idea to have such systems available for testing in any case. You might also be able to borrow systems to test with, so it is worth asking your vendors about such loans.
Whether or not test equipment is available, you
should research what testing third-party vendors or third-party organizations
have already done. In this way, you will avoid duplicating their efforts. If
you cannot get representative test equipment for an element that needs to be
prototyped, it might be acceptable, and even necessary, to rely entirely upon the work
done by others to validate the solution.
Again, with many solutions, this is a perfectly
acceptable way to go. If you do not feel that in-house testing is warranted,
then you can save time and money by skipping the prototype and test phase. Just
make sure that you have documentation certifying the solution before you make
this decision.
How and When Are Backups to Be Done?
You need to get a list of everything that relates to
the system’s backups:
- … will be used?
- … will be used for each host?
- … be backed up by which tape libraries?
- … occur?
- … be backed up?
- … How do they work?
- … used? How do they work?
What Will Be the Traffic Patterns in the Solution?
You should produce a matrix showing every
initiator-to-target communication expected in the SAN. This is necessary to
determine performance characteristics, and to set up zoning on the fabric:
- … specific storage array?
- … cluster will talk directly to each other over the SAN?
- … will be performing serverless backups?
- … be backing up?
Create a table whose first column lists every device on the SAN that can act as an initiator. This will include every host, every storage virtualization product, and every serverless backup server. It might include storage arrays, if they have data replication capabilities. In a second column, list all of the targets that each initiator will communicate with (Table 5.1).
Table 5.1
Initiator-to-Target Mapping
SAN Traffic Patterns
Initiators | Targets
host1 | array3
host2 | array1, array2, tape1
host3 | array1
host4 | array1, array2
tape1 | array1, array3
array3 | array4
array4 | array3
Note: Some devices on a SAN can act as both an initiator and a target. If so, they will appear in both columns. See array3 and array4 in Table 5.1: this is how you would indicate that array3 and array4 replicate data between them.
You will not necessarily be able to build this table
by interviewing one person; it will likely be developed over the course of the
interview process, changed as the implementation takes place, and maintained
for the life of the SAN.
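Because this table will keep changing over the life of the SAN, it can help to keep it in a machine-readable form. The following Python sketch encodes the rows of Table 5.1 in a plain dictionary (reading the last row as array4 replicating to array3, as the note above describes). The names `TRAFFIC`, `dual_role_devices`, and `initiator_target_pairs` are hypothetical, chosen only for illustration; the sketch lists the devices that appear in both columns and flattens the matrix into one pair per expected conversation.

```python
# Rows of Table 5.1: each initiator maps to the targets it will talk to.
TRAFFIC = {
    "host1": ["array3"],
    "host2": ["array1", "array2", "tape1"],
    "host3": ["array1"],
    "host4": ["array1", "array2"],
    "tape1": ["array1", "array3"],
    "array3": ["array4"],
    "array4": ["array3"],
}


def dual_role_devices(traffic: dict[str, list[str]]) -> set[str]:
    """Devices that appear in both the initiator column and the target column."""
    initiators = set(traffic)
    targets = {t for ts in traffic.values() for t in ts}
    return initiators & targets


def initiator_target_pairs(traffic: dict[str, list[str]]) -> list[tuple[str, str]]:
    """One (initiator, target) pair per expected conversation in the SAN."""
    return [(i, t) for i, ts in traffic.items() for t in ts]


print(sorted(dual_role_devices(TRAFFIC)))
print(len(initiator_target_pairs(TRAFFIC)))
```

With this data, tape1 also lands in both columns, because host2 sends data to it while it initiates traffic to array1 and array3. The flattened pairs are a convenient checklist when you set up zoning, although the zoning scheme itself remains a separate design decision.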
What Do We Know about Current Performance
Characteristics?
Any devices that currently exist, and will be
transitioned onto the SAN, are candidates for empirical performance testing.
Create a second set of columns next to the traffic
pattern columns, as shown in Table 5.2. You will need entries for peak
utilization and sustained utilization. Obviously, you will only be able to
enter current data for initiators that already exist, and already communicate
with the same targets they will talk to after the SAN is complete.
Table 5.2
Current Traffic
Initiators | Targets | Current Peak (MB/sec) | Current Sustained (MB/sec)
host1 | array3 | 10 | 5
host2 | array1, array2, tape1 | - | -
host3 | array1 | 50 | 10
host4 | array1, array2 | - | -
tape1 | array1, array3 | - | -
array3 | array4 | - | -
array4 | array3 | - | -
In this example, host1 and host3 already exist, and are already talking to array3 and array1, respectively. Each of the other initiators either has not been added yet, is not yet talking to the targets it will use once the SAN is up, or simply has no performance data available.
If the owner of a system has already done this kind
of analysis, you will simply transfer the numbers to your table. If not, you
should work with the owner to get the performance information, as this might
have a substantial impact on your SAN design.
Authors
Josh Judd is a Senior SAN Architect with Brocade Communications Systems, Inc. In addition to writing technical literature, he provides senior-level strategic support for major OEMs and end-users of Brocade storage network products worldwide. Chris Beauchamp (Brocade CFP, CSD) is a Senior SAN Architect for Brocade Communication Systems, Inc. Chris focuses on SAN design and architecture, with an emphasis on scalability and troubleshooting. Alex Neefus is the Lead Interoperability Test Engineer at Lamprey Networks, Inc. Alex has worked on developing testing tools for the SANmark program hosted by the FCIA. Benjamin F. Kuo is a Software Development Manager at TROIKA Networks. Headquartered in Westlake Village, CA, TROIKA Networks is a developer of Fibre Channel Host Bus Adapters, dynamic multipathing, and management software for Storage Area Networks.
Gathering Performance Data
On almost any kind of system, some facility exists for
measuring performance. More often than not, there will be multiple options for
gathering disk I/O performance information.
For example, on a Windows NT system, you might use
the diskmon feature. You have to install this from the Windows NT Resource Kit.
If you do not install diskmon, standard Windows perfmon will not have a disk
monitoring tool. Alternatively, you could install a package like Intel’s Iometer, and use it to generate a simulated load and measure performance. This tool is presently available as a free download from Intel’s Web site.
Under Sun’s Solaris operating system, performance can
be measured using the iostat utility, the GUI utility perfmeter, or one of a
number of third-party utilities like Extreme SCSI. There are similar tools in
every UNIX variant. We are providing examples for Solaris only, since the
details of these commands will vary between every flavor of UNIX, and providing
examples for every variant is impractical. Refer to the man pages for your particular
version of UNIX for the exact syntax. There are also a number of options for
generating loads under Solaris, ranging from the dd command to, again, a utility like Extreme SCSI.
Note: Tools like Iometer, dd, and Extreme SCSI should be
used with care. It is tempting to use them to generate maximum load. A more
useful test to run is to generate a representative load. Try to determine what
your application will actually be doing in terms of read/write ratio, and total
bandwidth consumption, and use these tools to generate that kind of load on the
system.
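Load generators are usually configured in terms of I/O size, read/write mix, and operation rate rather than MB/sec. The Python sketch below, built around a hypothetical `load_profile` helper, converts an expected bandwidth and read fraction into reads and writes per second. It assumes a single fixed I/O size, which real applications rarely have, so treat the result as a rough target for a representative load rather than an exact setting.

```python
def load_profile(bandwidth_mb_s: float, block_kb: float, read_fraction: float) -> dict[str, float]:
    """Convert a target bandwidth and read/write mix into per-second operation counts.

    bandwidth_mb_s: bandwidth the application is expected to use, in MB/sec
    block_kb:       assumed fixed I/O size, in KB
    read_fraction:  share of operations that are reads (0.0 to 1.0)
    """
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    ops_per_sec = bandwidth_mb_s * 1024 / block_kb  # treats 1 MB as 1024 KB
    return {
        "ops_per_sec": ops_per_sec,
        "reads_per_sec": ops_per_sec * read_fraction,
        "writes_per_sec": ops_per_sec * (1.0 - read_fraction),
    }


# Hypothetical application: 20 MB/sec of 8 KB I/Os, 75 percent reads.
profile = load_profile(20, 8, 0.75)
print(profile)
```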
In cases where performance data cannot be collected empirically, such as when the system in question does not exist yet, there is still hope. Most hosts are not capable of generating sustained load at full
wire speed. They are generally going to be limited by other factors. These
could include:
- CPU: Although Fibre Channel has less overhead than the TCP/IP stack, it still takes a fast processor to get near full performance on a 1 Gbit/sec Fibre Channel link, simply because the processor will be busy running whatever task is actually generating the I/O. While almost all hosts now shipping have sufficiently fast CPUs, you also need to estimate how much of that CPU resource is taken up by other tasks the host is performing that do not result in disk I/O (such as running a TCP/IP stack). Moreover, many data centers have older servers whose CPUs might not be capable of driving a 1 Gbit/sec link even without taking these tasks into consideration.
- PCI bus: A 32-bit 33 MHz PCI bus can only sustain about 120 MB/sec. A 64-bit 33 MHz or 32-bit 66 MHz PCI bus can handle about 240 MB/sec, and a 64-bit 66 MHz bus can handle about 480 MB/sec. Even on the faster buses, bear in mind that the bus is shared. If you put two Fibre Channel HBAs onto a bus that can handle 240 MB/sec, that is the total throughput, reads and writes combined, available to both HBAs, so on average you would get 120 MB/sec out of each interface. In a balanced read/write environment, for example, this could mean that you get only 60 MB/sec of read performance out of each card. Also bear in mind that other cards on the bus might be taking up some of that bandwidth.
- HBA: Even though the Fibre Channel link runs at 1 Gbit/sec across the SAN, many HBAs cannot achieve, or at least cannot sustain, full 1 Gbit/sec transfers. Newer HBAs typically have better performance; older HBAs might only be able to achieve 60 MB/sec, regardless of the other possible issues.
- RAID array: Not all RAID arrays can sustain 100 MB/sec per interface on all interfaces simultaneously. Some barely operate at 30 MB/sec per interface, which is more than acceptable for many applications! Finding out the limits of your RAID array should be as simple as calling the vendor’s support channel. Of course, you might also check third-party test results, such as those published by many industry magazines, for an unbiased opinion.
- Memory: If a host does not have enough memory, it might spend a lot of time paging. If it does, performance will be substantially degraded.
- Random versus sequential I/O: If an application performs random I/O, the disk heads will have to jump all over the platter. Since disk seeks are an order of magnitude or more slower than transfers over a Fibre Channel link, you might have to allocate substantially less bandwidth for random I/O applications like a file server than for sequential I/O applications like a video server or decision support system.
- I/O processing overhead: This ties into the CPU-limit factor. How much CPU do you have, and how much of it is free for handling I/O?
- Tape device: Most tape drives cannot come anywhere near 100 MB/sec. In the case of tape drives, it is usually sufficient to ask the vendor for performance data, although optimistic compression ratios can inflate the numbers vendors provide.
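The shared-bus arithmetic from the PCI discussion above can be written out as a short Python sketch. It assumes that the bus is split evenly between equally busy HBAs and that no other cards use it; the figures in the hypothetical `PCI_BUS_MB_S` table are the approximate sustained rates quoted in the text, not measurements, and `per_hba_share` is an illustrative helper name.

```python
# Approximate sustained PCI throughput quoted in the text, in MB/sec,
# keyed by (bus width, clock in MHz).
PCI_BUS_MB_S = {
    ("32-bit", 33): 120,
    ("64-bit", 33): 240,
    ("32-bit", 66): 240,
    ("64-bit", 66): 480,
}


def per_hba_share(bus_mb_s: float, hbas_on_bus: int, read_fraction: float = 0.5) -> tuple[float, float]:
    """Split a shared bus evenly across HBAs; return (total, read) MB/sec per HBA.

    Assumes every HBA is equally busy and no other cards use the bus.
    read_fraction = 0.5 models the balanced read/write case.
    """
    per_hba = bus_mb_s / hbas_on_bus
    return per_hba, per_hba * read_fraction


total, reads = per_hba_share(PCI_BUS_MB_S[("64-bit", 33)], hbas_on_bus=2)
print(f"{total:.0f} MB/sec per HBA, {reads:.0f} MB/sec of reads")
```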
In addition, if anything is known about the
application that is running on the host, you might be able to make a good guess
about how much load it will even try to place on the disk subsystem. For
example, if you know that the host is an intranet Web server, and that it
receives only 500 hits a day, you can safely guess that its I/O requirements
will be minimal.
Once you have collected your best empirical or estimated numbers for each factor, take the lowest of them as your estimate of the maximum bandwidth that the system could need. You can guarantee that the overall system will not outperform its weakest link.
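The weakest-link rule amounts to taking the minimum over the per-factor limits. In the Python sketch below, the factor names and numbers in `host_limits` are hypothetical placeholders, and `bottleneck` is an illustrative helper; substitute your own empirical or estimated values.

```python
def bottleneck(limits_mb_s: dict[str, float]) -> tuple[str, float]:
    """Return the factor with the lowest throughput limit, and that limit."""
    name = min(limits_mb_s, key=limits_mb_s.get)
    return name, limits_mb_s[name]


# Hypothetical per-factor estimates for one host, in MB/sec.
host_limits = {
    "cpu": 80,
    "pci_bus_share": 120,
    "hba": 60,
    "raid_array_port": 100,
    "application_demand": 45,
}
print(bottleneck(host_limits))
```

Here the application itself, not the hardware, sets the ceiling, which matches the intranet Web server example: a lightly used application can make every hardware limit irrelevant.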
Also note that on systems with multiple HBAs, I/O
load might be distributed across these HBAs. Achieving active-active
distribution across HBAs might require third-party applications like the
VERITAS Dynamic Multipathing software, Troika’s HBA driver, or one of the
storage vendor’s dual-path products. If this is the case, you might estimate
that each HBA will usually have a fraction of the total load. In a dual-fabric,
active/active HBA architecture, each HBA normally has 50 percent of the total
load. If a system is capable of sustaining 70 MB/sec, then each HBA will
sustain 35 MB/sec. Note that this might change during system maintenance if you
shut down one path, and the remaining path could then take on the full 70
MB/sec, so the design should incorporate the worst-case scenario. It is usually
also good practice to add some padding to the top of this estimate (perhaps 10
percent) to allow for the unexpected.
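The multipath sizing described above can be captured in a few lines. This Python sketch assumes an even active/active split across paths and applies the roughly 10 percent padding to the worst case, in which a single surviving path carries the host's whole load; `per_path_design_load` is a hypothetical name.

```python
def per_path_design_load(host_mb_s: float, paths: int, padding: float = 0.10) -> dict[str, float]:
    """Estimate the load that one HBA path must be designed for.

    host_mb_s: sustained bandwidth the host can generate, in MB/sec
    paths:     number of active/active paths (one per fabric in a redundant design)
    padding:   safety margin added on top of the worst case
    """
    normal = host_mb_s / paths
    # Worst case: every other path is down (for example, during maintenance),
    # so the surviving path carries the host's full load.
    worst_case = host_mb_s
    return {
        "normal": normal,
        "worst_case": worst_case,
        "design": worst_case * (1 + padding),
    }


print(per_path_design_load(70, paths=2))
```

For the 70 MB/sec host in the text, each path normally carries 35 MB/sec, but each should be designed for about 77 MB/sec.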
Note: Unlike physical-disk counter data, logical-disk
counter data is not collected by the NT operating system by default. To obtain
performance counter data for logical drives or storage volumes, you must type
diskperf -yv at the command prompt. This will cause the disk performance
statistics driver used for collecting disk performance data to report data for
logical drives or storage volumes. By default, the NT operating system uses the
diskperf -yd command to obtain only physical drive data. For more information
about using the diskperf command, type diskperf -? at the command prompt.
What Do We Know about Future Performance
Characteristics?
Performance numbers change over time. Consider a
customer database for a catalog retail company. Perhaps you will install the
SAN in February, because this is your slow month of the year, and you can get
the necessary downtime. You might know that the database host will start
talking to its storage array(s) at a sustained rate of 5 MB/sec during the
business day, with a peak of only 10 MB/sec. However, when the Christmas season
comes along and your business picks up, you might move to a 50 MB/sec sustained
rate, peaking at 70 MB/sec. Because of the potential for substantial changes in
performance requirements over time, it is essential to plan for both current
and projected performance. Most of this might be educated guesswork, since many
of the systems you are going to deploy might not yet exist.
Again, you will need to come up with numbers for both
sustained traffic and peak traffic for each communication. Also try to
determine what days/times peak performance will occur. This will be added to
your table (Table 5.3).
Table 5.3
Adding Traffic Projections
Initiators | Targets | Peak MB/sec (Initial) | Peak MB/sec (Expected) | Sustained MB/sec (Initial) | Sustained MB/sec (Expected) | Peak Times (Initial) | Peak Times (Expected)
host1 | array3 | 10 | 10 | 5 | 5 | M-F 8a-5p | same
host2 | array1, array2, tape1 | 0, 0, 20 | 70, 70, 20 | 0, 0, 0 | 50, 50, 0 | - | -
host3 | array1 | 50 | 50 | 10 | 20 | M-F 8a-5p | M-F 8a-5p + Sa 10a-4p
host4 | array1, array2 | 0, 0 | 90, 90 | 0, 0 | 50, 50 | - | -
tape1 | array1, array3 | 0, 0 | 20, 20 | 0, 0 | 0, 0 | Sa 5p-9p, Sa 9p-11p | same, same
array3 | array4 | 10 | 30 | 5 | 5 | - | -
array4 | array3 | 5 | 5 | 0 | 0 | - | -
Again, you can only enter data for systems about which you can make an educated guess. If your estimate of peak traffic is based only on the limitations of a system, you might have no way of guessing when that peak would occur. You should also enter projected data for systems that you know you will add later.
In Table 5.3, host2 and the application it is running
might not exist yet, so every piece of data about that system is pure
guesswork. Let us say that host2 is a Return Merchandise Authorization (RMA)
system, and your rapidly growing company has never had an RMA system before.
You might not be able to reliably guess when customers are going to call in
with RMA requests most often, or even how many RMAs you are going to get in a
given day. The best you can do is determine what performance the hardware and software you are installing could reasonably run at, and design the SAN to support that level at any time the system could be in use. While this approach might result in over-engineering your network, this is better than the alternative. During future design phases, you can scale the design back accordingly, as well as incorporate other additions and changes.
For backup devices, peak usage will always correspond
with your backup schedule. This will usually not correspond with peak usage of
the rest of the system. This is particularly useful knowledge when planning an
ISL architecture, because you can often count on having low nonbackup-related
utilization of ISLs during backup windows. An obvious exception to this is a
SAN that is used solely for performing LAN-free backups.
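One way to check that backup peaks really avoid production peaks is to compare the peak-time windows directly. The Python sketch below encodes the expected peak times from Table 5.3 as day/hour windows on a 24-hour clock. The `Window` type and the helper names are hypothetical, and windows that cross midnight are not handled.

```python
from typing import NamedTuple


class Window(NamedTuple):
    """A recurring busy period: day of week plus start/end hour (24-hour clock)."""
    day: str
    start: int
    end: int


def overlaps(a: Window, b: Window) -> bool:
    """True if two windows fall on the same day and their hours intersect."""
    return a.day == b.day and a.start < b.end and b.start < a.end


def conflicts(backup: list[Window], production: list[Window]) -> list[tuple[Window, Window]]:
    """Every (backup, production) pair of windows that overlap."""
    return [(b, p) for b in backup for p in production if overlaps(b, p)]


# Expected peak times from Table 5.3: host1 and host3 (production), tape1 (backup).
weekdays = ["Mon", "Tue", "Wed", "Thu", "Fri"]
production_peaks = [Window(d, 8, 17) for d in weekdays] + [Window("Sat", 10, 16)]
tape_peaks = [Window("Sat", 17, 21), Window("Sat", 21, 23)]

print(conflicts(tape_peaks, production_peaks))
```

An empty result suggests that the ISLs carrying backup traffic will see little production load during the backup windows; moving a backup into Saturday afternoon would show up immediately as a conflict.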
How Much Downtime Is Acceptable to Production
Components During Implementation?
It will likely be necessary to shut down some
existing production devices during implementation, to ensure a safe transition
onto the SAN. For example, you might have to shut down a host to install an
HBA. Determine how much downtime is acceptable for each host, and at what times
this can occur. Generally, you should try to schedule more downtime than you
think you need to ensure that any unforeseen issues that arise during the
implementation can be handled within the downtime window.
How Much Downtime Is Acceptable for Routine
Maintenance? How Much Downtime Is Acceptable for Upgrades and Architectural
Changes?
These two questions are intimately related, because, to an end user, there is really no difference between downtime to a production system for maintenance and downtime for an upgrade. Once systems
are in production, you will want to keep them running as much as possible.
Many upgrades can be accomplished with zero downtime by using a double- or triple-redundant fabric architecture. However, no matter how well you plan the upgrade and maintenance processes beforehand, you will need to shut down specific hosts on occasion. For example, you might want to upgrade an HBA driver, which would typically require a reboot.
Note: Wherever possible, a redundant fabric architecture
should be used. This will ensure the best performance and reliability, and will
simplify maintenance tasks. In a redundant fabric architecture, every host has
at least two paths to every storage device it connects to, and these paths
traverse two completely unconnected fabrics. While it might appear on the surface
to be more expensive, if hosts are to be dual-attached anyway, it is actually
less expensive to attach them to two separate fabrics than to use one larger
fabric, or a director-class switch. This does not even include the downtime ROI
calculation, which, in high-availability environments, will usually overshadow
the entire cost of the SAN. More details about redundant and resilient fabrics
are provided in Chapter 7.
You should therefore determine in advance when you
will be able to schedule downtime for every host and storage array, and for the
fabric itself. You might not have to use every scheduled outage, but having
them available to you when you do need them is essential.
One way to do this is to make a list of applications
and services provided by the hosts on the SAN, and determine an owner for each.
Take your list of SAN devices and map these devices to the applications and
services they affect. This will provide a mapping of application/service
owners, who are typically responsible for scheduling downtime, to devices that
typically require downtime. Have each owner approve the downtime calendar for
each device that affects his or her service.
The mapping of owners to devices should be kept up to
date as changes in personnel, applications, and/or SAN infrastructure occur.
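The owner-to-device mapping is a simple inversion of two lookup tables, which makes it easy to regenerate whenever personnel, applications, or the SAN change. In the Python sketch below, the application names, owners, and device assignments are hypothetical examples, not taken from the text; `approvers_by_device` returns, for each device, the set of owners who must approve its downtime.

```python
from collections import defaultdict

# Hypothetical inventory: which SAN devices each application depends on,
# and who owns each application.
app_devices = {
    "customer-db": ["host1", "array3", "array4"],
    "rma": ["host2", "array1", "array2"],
    "backup": ["tape1", "array1", "array3"],
}
app_owner = {"customer-db": "dba-team", "rma": "rma-team", "backup": "storage-ops"}


def approvers_by_device(app_devices: dict[str, list[str]], app_owner: dict[str, str]) -> dict[str, set[str]]:
    """Invert application->devices into device->owners who must approve downtime."""
    approvers: dict[str, set[str]] = defaultdict(set)
    for app, devices in app_devices.items():
        for device in devices:
            approvers[device].add(app_owner[app])
    return dict(approvers)


for device, owners in sorted(approvers_by_device(app_devices, app_owner).items()):
    print(device, sorted(owners))
```

Shared devices such as array1 and array3 end up with several approvers, which is exactly where downtime scheduling tends to get difficult.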
When Do You Need Each Piece of the Solution to Be
Complete?
Once you have a table detailing which of the
initiators communicate with which targets, you can begin to create a timeline
for the project. Other members of the core team will tell you something like,
“the customer database application must be online by mid-June.” It is your task
to define which SAN components you need to accomplish this, and to develop a timeline for adding these components that meets their requirements.
This is a high-level list of some of the questions
that should appear on a SAN design interview form:
- … problem are you trying to solve?
- … requirements of the solution?
- … nodes that will attach to the SAN?
- … application do you have in mind?
- … solution already exist?
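- Which components are already in production?
- Which elements of the solution need to be prototyped and tested?
- What equipment will be available for testing?
- How and when are backups to be done?
- What will the traffic patterns in the solution be?
- What do we know about current performance characteristics?
- What do we know about future performance characteristics?
- How much downtime is acceptable to production components during implementation?
- How much downtime is acceptable for routine maintenance?
- How much downtime is acceptable for upgrades and architectural changes?
- When do you need each piece of the solution to be complete?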
Conduct a Physical Assessment
You should now have the location of every piece of
hardware that currently exists. In addition, you should know where each piece
of hardware in the eventual SAN will be located.
Look at each piece of hardware. Make sure that it
does exist, and has all necessary pieces to function. This could include things
like power cords, keyboard, mouse, monitor, Ethernet card, Ethernet cable,
HBAs, and Fibre Channel cables. Note the physical dimensions of the hardware,
and its power/cooling requirements. Does it rack mount? Does it have a network
interface? How many Fibre Channel interfaces does it have? How much does it
weigh? You should already have this information from the interview process, but
you should verify that the information you were given is correct.
Go to each location where SAN equipment or nodes will
be installed, and again check to see that your information was correct. Notice
how the equipment will fit into the space available. Notice how the equipment
will enter the building. You should also have a meeting with the person in
charge of the facility to discuss power, cooling, and equipment locations.
Analyzing the Collected Data
Now that you have collected information from all key
stakeholders in the project, and verified the accuracy of this information, you
will analyze it to determine the characteristics of the required solution. When
you have completed this process, you will have a list of technical
requirements, and an ROI analysis to justify the project.
Processing What You Have Collected
You have a matrix detailing communication between
nodes. Attempt to group the nodes by communication patterns. The purpose of
this is to determine the amount of known locality in the SAN. Locality of
reference is a concept prevalent in many areas of computer science, from disk
drive construction to LAN design. Locality is important in SAN design because
if you can localize traffic into specific areas of a SAN, you directly improve
the SAN’s performance and reliability. It also allows a more cost-effective SAN design, because you avoid over-designing the network to handle cross traffic that does not exist. Locality is discussed in greater detail in Chapter 7.
A SAN with a great deal of known locality might be
constructed out of many separate fabrics, with no ISLs whatsoever. A SAN with
little or no known locality might require a high-performance ISL architecture
(Table 5.4).
Table 5.4
Initiator-to-Target Mapping for Locality Example
SAN Traffic Patterns
Initiators | Targets
host1 | array3
host2 | array1, array2, tape1
host3 | array1
host4 | array1, array2
tape1 | array1, array3
array3 | array4
array4 | array3
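In Table 5.4, array3 would be grouped with host1, tape1, and array4. Apart from tape1, which also communicates with array1 and host2, none of those devices will need to communicate with any of the other devices. The host1, array3, and array4 devices could therefore be grouped onto a single switch, or even put onto a totally separate fabric, as long as tape1 remains reachable from both groups. You might find it helpful to do the grouping in a diagram.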
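Grouping nodes by communication pattern is, in graph terms, finding the connected components of the initiator/target graph. The Python sketch below uses a small union-find structure for this; that is a standard graph technique chosen for illustration, not a method the text prescribes, and `performance_groups` is a hypothetical name. The optional `ignore` set leaves out low-bandwidth devices that touch every group, in the spirit of drawing them with a lighter line in a grouping diagram.

```python
def performance_groups(traffic: dict[str, list[str]], ignore: frozenset[str] = frozenset()) -> list[set[str]]:
    """Group devices into connected components of the initiator/target graph.

    Devices in `ignore` (for example, a low-bandwidth tape library that talks to
    every group) are left out so they do not merge otherwise separate groups.
    """
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        parent[find(a)] = find(b)

    for initiator, targets in traffic.items():
        if initiator in ignore:
            continue
        find(initiator)  # register the initiator even if all its edges are ignored
        for target in targets:
            if target not in ignore:
                union(initiator, target)

    groups: dict[str, set[str]] = {}
    for device in parent:
        groups.setdefault(find(device), set()).add(device)
    return sorted(groups.values(), key=lambda g: sorted(g))


# Rows of Table 5.4.
TRAFFIC = {
    "host1": ["array3"],
    "host2": ["array1", "array2", "tape1"],
    "host3": ["array1"],
    "host4": ["array1", "array2"],
    "tape1": ["array1", "array3"],
    "array3": ["array4"],
    "array4": ["array3"],
}
print(performance_groups(TRAFFIC))
print(performance_groups(TRAFFIC, ignore=frozenset({"tape1"})))
```

With every conversation included, tape1 links the two clusters and all nine devices fall into one group. Setting tape1 aside yields two groups: host1 with array3 and array4, and host2, host3, and host4 with array1 and array2.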
For another example, look at Figure 5.2.
Figure 5.2 SAN Diagram without Grouping
Nothing is known about the communication patterns in
this SAN. Consequently, there is no way to optimize ISLs for performance. After
grouping the initiators with their targets, the SAN diagram could look
something like Figure 5.3. If you look carefully, you will notice that there are only 12 connections into this SAN. If a SAN has fewer connections than a single switch has ports, you do not really need to go through the grouping exercise, because all of the traffic will stay on one switch anyway. Grouping is only useful if you will be using ISLs; however, because most SANs scale well past the size of the largest switches available, it will be a frequent exercise. To keep the examples readable, we will simply assume that each one shows only a subset of the devices that the SAN will support.
Figure 5.3 SAN Diagram with Simple Grouping
Making a diagram such as this will allow you to see
at a glance what the communication patterns for your SAN are.
This example is simplistic; in large SANs, there will likely be conflicts. When you cannot effectively group all of the communication patterns, you should focus on grouping the devices with the highest performance requirements. For example, if you find that the bulk of traffic will be between host1, array3, and array4, these could be grouped separately from tape1 and host2 if necessary. This situation can arise when there are so many interrelationships that you end up with many devices but only a few very large groups; the grouping technique does not help performance if you have only one big group. It can also arise when a few devices are shared by a great many others, such as a large RAID array in a storage consolidation solution.
Another way to combat this “group growth” problem is
to account for multiple interfaces on storage arrays. Let us say that you have
a redundant fabric architecture. Your RAID array has eight interfaces, and each
host will access only two of them, one interface on each fabric. List each interface on the array separately in your traffic pattern table. Then associate servers or groups of servers with specific interfaces. With the array
listed as a single entity, a diagram of the communication could look something
like Figure 5.4.
Figure 5.4 SAN Grouping Diagram with Single-Entity Arrays
If, however, you separate the interfaces, your
diagram could look more like Figure 5.5.
Figure 5.5 SAN Grouping Diagram with Separated Interfaces
You can indicate that a device crosses groups but
does not need much in the way of performance by varying the line color, weight,
or pattern. Figure 5.6 shows that the tape robot crosses all groups, but does not
need much bandwidth.
Figure 5.6 SAN Grouping Diagram with Tape Robot Addition
If you are able to make relatively small performance groups, your SAN will benefit greatly from applying the principle of locality.
For now, you simply need to be able to determine the category of architecture
you will require: one that has lots of known locality (has well-defined
performance groups), or one that does not. This will affect how many switch
ports you need to allot for ISLs. If traffic is localized within an area of the
SAN, it will obviously not need to make use of ISLs leaving that area. In this
case, you will be able to get superior performance even with far fewer ISLs,
resulting in more ports available for servers and storage.
Establishing Port Requirements
Now you will determine how many switch ports you will
need to purchase. (This is a general estimate for calculating ROI; it might be
a bit more or less than your final estimate.)
Take the ports you found out about during the
interview process. Make sure that you account for all ports on each node. Some
RAID arrays have many ports, and many hosts have at least two HBAs. Add up
these ports to get the total number of exposed ports your SAN will require. You
will then divide this by the number of different fabrics you will be using. If
you have dual-redundant fabrics, you will divide by two. If you have
triple-redundant fabrics, divide by three, and so on. This will give you the
number of required exposed ports per fabric. The number of “overhead” ports you
must allocate for ISLs and for unused ports will depend on several factors:
- … required ports per fabric.
- … locality.
- … switches as a single entity.
- … your SAN: any MAN/WAN connections, inter-building campus connections, or inter-floor building connections might dictate the use of additional ISLs and less than perfect utilization of the ports on each switch.
- … expected performance characteristics.
- … in port count of the fabric.
- … policies regarding port usage on network devices. For example, you might require that a certain number of ports be left available for expansion or troubleshooting during the course of normal operation.
Simple Case
If the number of required exposed ports is less than
the number of ports on a single switch, you will generally need zero ports for
ISLs. In this case, you will require one switch per fabric. However, because larger switches use more internal hardware to connect their higher number of user ports, you might need to choose between using a larger switch and utilizing a network of smaller ones. The appropriate decision will depend on
performance requirements, budget, and design factors. In addition, if you have
made small performance groups that have no components in common, you might be
able to localize traffic 100 percent, and require no ISLs. You would have many
small, unconnected SAN islands if you follow this approach. One reason not to
use isolated islands is that requirements change. Someday you might need access
between islands at a moment’s notice. A robust architecture can achieve your
immediate connectivity requirements, and give you the flexibility to handle
change as well.
If this is not the case, or if you wish to design flexibility into your
configuration, each fabric will need to be a network of switches, and you will
have to reserve ports for the ISLs that connect them. Simple case requirements
include the following:
- Fewer ports are required than exist on a single switch, or
- Each performance group is well defined and smaller than the number of ports on a single switch.
- Expected growth and change are minimal.
Assume that you have two 16-port arrays (32 storage
ports total), 10 dual-HBA servers (20 ports), and two single-port tape
libraries (two ports). Your total port count is 54. However, assume further
that you are using a dual-redundant SAN architecture. Your port count per
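fabric is 27. You are building the fabric out of 16-port switches. Because 27
exceeds the 16 ports on a single switch, each fabric needs at least two
switches, and unless you split it into isolated islands, some ISLs will be
required. You will need to determine how many.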
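The port arithmetic in this example can be sketched in a few lines of Python. This is a minimal illustration rather than a sizing tool: the function names are hypothetical, and rounding up when the total does not divide evenly across the fabrics is an assumption added here.

```python
import math

def exposed_ports(devices):
    """Total device-facing ports; each entry is (device_count, ports_per_device)."""
    return sum(count * ports for count, ports in devices)

def ports_per_fabric(total_ports, fabrics):
    """Split exposed ports across redundant fabrics, rounding up (an assumption)."""
    return math.ceil(total_ports / fabrics)

# Inventory from the example: (number of devices, ports per device)
inventory = [
    (2, 16),  # two 16-port RAID arrays
    (10, 2),  # ten dual-HBA servers
    (2, 1),   # two single-port tape libraries
]

total = exposed_ports(inventory)
print(total)                       # 54
print(ports_per_fabric(total, 2))  # 27 for a dual-redundant SAN
```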
Variant A
With a relatively small fabric like this and
relatively high locality, you can assume that you will have about 14 free ports
per switch, with the other two ports on each switch used for ISLs. Two switches
with two ISLs between them will yield 28 ports per
fabric. You are using a dual-redundant architecture, so there will be two
fabrics, for a total of four switches. Your grouping diagram will look like Figure
5.7.
Figure 5.7 Determining ISL Requirements for Variant A
This grouping would result in an actual
implementation resembling Figure 5.8.
Figure 5.8 Variant A Implementation
Variant B
If you decide that you cannot guarantee the
localization of traffic for some reason, grouping will not help. Assuming also
that you have a requirement for high performance between the switches, you
would add two ISLs per switch to the estimate, for a total of about four ISLs
per switch. Your architecture might look like Figure 5.9.
Figure 5.9 Adding ISLs for High Performance in Variant B
The same technique can be applied to any SAN, no
matter how complex. In fact, the larger the SAN, the greater the benefits will
be from grouping traffic.
Moderate Case
If the required exposed port count is about double or
triple the per-switch port count, and some locality is known, you will be able
to use very few ISLs. In this case, estimate two ISLs per switch. Let us say
that you need 26 ports, and you are using 16-port switches. Two ISLs per switch
means that you actually get 14 ports per switch. Two switches will give you 28
ports, so you would budget for two switches per fabric, or four switches total.
Moderate case requirements include the following:
- Two to three times as many ports are required as are present on a single switch.
- Performance groups are reasonably well defined. Some locality is known.
- Expected growth and change are minimal.
Note: The low port count/high locality/low ISL count
configurations work well for either two or three switches. Two switches would
be cascaded together with two ISLs, with 16-port switches yielding 28 ports.
Three switches would be connected in a ring, supporting about 40 devices. If you
are over that limit, a four-switch full mesh can support about 50 devices. The
full-mesh architecture does not scale well beyond that point, and none of these
work well if you have performance groups with more than 13 or 14 members. It is
feasible to build ring or partial-mesh topology fabrics with higher port
counts, but it is generally better to use a core/edge topology for higher port
count solutions. These topologies are explained in detail in Chapter 7.
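The capacities quoted in this note follow from simple port arithmetic. The sketch below assumes 16-port switches, two parallel ISLs in the two-switch cascade, and one ISL per switch-to-switch link in the ring and the full mesh (the assumption that reproduces the "about 40" and "about 50" figures). The helper names are hypothetical.

```python
def usable_ports(switches, ports_per_switch, isl_ports_per_switch):
    """Device-facing ports left after reserving ISL ports on every switch."""
    return switches * (ports_per_switch - isl_ports_per_switch)

def cascade_two(ports_per_switch=16, isls=2):
    # Two switches joined by `isls` parallel ISLs; each ISL uses one port per switch.
    return usable_ports(2, ports_per_switch, isls)

def ring(switches, ports_per_switch=16, isls_per_link=1):
    # Each switch in a ring links to its two neighbors.
    return usable_ports(switches, ports_per_switch, 2 * isls_per_link)

def full_mesh(switches, ports_per_switch=16, isls_per_link=1):
    # Each switch links directly to every other switch.
    return usable_ports(switches, ports_per_switch, (switches - 1) * isls_per_link)

print(cascade_two())  # 28
print(ring(3))        # 42 ("about 40")
print(full_mesh(4))   # 52 ("about 50"), 13 device ports per switch
print(full_mesh(8))   # 72
```

The last line hints at why the full mesh scales poorly: doubling the mesh from four to eight switches adds only 20 usable ports, because each switch must dedicate seven ports to ISLs.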
Authors
Josh Judd is a Senior SAN Architect with Brocade Communications Systems, Inc. In addition to writing technical literature, he provides senior-level strategic support for major OEMs and end-users of Brocade storage network products worldwide. Chris Beauchamp (Brocade CFP, CSD) is a Senior SAN Architect for Brocade Communication Systems, Inc. Chris focuses on SAN design and architecture, with an emphasis on scalability and troubleshooting. Alex Neefus is the Lead Interoperability Test Engineer at Lamprey Networks, Inc. Alex has worked on developing testing tools for the SANmark program hosted by the FCIA. Benjamin F. Kuo is a Software Development Manager at TROIKA Networks. Headquartered in Westlake Village, CA, TROIKA Networks is a developer of Fibre Channel Host Bus Adapters, dynamic multipathing, and management software for Storage Area Networks.
Complex Case
If you need more ports than one of these
configurations will handle, you will need to allocate about four ISLs per
switch. You might use fewer than four ISLs on some switches, and perhaps
nothing but ISLs will be present on other switches. In the complex case for
port count estimates, the intent is to average the ISL requirements.
Until a detailed architecture is developed, you will
have to make general estimates for a few things. If you have any distance
requirements, add two ISLs per switch. If you have very high-performance
requirements, and very little known locality, add two ISLs per switch.
Take the estimated number of ISLs per switch (I) and
subtract it from the number of ports per switch (PS). Divide the total required
ports per fabric (P) by this number and round up. This is the estimated number
of switches (S) that you need to budget for. For estimating complex SAN switch
counts, S = P/(PS - I), rounded up.
For example, if you have a need for 30 ports per
fabric (P=30), are using 16-port switches (PS=16), and each switch will use
about two ISLs (I=2), then the number of switches you estimate needing per
fabric is 30/(16 - 2). This is 2.14, which rounds up to 3. If you have a single
fabric, this is the number of switches you should budget for. If you have a
dual-fabric SAN, you should budget for six switches. Complex case requirements
include the following:
- More ports might be required than the configurations above can handle.
- Locality might or might not be defined.
- Expected growth and change are significant.
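The estimate S = P/(PS - I), rounded up, can be expressed as a small function. This is a sketch under the chapter's simplifying assumptions (identical switches and a uniform ISL allowance on each one); the function names are hypothetical. Integer ceiling division performs the "round up" step without floating-point surprises.

```python
def estimate_switches(required_ports, ports_per_switch, isls_per_switch):
    """Estimated switches per fabric: S = P / (PS - I), rounded up."""
    usable = ports_per_switch - isls_per_switch
    if usable <= 0:
        raise ValueError("the ISL allowance leaves no device ports on a switch")
    # -(-a // b) is ceiling division for positive integers.
    return -(-required_ports // usable)

def estimate_san(required_ports, ports_per_switch, isls_per_switch, fabrics):
    """Switches to budget across all redundant fabrics."""
    return fabrics * estimate_switches(required_ports, ports_per_switch, isls_per_switch)

# Figures from this section, all with 16-port switches
print(estimate_switches(30, 16, 2))  # complex case: 30 / 14 = 2.14 -> 3
print(estimate_san(30, 16, 2, 2))    # dual-fabric SAN -> 6
print(estimate_switches(27, 16, 2))  # Variant A: 27 / 14 -> 2
print(estimate_switches(27, 16, 4))  # Variant B: 27 / 12 -> 3
print(estimate_switches(26, 16, 2))  # moderate case: 26 / 14 -> 2
```

Applying the same formula to Variant B shows the cost of poor locality: reserving four ISL ports per switch leaves 12 device ports each, so the 27 ports per fabric need three switches instead of two.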
Preparing an ROI Analysis
In any business transaction, it is important to
understand the economic benefits or the Return On Investment (ROI) that your
company will receive. Preparing an ROI analysis for your SAN project will show
how your company will not only recover the capital investment, but also save
additional money through time, management, and other efficiencies.
During the interview process, you made a list of all
of the equipment that you would need to purchase. To begin the ROI analysis of
your SAN, determine which components are specific to the SAN project. For
example, if your company will need to buy additional storage arrays whether or
not a SAN is used, these would not be included on the expense side of the
analysis. If the SAN is expected to prevent you from having to buy an array,
this cost savings would go onto the benefit side of the analysis. You should
include any hardware you intend to buy for testing that will not be used
elsewhere.
When accounting for staff time spent on the project,
make sure that you only charge the project for time spent beyond what would be
spent if you did not build the SAN. If you are expected to save staff time in the
long run, apply this to the benefit side. Your ROI analysis will be a living
document, and will be updated as the SAN project develops.
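The accounting rule above (count only what the SAN itself adds or saves) can be illustrated with a simple ledger. The line items and amounts are hypothetical, and the attributable_to_san flag is an assumption standing in for the judgment you make about each item.

```python
from dataclasses import dataclass

@dataclass
class LineItem:
    description: str
    amount: int                # currency units; hypothetical figures
    attributable_to_san: bool  # spent or saved only because of the SAN project?

def roi_totals(costs, benefits):
    """Return (cost, benefit, net), counting only SAN-attributable items."""
    cost = sum(item.amount for item in costs if item.attributable_to_san)
    benefit = sum(item.amount for item in benefits if item.attributable_to_san)
    return cost, benefit, benefit - cost

costs = [
    LineItem("Fibre Channel switches", 40_000, True),
    LineItem("HBAs and optical cables", 12_000, True),
    LineItem("Test hardware not reused elsewhere", 1_500, True),
    LineItem("Storage array needed with or without a SAN", 80_000, False),
]
benefits = [
    LineItem("Array purchase avoided through consolidation", 60_000, True),
    LineItem("Staff time saved beyond the no-SAN baseline", 15_000, True),
]

print(roi_totals(costs, benefits))  # (53500, 75000, 21500)
```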
The Return On Investment Proposition
Technical justifications for SAN infrastructure
deployments can often be made more credible by adding an ROI analysis for the
proposed implementation. Follow the guide in the following sections to produce
an ROI analysis based on SAN solutions to particular problems.
Step One: Pick a Theme or Scenario
Most implementations have a purpose. That purpose
could be a server or storage consolidation to improve infrastructure usage and
gain economies of scale, ensuring storage and server resources are utilized in
the most cost-effective manner. High-availability clustering can improve the
availability of mission-critical applications, thus ensuring business
continuance and the cost savings associated with it. SAN-based backup deployments
improve data integrity by performing backups and restores more efficiently and
quickly, again saving in business continuance time and effort.
Step Two: Identify the Affected Infrastructure Components
Most SAN deployments will focus on affected servers.
Servers can be grouped according to the applications they run or the functional
areas they support. Examples of application groupings include Web servers, file
and print servers, messaging servers, database servers, and application
servers. Functional support servers might include financial and personnel
systems or engineering applications. Once the server groups are known, get the
characteristics of servers in each group. For example, if your solution fits
into a storage consolidation theme, you should consider factors such as:
storage
for growth (headroom)
requirements
associated downtime cost
software costs
effort required to keep the servers up and running
Step Three: Identify the SAN-Enabled Benefits
The scenario approach allows you to focus more
closely on the benefits. Server and storage consolidation, for example, will
concentrate on benefits accrued from more efficient use of server and storage
resources, improved staff productivity, lower platform costs, and better use of
the infrastructure. Simply take the list of characteristics you developed in
step two, and show how a SAN can provide benefits in those areas. Establishing
specific cost savings is one of the two key elements in the ROI process, so be
sure to look hard for every area of benefit.
Step Four: Identify the SAN-Related Costs
Determining the costs associated with the scenario
involves identifying the new components specifically required to build and
maintain the SAN. These can include software licenses, switches, Fibre Channel
HBAs, optical cables, and any service costs associated with the deployment. Be
careful to include only those items that relate directly to the SAN
implementation. This is the second key element in the ROI process: if you do
not correctly estimate expenses, the ROI might be substantially better or worse
than your estimate.
Step Five: Calculate the ROI
There are several standard ROI calculations in common
use, such as net present value (in dollars), internal rate of return (as a
percentage), and payback period (in months). Briefly, these can be defined as:
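- Net present value: a method of evaluating investments in which the present
value of all cash flows is calculated using a given discount rate.
- Internal rate of return: the discount rate at which the present value of the
future cash flows of an investment equals the cost of the investment.
- Payback period: the length of time required to recover the cost of a capital
investment on a nondiscounted basis.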
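A minimal Python sketch of the three calculations follows. It assumes one cash flow per period, with the initial investment entered at period 0 as a negative number. The cash flows are hypothetical, the internal rate of return is found by simple bisection (one of several possible root-finding methods), and the payback calculation assumes cash arrives evenly within each period. Spreadsheet functions and your accounting department may use different conventions, for example discounting the first cash flow.

```python
def npv(rate, cash_flows):
    """Net present value. cash_flows[0] occurs now (undiscounted);
    cash_flows[t] occurs t periods later."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


def irr(cash_flows, low=-0.99, high=10.0, tol=1e-9, max_iter=200):
    """Internal rate of return: the discount rate at which NPV is zero.
    Found by bisection; assumes NPV changes sign once on [low, high]."""
    f_low = npv(low, cash_flows)
    mid = (low + high) / 2
    for _ in range(max_iter):
        mid = (low + high) / 2
        f_mid = npv(mid, cash_flows)
        if abs(f_mid) < tol:
            break
        if (f_low < 0) == (f_mid < 0):
            low, f_low = mid, f_mid  # root lies in the upper half
        else:
            high = mid               # root lies in the lower half
    return mid


def payback_period(cash_flows):
    """Undiscounted periods until cumulative cash flow reaches zero,
    interpolating within the final period; None if never recovered."""
    cumulative = cash_flows[0]
    if cumulative >= 0:
        return 0.0
    for t in range(1, len(cash_flows)):
        cf = cash_flows[t]
        if cumulative + cf >= 0:
            return (t - 1) + (-cumulative / cf)
        cumulative += cf
    return None


# Hypothetical SAN project: 100,000 invested now, 40,000 saved per year for 3 years
flows = [-100_000, 40_000, 40_000, 40_000]
print(round(npv(0.08, flows), 2))  # NPV at an assumed 8% discount rate
print(round(irr(flows), 4))        # IRR as a fraction, about 0.097
print(payback_period(flows))       # 2.5
```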
Detailed explanations of these techniques and how to
use them can be found in most accounting textbooks. It is likely that your
company has a preferred method for calculating ROI. You should determine which
method this is, and if there are standard forms for presenting your analysis.
Asking your accounting department might be a good first step.
This approach to calculating ROI allows you to focus
on a particular project or infrastructure-based problem. It allows you to
reduce deployment risk by deploying SANs in phases by scenario. Deploying by
scenario will keep investments limited to the solution at hand and create an
investment base for future deployments. The initial investment will improve the
ROI on other scenarios by reducing some of the investment required to deploy
them.
The Rest of the Process and the Repetition of the Cycle
Now you have the following documents:
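- The documents produced during the interview process, which define what the
SAN project needs to accomplish. These include:
  – A technical requirements document
  – A timeline for accomplishing the tasks associated with implementing the SAN
  – A list of everything that you will need to buy to make the project work
- The results of data analysis (the number of fabrics, and the port count and
performance characteristics of each), from which the SAN will be designed.
- An ROI analysis to justify continuing with the project.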
These will be used and maintained throughout the life
of the SAN. The timeline will be the framework in which all activities in the SAN’s
lifecycle will reside. In later chapters, you will enter the architecture
development phase and use these documents to develop a detailed
architecture for your SAN. This will in turn be used to develop a test plan.
These documents will be used in the approval process for implementation, and
will be kept up to date during the maintenance phase as part of the SAN’s
documentation set. If any major changes to the SAN are needed, the lifecycle
will be repeated and another set of documentation will be produced.
Summary
The SAN design process consists of seven phases,
which are cycled through as needed throughout the life of your SAN. Data
collection and analysis together define the requirements of your SAN. These
requirements feed into the architecture development process to produce a SAN
design blueprint. After you have a plan in place for your SAN, you must test
certain components to ensure that it is working the way you thought it would,
before you can begin to transition and release it into production. Once the SAN
has entered production, it falls into an ongoing maintenance phase, and
continues in that phase until a change occurs that causes the cycle to repeat.
The first two phases (data collection and analysis)
are critical to the health of the SAN. Simply put, if the information on which
the design is based is incomplete and/or inaccurate, the design will be
incorrect.
Data collection consists of a series of interviews,
collecting the answers into a meaningful format (a technical requirements
document), and verifying the accuracy of the collected data. It is imperative
that all key stakeholders in the SAN project be included on the interview list.
While listed as a separate phase, data analysis
actually coincides with data collection. The objective of the analysis phase is
to turn the raw data, which is generally in the form of business requirements,
into a more technical format: the technical requirements document. Some of this
occurs “on the fly” during the interview process. However, certain tasks are done
after the interviews are complete. For example, detailed port count and
performance requirements are generated “on the fly,” and an ROI proposition is
created after the fact. Once the requirements of the SAN are well defined, the
remaining phases can take place. These phases are covered in subsequent
chapters.
Solutions Fast Track
Looking at the Overall Lifecycle of a SAN
- The SAN design process is a cycle.
- This process consists of seven phases:
  1. Data Collection
  2. Data Analysis
  3. Architecture Development
  4. Prototype and Test
  5. Transition
  6. Release to Production
  7. Maintenance
- Whenever there is a fundamental change to the SAN, the cycle should repeat.
Conducting Data Collection
- Data collection is the foundation on which a SAN is built.
- You should interview everybody who has an interest in the project.
- During the interview process, create a technical requirements document.
Analyzing the Collected Data
- There are several things that you need to get out of data analysis:
  – The number of different fabrics that will make up the SAN solution
  – The port count and performance characteristics of each fabric
  – An estimate of the hardware required to meet these requirements
- You might be able to localize traffic for better performance if you can
create well-defined groups.
- Prepare an ROI proposition to justify your SAN project.
Frequently Asked Questions
Q: Once I have designed my
SAN, shouldn’t it be done? I don’t want to have to keep reinventing the wheel!
A: Yes and no. After a SAN
enters production, it is “done” until you want to change it in a fundamental
way. As long as you are happy with leaving your SAN the way it is, there is no
reason why you would have to repeat the design cycle. Simply adding a new
storage array does not require a repetition of the cycle. Moreover, when an event
does cause the cycle to repeat, the repetition might go relatively quickly. For
example, if you decide to go through the design process because you are adding
a new type of storage array to the SAN, and want to validate that doing so
won’t break anything, you will be able to take a fast track through most of the
process. After all, adding this device will not by any stretch of the
imagination require that you change your fabric topology, or affect much of
your SAN architecture.
Q: Every end user in my
company is a stakeholder in the SAN. Do I need to interview everybody?
A: It is true that
everybody who uses a system is a stakeholder in that system. However, we mean
something a little less broad. When we refer to a stakeholder, we mean somebody
whose job revolves around taking care of one or more of the systems that will
attach to the SAN. This can include systems, database, and storage
administrators, as well as other technical people. It can also include people
responsible for the data that resides on these systems. For example, a manager
responsible for a call center at a phone-in catalog company might be a key
stakeholder in the SAN, because he or she is responsible for the data entered
into that company’s business system, which is attached to the SAN. Why is this
person a key stakeholder? Because he or she might have something to say about
the availability and performance requirements of the system. When in doubt, try
to include anybody on the team who wants to be there. It is usually better to
have more data than you need, rather than less.
Q: Do I need to wait until
data collection is complete before beginning data analysis?
A: Actually, the data
collection and analysis phases are most effective if there is some degree of overlap.
If you have analyzed data from the first interview when you go into the second,
you will be able to better understand the answers, and might also be able to
direct the line of questioning along more useful lines. Be careful not to
develop firm convictions too early on, though. Always approach SAN design
scientifically. Never start an interview with a firm preconception of the
outcome! Collection and analysis are divided into two phases because some of
the analysis naturally occurs after all data collection is complete. For
example, you can’t prepare an ROI proposition until you have a fairly complete
picture of what the SAN will need to accomplish, and some idea of the technical
infrastructure that will be involved.