
Book Excerpt: Building SANs with Brocade Fabric Switches


By Josh Judd, Chris Beauchamp, Alex Neefus and Benjamin F. Kuo

Gathering Performance Data

On almost any kind of system, some facility exists for measuring performance. More often than not, there will be multiple options for gathering disk I/O performance information.

For example, on a Windows NT system, you might use the diskmon utility, which must be installed from the Windows NT Resource Kit; without it, the standard Windows perfmon does not include a disk monitoring tool. Alternately, you could install a package like Intel's Iometer and use it to generate a simulated load and measure performance. This tool is presently available as a free download from Intel's Web site.

Under Sun's Solaris operating system, performance can be measured using the iostat utility, the GUI utility perfmeter, or one of a number of third-party utilities like Extreme SCSI. Similar tools exist in every UNIX variant; we provide examples for Solaris only, since the details of these commands vary between flavors of UNIX and covering every variant is impractical. Refer to the man pages for your particular version of UNIX for the exact syntax. There are also a number of options for generating loads under Solaris, ranging from the dd command to, again, a utility like Extreme SCSI.
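For illustration only, the short sketch below samples read and write throughput with iostat and reports the sustained and peak rates it saw. It assumes a Solaris-style iostat -xn report (the kr/s and kw/s columns) and uses a placeholder device name; adjust both for your own system.

    # Sketch: sample disk throughput on Solaris with iostat.
    # Assumes the extended "iostat -xn" column layout (r/s, w/s, kr/s, kw/s, ...,
    # device); the device name below is a placeholder, so substitute your own.
    import subprocess

    DEVICE = "c0t0d0"          # hypothetical device name
    INTERVAL, SAMPLES = 5, 12  # twelve 5-second samples (one minute)

    report = subprocess.run(
        ["iostat", "-xn", str(INTERVAL), str(SAMPLES + 1)],
        capture_output=True, text=True, check=True).stdout

    rates = []
    for line in report.splitlines():
        fields = line.split()
        if len(fields) >= 11 and fields[-1] == DEVICE:
            kr_s, kw_s = float(fields[2]), float(fields[3])
            rates.append((kr_s + kw_s) / 1024.0)   # MB/sec for this sample

    rates = rates[1:]  # the first report is the since-boot average; discard it
    if rates:
        print("sustained %.1f MB/sec, peak %.1f MB/sec"
              % (sum(rates) / len(rates), max(rates)))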

Note: Tools like Iometer, dd, and Extreme SCSI should be used with care. It is tempting to use them to generate maximum load, but a more useful test is to generate a representative load. Try to determine what your application will actually be doing in terms of read/write ratio and total bandwidth consumption, and use these tools to generate that kind of load on the system.
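To make the note concrete, here is a minimal sketch of generating a representative load rather than a maximum one. The 10 MB/sec target, the 80/20 read/write ratio, and the test-file path are all assumptions standing in for your own application profile.

    # Sketch: drive a disk at a representative load rather than a maximum one.
    # The target rate, read/write ratio, and file path are placeholder assumptions.
    import os, random, time

    PATH        = "/tmp/san_load.dat"   # hypothetical test file on the SAN volume
    TARGET_MBS  = 10.0                  # sustained rate your application expects
    READ_RATIO  = 0.8                   # 80 percent reads, 20 percent writes
    BLOCK       = 64 * 1024             # 64 KB I/Os
    FILE_BLOCKS = 1024                  # 64 MB working set
    DURATION    = 60                    # seconds to run

    buf = os.urandom(BLOCK)
    with open(PATH, "wb") as f:         # pre-allocate the working set
        for _ in range(FILE_BLOCKS):
            f.write(buf)

    start, moved = time.time(), 0
    with open(PATH, "r+b") as f:
        while time.time() - start < DURATION:
            f.seek(random.randrange(FILE_BLOCKS) * BLOCK)
            if random.random() < READ_RATIO:
                f.read(BLOCK)
            else:
                f.write(buf)
            moved += BLOCK
            # Throttle so the average rate stays at the target, not the maximum.
            expected_elapsed = moved / (TARGET_MBS * 1024 * 1024)
            pause = expected_elapsed - (time.time() - start)
            if pause > 0:
                time.sleep(pause)

    print("moved %.0f MB in %.0f seconds" % (moved / 2.0**20, time.time() - start))

Because the operating system will cache a file this small, treat the numbers such a script produces as a way to exercise the path, not as a disk benchmark; a tool like Iometer remains the better choice for controlled measurements.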

In cases where performance data cannot be collected empirically (such as when the system in question does not exist yet), there is still hope. Most hosts are not capable of generating sustained load at full wire speed; they are generally limited by other factors. These could include:

  • CPU speed: Although Fibre Channel has much lower overhead than the TCP/IP stack, it still takes a fast processor to get near full performance on a 1 Gbit/sec Fibre Channel link, simply because the processor will be busy running whatever task is actually generating the I/O. While almost all hosts now shipping have sufficiently fast CPUs, you also need to estimate how much of that CPU resource is taken up by other tasks the host is performing that do not result in disk I/O (such as running a TCP/IP stack). Moreover, many data centers have servers with older CPUs that might not be capable of driving 1 Gbit/sec even without taking these tasks into consideration.


  • PCI bus speed: Fibre Channel full duplex is 200 MB/sec. A 32-bit, 33 MHz PCI bus can only sustain about 120 MB/sec. A 64-bit, 33 MHz or 32-bit, 66 MHz PCI bus can handle about 240 MB/sec, and a 64-bit, 66 MHz bus can handle about 480 MB/sec. Even on the higher-rate buses, bear in mind that the bus is shared. If you put two Fibre Channel HBAs onto a bus that can handle 240 MB/sec, that is the total possible full-duplex speed for both HBAs, so on average you would get 120 MB/sec out of each interface. In a balanced read/write environment, for example, this could mean that you get only 60 MB/sec of read performance out of each card. Also bear in mind that there may be other cards on the bus taking up some of that bandwidth. (A worked estimate combining these limits appears after this list.)


  • HBA speed: Although designed to work on a 1 Gbit/sec SAN, many HBAs cannot achieve, or at least cannot sustain, full 1 Gbit/sec transfers. Newer HBAs typically have better performance. Older HBAs might only be able to achieve 60 MB/sec, regardless of the other possible issues.


  • RAID controller speed: Many RAID controllers cannot sustain 100 MB/sec per interface on all interfaces simultaneously. Some barely operate at 30 MB/sec per interface, which is more than acceptable for many applications! Finding out the limits of your RAID array should be as simple as calling the vendor's support channel. Of course, you might also check third-party testing results, such as those published by many industry magazines, for an unbiased opinion.


  • RAM quantity and speed: If your system is short on RAM, it might spend a lot of time paging. If it does, performance will be substantially degraded.


  • Disk seek time: If your application does a lot of random I/O, the disk heads will have to jump all over the platter. Since disk seek time is an order of magnitude or more slower than a Fibre Channel link, you might have to allocate substantially less bandwidth for random I/O applications like a file server than for sequential I/O applications like a video server or decision support system.


  • Application overhead: This ties into the CPU-limit factor. How much CPU do you have, and how much of it is free for handling I/O?


  • Write speed of tape device: Most tape drives cannot come anywhere near 100 MB/sec. It is usually sufficient to ask a vendor for performance data in the case of tape drives, although optimistic compression ratios can inflate the performance numbers they provide.


  • In addition, if anything is known about the application that is running on the host, you might be able to make a good guess about how much load it will even try to place on the disk subsystem. For example, if you know that the host is an intranet Web server, and that it receives only 500 hits a day, you can safely guess that its I/O requirements will be minimal.

    Once you have collected your best empirical or estimated numbers for each factor, use the lowest common denominator approach to estimate the maximum bandwidth that the system could need. You can guarantee that the overall system will not outperform its weakest link.
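As a worked illustration of the weakest-link approach (the estimate promised in the PCI bus item above), the sketch below combines the kinds of limits just listed. Every figure in it is an example assumption, such as a 32-bit, 33 MHz PCI bus shared by two HBAs, not a measurement.

    # Sketch: lowest-common-denominator bandwidth estimate for one host.
    # All figures are example assumptions; substitute measured or vendor numbers.
    PCI_BUS_MBS   = 120   # 32-bit, 33 MHz PCI bus (shared by everything on it)
    HBAS_ON_BUS   = 2     # two Fibre Channel HBAs share that bus
    HBA_MBS       = 100   # what one HBA can actually sustain
    RAID_PORT_MBS = 80    # per-interface limit quoted by the array vendor
    APP_MBS       = 60    # load the application is actually expected to generate

    limits = {
        "PCI bus share per HBA": PCI_BUS_MBS / HBAS_ON_BUS,
        "HBA":                   HBA_MBS,
        "RAID controller port":  RAID_PORT_MBS,
        "application demand":    APP_MBS,
    }

    bottleneck = min(limits, key=limits.get)
    print("per-HBA ceiling: %.0f MB/sec (limited by %s)"
          % (limits[bottleneck], bottleneck))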

Also note that on systems with multiple HBAs, I/O load might be distributed across those HBAs. Achieving active/active distribution across HBAs might require a third-party application such as the VERITAS Dynamic Multipathing software, Troika's HBA driver, or one of the storage vendors' dual-path products. If this is the case, you might estimate that each HBA will usually carry a fraction of the total load. In a dual-fabric, active/active HBA architecture, each HBA normally carries 50 percent of the total load; if a system is capable of sustaining 70 MB/sec, then each HBA will sustain 35 MB/sec. Note that this might change during system maintenance: if you shut down one path, the remaining path could take on the full 70 MB/sec, so the design should incorporate that worst-case scenario. It is usually also good practice to add some padding to the top of this estimate (perhaps 10 percent) to allow for the unexpected.
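The arithmetic in that paragraph can be written out directly; a short sketch follows. The 70 MB/sec figure and the 10 percent padding come from the text, while the two-HBA, active/active layout is the stated assumption.

    # Sketch: per-HBA design load for a dual-fabric, active/active host.
    host_sustained_mbs = 70.0   # total the host can sustain (from the example)
    hba_count          = 2      # active/active paths
    padding            = 0.10   # 10 percent headroom for the unexpected

    normal_per_hba     = host_sustained_mbs / hba_count      # 35 MB/sec in normal operation
    worst_case_per_hba = host_sustained_mbs                   # one path shut down for maintenance
    design_per_hba     = worst_case_per_hba * (1 + padding)   # 77 MB/sec to design for

    print("normal: %.0f MB/sec per HBA; design for: %.0f MB/sec per HBA"
          % (normal_per_hba, design_per_hba))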

    Note: Unlike physical-disk counter data, logical-disk counter data is not collected by the NT operating system by default. To obtain performance counter data for logical drives or storage volumes, you must type diskperf -yv at the command prompt. This will cause the disk performance statistics driver used for collecting disk performance data to report data for logical drives or storage volumes. By default, the NT operating system uses the diskperf -yd command to obtain only physical drive data. For more information about using the diskperf command, type diskperf -? at the command prompt.

    What Do We Know about Future Performance Characteristics?

Performance numbers change over time. Consider a customer database for a catalog retail company. Perhaps you will install the SAN in February, because that is your slowest month of the year and you can get the necessary downtime. You might know that the database host will start talking to its storage array(s) at a sustained rate of 5 MB/sec during the business day, with a peak of only 10 MB/sec. However, when the Christmas season comes along and business picks up, you might move to a 50 MB/sec sustained rate, peaking at 70 MB/sec. Because of the potential for substantial changes in performance requirements over time, it is essential to plan for both current and projected performance. Much of this will be educated guesswork, since many of the systems you are going to deploy might not yet exist.

Again, you will need to come up with numbers for both sustained and peak traffic for each communication. Also try to determine on what days and at what times peak traffic will occur. This information is added to your table (Table 5.3).

Table 5.3 Adding Traffic Projections (SAN traffic performance; rates in MB/sec)

                       SAN Peak Performance     SAN Sustained Performance    Peak Times/Patterns
Initiators  Targets    Initial     Expected     Initial     Expected         Initial       Expected
host1       array3     10          10           5           5                M-F 8a-5p     same
host2       array1     0           70           0           50
            array2     0           70           0           50
            tape1      20          20           0           0
host3       array1     50          50           10          20               M-F 8a-5p     + Sa 10a-4p
host4       array1     0           90           0           50
            array2     0           90           0           50
tape1       array1     0           20           0           0                Sa 5p-9p      same
            array3     0           20           0           0                Sa 9p-11p     same
array3      array4     10          30           5           5
array4      array3     5           5            0           0

Again, you can only enter data for systems about which you can make an educated guess. If you know only what the peak traffic could be, based on the limitations of a system, you might have no way of guessing when that peak would occur. You should also enter projected data for systems that you know you will add later.

In Table 5.3, host2 and the application it is running might not exist yet, so every piece of data about that system is pure guesswork. Let us say that host2 is a Return Merchandise Authorization (RMA) system, and your rapidly growing company has never had an RMA system before. You might not be able to reliably guess when customers are going to call in with RMA requests most often, or even how many RMAs you are going to get in a given day. The best you can do is determine what performance the hardware and software you are installing could reasonably sustain, and design the SAN to support that level whenever the system could be in use. While this approach might result in over-engineering your network, that is better than the alternative. During future design phases, you can adjust or scale back the design accordingly, as well as incorporate other additions and changes.

    For backup devices, peak usage will always correspond with your backup schedule. This will usually not correspond with peak usage of the rest of the system. This is particularly useful knowledge when planning an ISL architecture, because you can often count on having low nonbackup-related utilization of ISLs during backup windows. An obvious exception to this is a SAN that is used solely for performing LAN-free backups.


    Building SANs with Brocade Fabric Switches

    Authors
Josh Judd is a Senior SAN Architect with Brocade Communications Systems, Inc. In addition to writing technical literature, he provides senior-level strategic support for major OEMs and end-users of Brocade storage network products worldwide. Chris Beauchamp (Brocade CFP, CSD) is a Senior SAN Architect for Brocade Communications Systems, Inc. Chris focuses on SAN design and architecture, with an emphasis on scalability and troubleshooting. Alex Neefus is the Lead Interoperability Test Engineer at Lamprey Networks, Inc. Alex has worked on developing testing tools for the SANmark program hosted by the FCIA. Benjamin F. Kuo is a Software Development Manager at TROIKA Networks. Headquartered in Westlake Village, CA, TROIKA Networks is a developer of Fibre Channel Host Bus Adapters, dynamic multipathing, and management software for Storage Area Networks.

