Someone I work with was giving me a hard time about how I write about technology and always talk about how good things are, but I never address the reality of the situation: Often, the actual products we deal with on a daily basis do not work. He complained to me that he is sick and tired of hearing about all this great technology and how the future is bright across the industry, when as a program manager of a number of large integration projects, he sees very little actually working right out of the box. He is sick of hearing about how everything is peachy keen and challenged me to write about the real world that we both deal with daily.
As usual, I will not name the hardware or software environments that set my friend off. Suffice it to say that both were large storage environments that the hardware vendor itself proposed, so these were not complex integration projects, and the hardware vendor should have known what to do and how to do it.
My friend has some really good points. Most people who write about technology are tech-savvy, but they are not actually trying to implement the technology. Marketing departments are responsible for making everything look wonderfully easy to use, but many times, especially for larger configurations, the software and hardware proposed just do not work as marketing and sales claim in the configuration sold.
I give you three key reasons for this:
- Lack of testing in the correct environment
- Poor communication to the field
- Overselling by sales
1. Lack of Testing in the Correct Environment
Vendors do not have an unlimited amount of hardware, people and time to test things. There is cost pressure on testing, for both hardware and people, and there is market pressure: test for too long and your competitors will beat you to market. Therefore, vendors must decide how much testing they can afford, and what they will and will not test. I have seen many test plans, and the coverage is often determined by the amount of time, people and hardware available for testing. There is also the issue that new software technologies require new tests to be developed.
With myriad configuration options, how does development know which set of options is best to test? Very often, the actual configurations being sold are not being tested. For example, take software that has been running on an older AMD or Intel processor, and the customer gets sold a new processor with a new PCIe bus, new storage and everything else. It is supposed to run with no issues, but if you have a complex storage application and new hardware, the likelihood is that the whole stack has not been tested, and there are going to be some timing issues found only on the new hardware. The new hardware is going to be allocated to paying customers rather than to a testing environment. This can quickly become a problem.
2. Poor Communication to the Field
More times than I can count, I have seen that development and testing, especially in big companies, just do not talk to the field people. Development and testing often have a set of hardware and software they work with that must be configured in a certain way to meet the requirements, or sometimes even to work at all.
My friend had a situation where the two configurations he was responsible for were not at all similar to the test hardware the vendor had in development or testing. Were the on-site people installing the systems aware of this situation? Of course not! Then, in both cases, development and Level 3 support got involved. Simple questions, such as whether the systems are connected to storage properly for failover, whether the patches were installed in the correct order, and whether they were installed at all, required Level 3 support to meticulously go over the whole configuration.
In my experience, the bigger the company, the bigger the disconnect between the development organizations and the field. Reconciling what was tested and how it was configured with what was proposed and sold requires significant work on the part of the field organization. This rarely happens, as sales is often off selling without development and support knowing how the system will be configured. Right or wrong, sales and sales support know what the requirements are and what hardware is going to be available. They configure the system based on the requirements. If you need this many GB/sec and this many IOPS, you need x, y and z hardware connected in this way. Once users find out that it does not work with the software, Level 3 support gets the 911 call begging for help, as the customer is angry that the system does not work as promised, and this is the first that support hears of the configuration.
This happens over and over again.
Overselling on the part of sales should also not be left out of the picture. The sales staff typically gets information from marketing, which gets information from development and testing, and they might state that the product can do x GB/sec and y IOPS. Often, people do not know the context for x and y. Under what conditions are x and y accomplished? Can you get x and y using the same configuration at the same time? The reasoning in the field then goes like this: since each RAID controller can do both x and y, and your requirements are 3x and 3y, all I need is three controllers.
This does not take into account the real application. The performance results with x and y were achieved by benchmarking the system with a specific configuration and a specific set of benchmark tests. This is not like real-world applications. The sales team is under pressure to get the best price for the customer because if they do not, another vendor will. So they configure the system based on the marketing specifications, and lo and behold, the system cannot meet the performance requirements.
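To make the sizing trap concrete, here is a minimal Python sketch of the arithmetic. Everything in it is an assumption for illustration: the function name controllers_needed, the per-controller figures, the derate factor (the fraction of the benchmark number a real application achieves) and the simple model in which bandwidth and IOPS compete for the same controller. It is not any vendor's sizing method.

```python
import math

def controllers_needed(req_gbps, req_iops, per_ctrl_gbps, per_ctrl_iops,
                       derate=1.0, simultaneous=True):
    """Return how many controllers are needed to meet both requirements.

    derate: assumed fraction of the benchmark figure a real workload achieves.
    simultaneous: whether one controller delivers both peak figures at once.
    """
    bw_units = req_gbps / (per_ctrl_gbps * derate)
    iops_units = req_iops / (per_ctrl_iops * derate)
    if simultaneous:
        # Marketing-sheet assumption: each controller hits both peaks together.
        return math.ceil(max(bw_units, iops_units))
    # Pessimistic assumption: bandwidth and IOPS compete for the same controller.
    return math.ceil(bw_units + iops_units)

# Hypothetical numbers: x = 2 GB/sec and y = 100,000 IOPS per controller,
# with requirements of 3x and 3y.
naive = controllers_needed(6.0, 300_000, 2.0, 100_000)
realistic = controllers_needed(6.0, 300_000, 2.0, 100_000,
                               derate=0.5, simultaneous=False)
print("spec-sheet sizing:", naive, "controllers")
print("derated sizing:   ", realistic, "controllers")
```

With these made-up inputs the spec-sheet answer is three controllers, while the derated answer is several times larger. The exact gap depends entirely on the assumed derate and on whether x and y can be reached at the same time, which is exactly the context that tends to get lost between testing and sales.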
Support engineers are then air-dropped in to figure out the problem and get the system to meet the specifications. They realize it cannot be done. The vendor then sends more hardware, as that is the only way customer performance requirements can be met. This costs the sales team goodwill with the company, and in the long term, it might get the team fired.
What Should You Do?
This section might as well be titled what you can't do as much as what you should do. First of all, we always tend to forget the hard questions. For example, should we not ask what testing environments are being used? Seems reasonable to me, but we often do not ask, as this is not our problem; it is the vendor's problem. They sold the configuration and they have to make it work. To me, this seems a little like a self-fulfilling prophecy. If you do not ask, they will not know and the sales team will never ask development. In my opinion, this gets you what you deserve. Asking the tough questions about the tested configuration vs. the installed configuration is your responsibility. If the vendor team brushes you off, at least they were warned.
The second thing that I think is the customer's responsibility is to understand what was tested and how it differs from what you plan to buy, in terms of both operational conditions and hardware. Having a good understanding of what was tested and what was not will give you some idea of the risk that the configuration you are buying will not actually work.
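One simple way to keep track of this is to write down the tested configuration and the proposed configuration side by side and list every difference. The Python sketch below does that; the component names and values are hypothetical placeholders, and a real comparison would use whatever the vendor actually tells you about its test environment.

```python
def config_differences(tested, proposed):
    """Return {component: (tested_value, proposed_value)} for every mismatch."""
    diffs = {}
    for key in sorted(set(tested) | set(proposed)):
        t = tested.get(key, "<not tested>")
        p = proposed.get(key, "<not proposed>")
        if t != p:
            diffs[key] = (t, p)
    return diffs

# Hypothetical example: what the vendor tested vs. what is being sold.
tested = {"cpu": "older-gen", "pcie": "gen3", "controllers": 1,
          "failover": "tested"}
proposed = {"cpu": "new-gen", "pcie": "gen4", "controllers": 3,
            "failover": "tested", "patch_level": "unknown"}

diffs = config_differences(tested, proposed)
for component, (was_tested, is_proposed) in diffs.items():
    print(f"{component}: tested={was_tested} proposed={is_proposed}")
```

Every line this prints is a question to put to the vendor before you sign, and a rough indicator of how much untested ground the configuration covers.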
Last but not least, sales people often oversell given the competitive environment. Some of this is authorized and some is not. If a vendor is trying to sell you a product and other products on the market with equal or better features meet your requirements, this is a situation with a high potential for overselling. Sales people have quotas, and they will often do what they have to do to meet them.
As my mom always said, the truth and the responsibility go somewhere in the middle. Customers are not 100 percent correct here, and vendors are not 100 percent wrong, but I feel that when stuff does not work, the customer likely carries 10 percent of the blame and the vendor 90 percent. Customers should look for the signs.
Henry Newman is CEO and CTO of Instrumental Inc. and has worked in HPC and large storage environments for 29 years. The outspoken Mr. Newman initially went to school to become a diplomat, but was firmly told during his first year that he might be better suited for a career that didn't require diplomatic skills. Diplomacy's loss was HPC's gain.