Resolving Finger-Pointing in Storage
I am not a database expert, nor an expert on DWDM connections, the RAID in question, or the remote mirroring software. I do, however, understand the end-to-end issues and how things should work, and I believe I know I/O better than most. I went down the wrong path on this problem a few times, but each time I was able to back up, start over, and eventually identify the real culprit.
The case described above was relatively easy to resolve, because it ultimately came down to a configuration issue rather than a hardware problem, a software bug, or a programming error. And while the database program remains a performance bottleneck because it is poorly written, the customer can now limp along until it is rewritten.
I describe this case to show that, whatever the problem is, the key to resolving it is being able to look at the issue end to end without getting bogged down in the politics and conflicting messages coming from the various players. It also requires the skill set to know, at least at a high level, how things should work and what to look for when they don’t.
It’s also imperative to ask many questions and to process the answers efficiently and effectively. That means examining the forensic evidence and drawing conclusions, even if they are sometimes wrong, and being willing to back up and start over when necessary. These are a few of the most important qualities needed to resolve a finger-pointing exercise.
Hardware and software are bought from a single vendor far less often than they used to be. As this Chinese food menu approach of picking hardware and software from different vendors becomes more common, finger-pointing in our industry is likely to increase right along with it.
The key takeaway is that being prepared to navigate these issues means having people with the right skill sets to handle problems when they arise, or knowing how to find people who can solve them.
Be prepared for the hard problems that can take weeks to resolve. These often involve data corruption, so expect them to be unpleasant and messy. And while you can always hope for a simple configuration issue that can be fixed quickly, don’t count on it very often.