When I started in this business more than 30 years ago, it took a supercomputer to do what a laptop can do today, and networks were in their infancy in places like Stanford. Storage is a lot more complicated these days, and storage architects and administrators need to be on top of a whole lot more than they used to. So with a nod to the now-retired David Letterman, here is my list of the Top 10 things storage architects and admins need to be monitoring and doing.
#10 Looking for Soft Errors
Hard and soft errors on the host, in the network and on devices are going to slow down your system, and in the long run, soft errors usually turn into hard errors. Storage architects need to ensure that they have a management framework that traps these errors and provides alerts to the administration staff, and the administration staff needs to aggressively go after these errors. A case can be made that multiple soft errors could increase the potential for silent data corruption.
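As a rough illustration of the kind of trap-and-alert logic a management framework provides, here is a minimal Python sketch that counts soft errors per device in log output and flags any device that crosses a threshold. The log lines, regex, and threshold are all hypothetical; a real deployment would pull from syslog, SMART data, or the array vendor's monitoring tools.

```python
import re
from collections import Counter

# Hypothetical log excerpt; a real framework would ingest syslog, SMART
# attributes, or switch/HBA counters rather than a hard-coded list.
LOG_LINES = [
    "kernel: sd 0:0:0:0: [sda] Medium Error, soft error corrected",
    "kernel: sd 0:0:1:0: [sdb] I/O error, dev sdb, sector 12345",
    "kernel: sd 0:0:0:0: [sda] Medium Error, soft error corrected",
    "kernel: sd 0:0:0:0: [sda] Medium Error, soft error corrected",
]

SOFT_ERROR = re.compile(r"\[(\w+)\].*soft error", re.IGNORECASE)
THRESHOLD = 3  # alert once a device accumulates this many soft errors


def scan(lines):
    """Return devices whose soft-error count has reached the threshold."""
    counts = Counter()
    for line in lines:
        m = SOFT_ERROR.search(line)
        if m:
            counts[m.group(1)] += 1
    return {dev: n for dev, n in counts.items() if n >= THRESHOLD}


print(scan(LOG_LINES))  # devices the admin staff should go after
```

The point is not this particular script but the pattern: aggregate soft errors over time per device, because the trend (a climbing count) matters more than any single corrected error.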
#9 Performance Analysis
The days of throwing hardware at performance problems likely ended in 2008; today storage administrators and architects need to monitor performance continually, given the complexity of the storage hierarchy. There are so many caches in the datapath today that it is hard to understand how all of them interact. A friend of mine once said that caches serve only two purposes, reducing latency for writes and for reads, and they only help if the data fits in the cache. If the data doesn't fit in the cache, you have a mess on your hands. Understanding performance and knowing the architecture is key to having a cost-effective system for both the short and long term.
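The "fits in cache or it's a mess" point can be shown with a toy simulation. This sketch models an LRU cache under a cyclic access pattern (a simplifying assumption, not a claim about any particular workload): when the working set fits, nearly every access hits; when it is even slightly larger than the cache, LRU plus cyclic access degrades to almost zero hits.

```python
from collections import OrderedDict


def hit_rate(working_set, cache_size, accesses=10_000):
    """Simulate an LRU cache over a cyclic access pattern and return the
    fraction of accesses that hit the cache."""
    cache = OrderedDict()
    hits = 0
    for i in range(accesses):
        block = i % working_set  # cycle through the working set
        if block in cache:
            hits += 1
            cache.move_to_end(block)  # mark as most recently used
        else:
            cache[block] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict least recently used
    return hits / accesses


print(hit_rate(working_set=100, cache_size=128))   # fits: nearly all hits
print(hit_rate(working_set=1000, cache_size=128))  # doesn't fit: all misses
```

The cliff between the two runs is the interesting part: cache behavior is not gradual, which is one reason stacked caches in a deep datapath are so hard to reason about.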
#8 Understanding Application Workloads
Applications are what matters, because without application requirements no one would need to buy compute or storage. Therefore, it behooves the people designing and administering the system to fully understand the applications that will run on the system and what resources they are going to need. Everyone should understand what the applications do to the system and the business objectives for each application. For example, does a specific application need to run in a specific time to meet a business objective? If so, the system must be engineered for peak load. Does the application need to meet its timeline objectives while the system is doing a RAID rebuild, for example, or a controller failover? Defining the expectations for applications and the systems they run on is critical for success.
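Engineering for peak load under degraded conditions is ultimately arithmetic. This hypothetical sizing helper (the function name, dataset size, window, and 50% degraded-mode factor are all illustrative assumptions, not vendor numbers) shows the shape of the calculation: size for the deadline while the array is rebuilding, not while it is healthy.

```python
def required_throughput_gbps(dataset_gb, window_hours, degraded_factor=0.5):
    """Bandwidth (GB/s) to provision so an application that must read
    `dataset_gb` within `window_hours` still meets its deadline when the
    array runs degraded, e.g. a RAID rebuild cutting delivered throughput
    to `degraded_factor` of nominal. All figures are illustrative."""
    nominal = dataset_gb / (window_hours * 3600)  # GB/s with a healthy array
    return nominal / degraded_factor              # engineer for the worst case


# A nightly job that must scan 50 TB in a 4-hour window:
print(round(required_throughput_gbps(50_000, 4), 2))  # ~6.94 GB/s
```

Run healthy-array math alone and you would provision half that, which is exactly the system that misses its business objective the night a drive fails.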
#7 Object Storage
The POSIX stack really has not changed since the late 1980s (yes, that’s not a typo) except for the addition of asynchronous I/O in the early 1990s. POSIX has a number of known performance limitations for metadata scaling and scaling in general. Object storage was designed to overcome POSIX limitations and has a far simpler application interface. The problem is that we have about 30 years of software development in applications that expect a POSIX interface for I/O, and you are not going to change all of that code overnight. Object storage is in your future and understanding how and where it can fit into the current environment should be a priority.
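To make the interface difference concrete, here is a sketch contrasting the two models. The POSIX side uses real file calls; the object side is a toy in-memory class of my own (not any real object-store API) that captures the essentials: whole-object put/get/delete in a flat key namespace, with no seek, no rename, and no in-place update.

```python
import os
import tempfile

# POSIX interface: open/seek/read/write on a hierarchical namespace,
# byte-addressable, so applications can update files in place.
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "r+b") as f:
    f.write(b"hello")
    f.seek(0)          # reposition and re-read: no object-store equivalent
    print(f.read())
os.remove(path)


# Object interface (illustrative sketch only): flat keys, whole objects.
# Real systems layer buckets, metadata, and versioning on top of this.
class ObjectStore:
    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = bytes(data)  # replaces the whole object

    def get(self, key):
        return self._objects[key]

    def delete(self, key):
        del self._objects[key]


store = ObjectStore()
store.put("backups/2018/db.dump", b"hello")
print(store.get("backups/2018/db.dump"))
```

The slash in the key is just a naming convention, not a directory: that flat namespace is what lets object stores sidestep POSIX metadata scaling, and it is also why 30 years of seek-and-rewrite application code cannot simply be pointed at one.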
#6 Software Defined
The use of software defined storage is growing and will continue to grow. My concern goes back to the issue of errors, as much of the software does not have enclosure management and monitoring. Software defined storage is becoming more widely available, but it is prudent to understand what you are getting, and what you are missing, from the stack compared to what you got from your previous storage vendor. How important is what you are missing to the mission of the organization and to ensuring that you can meet that mission?