In my role as a storage and high performance computing consultant, I was recently working for a customer who was installing a very large archive system. We spent a number of years specifying the requirements, developing and writing the request for proposals (RFP), reviewing proposals, choosing a winner, getting the system installed and finally accepted.
After the system was accepted, the customer asked me to help review the design of the applications that were to be used on the system. The most important application was the one that moved data from other systems into the archive in an extremely reliable way, using checksums for every packet of data moved and checksumming the entire file, since the data in the archive had to be guaranteed to be the same as the data created on each of the source systems.
What happened next is the reason for this article. The site, names and organizations will remain anonymous, and the truth is that no one is to blame, neither the architects of the software nor the architects of the system. This is a very good example of why architecting for storage requires knowledge of the application.
I have divided the story between the system architects and the application developers, and you'll soon see why.
When we were architecting the system, those of us involved knew it was going to require up to 1 GB/sec of throughput to ingest data from the points where it was created and archive it to tape. With the amount of data we were talking about on a yearly basis (many petabytes), disk-based archives were impossible because of the power, heat and space requirements.
When we were developing the requirements for the system, we did not have a program that could reliably move data into the archive. The people writing the system requirements assumed that the program moving data into the archive would achieve 85% of theoretical bandwidth and use the CPU efficiently. We believed that a parallel archive system was not required, since we could move the data efficiently using direct I/O (O_DIRECT on the open; see A Trip Down the Data Path: I/O and Performance).
We were looking only at our system, and figured that writing an application to move data to and from the archive efficiently was a simple matter of coding. Boy, were we wrong.
Our application developers had a different set of requirements. Some of the systems they were moving data from ran limited operating systems such as Windows 95/98, so they decided to implement both the data-sending and the data-receiving code in Java, given Java's portability compared with anything else available. Java gave the application developers highly portable code that would run on any platform and, given the ease and flexibility of the language, allowed rapid development.
The problem is that Java is an interpreted language that, even today, does not have rich I/O constructs for making simple system calls to read and write large blocks of data. The system architects assumed either that requests would be several hundred megabytes in size or that asynchronous, double-buffered I/O would be used, since we had architected the system for highly efficient data movement. Java is a great language for portability, but it is not a good language for moving large amounts of data. Even with the NIO (new I/O) package introduced in Java 1.4, Java is just not designed to move data efficiently.
Data Movement Requirements
If you want to read and write at near channel speed with 4 Gbit Fibre Channel, or even faster per stream with InfiniBand, there are a number of well-known techniques. These are some of them:
- Large I/O requests of at least 8 MB: Large I/O requests (see Storage I/O and the Laws of Physics) are required for efficient data movement, given the performance characteristics of storage technology. We had planned that the writing application would either use 8 MB-32 MB I/O requests, double buffered with a circular buffer, or issue simple write requests of a few hundred MB. The SCSI driver limits a single request to around 32 MB, so a write of a few hundred MB would simply be issued as multiple 32 MB requests.
- Direct I/O (see this Red Hat guide): Direct I/O moves data between user space and the device without passing through library buffering or the kernel's system buffer cache. With Java, you use the C library I/O, which on most operating systems has a buffer size of 8 KB, or you memory map the file, or, worse yet, you do character-at-a-time I/O. Memory mapping the file means using the page cache, which is efficient only if you are going to reuse the data (which in this application we were not) or if your I/O requests are small (say, 8 KB from the C library) so that the system can combine them. Direct I/O significantly reduces the CPU time required to read and write data, often by as much as 80%.
- Locked Pages: Each time you read or write, the kernel must lock the page(s) involved in the I/O so they are not moved or changed. If you are reading and writing large amounts of data, this can be significant overhead. By locking the pages yourself with the mlock() system call (which requires root or SUID access on many systems), or by using locked shared memory, which can take advantage of multi-megabyte pages to reduce overhead even further, you can reduce CPU overhead. On average, from what I have seen, this yields another 5%-10% reduction.
The problem is that none of these can be done directly from Java or from scripting languages such as Perl.
Implementing your program with these three techniques, instead of using C library I/O or even memory mapping (mmap) the file, will improve I/O performance by up to a factor of 10 while reducing system CPU time by at least 90%. This is not to say that Java does not have advantages over other languages, but no language can solve all problems.
All languages have advantages and disadvantages. The C language was designed for writing operating systems, so it should be no surprise that C gives the developer the ability to make system calls to move data efficiently. Languages such as FORTRAN (yes, it is still used) are a bit more removed from the operating system, and you need to do more work to do I/O efficiently. As you move further away from the operating system, with languages like Java and scripting languages like Perl, efficient I/O is not possible because the language constructs do not allow low-level communication with the operating system. This is by design, for a few reasons:
- Simplicity: By offering only a few simple methods of data movement, such as character I/O and C library I/O, the language does not need to support many constructs.
- Security: Java and Perl are portable, and you do not want these languages to be able to directly address and access system-specific features such as shared memory.
- Consistency: Some operating systems do not support all system calls. How can a portable language stay portable if it makes direct system calls that some platforms do not support?
You need to consider the application before choosing the language. If you are going to move large amounts of data, need to do it efficiently, and the code is not throw-away, then the best language to write the application in is C. An application written in C will not be as portable as one written in Java, and likely not as portable as one written in a scripting language such as Perl, but given the language constructs that make efficient I/O possible, you cannot expect it to be.
I am not advocating rewriting everything written in Java in C, but I am advocating using the right tool for the job. You do not use a finishing hammer in place of a sledgehammer, or vice versa. Both hammers have their place in the toolbox. Languages are tools, and you need to understand the job clearly and use the right tool for it.
This article came about because the system architects did not communicate clearly with the application developers. It is important that systems be architected for the applications that will run on them. Clearly, developing an architecture that supports Java reading and writing 100 MB/sec of data is a different problem from doing the same thing in C. This is just another real-world example of how architecture and applications go hand in hand.
Henry Newman, a regular Enterprise Storage Forum contributor, is an industry consultant with 26 years of experience in high-performance computing and storage.