More than seven years ago, I wrote an article titled Rocks Don’t Need to Be Backed Up. It was something of an internet sensation as storage articles go, generating more than 100,000 views in a matter of days.
I based the article on an Egyptian obelisk my wife and I saw in New York City’s Central Park. Here is the picture used in the 2009 article, in which you can clearly see the 4,000-year-old object:
Note that you can clearly see the hieroglyphics, and thanks to the Rosetta Stone, you can actually translate what they say into a modern language. But until the Rosetta Stone was found, with the same text written in ancient Egyptian hieroglyphs, Demotic script and Ancient Greek, no one had known what these hieroglyphs meant for roughly 1,800 years. Non-optimal, as I like to say.
So while in France this year with my wife, I saw another Egyptian obelisk and thought it was time for an update:
Clearly, there is some significant data loss on the above rock, so I guess some rocks do need to be backed up, as anyone who’s ever visited an old graveyard can attest.
What does this mean for data protection?
Hieroglyphs are pretty easy compared to what we have today. Even something as simple as file formats, many of which are used on Microsoft operating systems and others, is a nightmare. Here are just a few of the ones starting with the letter A:
AlZip Compressed Files' pieces
Graphics AIIM image file
World file for Alice programming language, version 3.1
Assembler language source for 8080
AA, AB, ...: Split parts of a single whole file
Animation Play Script
Android adb Backup
Alembic - 3D geometry / models
Music in ABC format
AVS Barcode Profile
Adobe Binary Screen Font
Automatic backup file
Image PALS album file
Album file (various programs)
Abstracts (info file)
There are 275 beginning with the letter A alone, 180 for B, and so on. You get the point.
In 20 years, let alone thousands of years, how is anyone going to figure out what data is stored in each of these file formats? Of course, some of them are open source, but many are not. And even for open source, who is going to save the formats and the information about them for a decade or more? I cannot even open some MS Office documents from the early 2000s, and those are less than two decades old. The same can be said for many other data formats.
There are self-describing data formats such as HDF (Hierarchical Data Format), which is about 30 years old, but outside of the HPC community, it is not widely used. There are other self-describing technologies in other communities, and maybe, like HDF, they could be used for virtually any data type. However, everyone wants what they already have, not something new or different, and NIH (not invented here) is what usually wins out in our industry.
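To make the idea of a self-describing format a little more concrete, here is a minimal Python sketch. It is not HDF and not any real format: the `SDF1` signature, the header layout and the function names are all invented for illustration. The point is only that the file carries a plain-text description of its own binary layout, so a future reader does not need the original program to interpret the bytes.

```python
import json
import struct

MAGIC = b"SDF1"  # hypothetical 4-byte signature for this toy format


def pack_self_describing(fields, records):
    """Serialize records behind a header that documents their layout.

    fields:  list of (name, struct_code, description) tuples
    records: list of tuples holding one value per field
    """
    header = json.dumps({
        "byte_order": "little",
        "record_count": len(records),
        "fields": [
            {"name": n, "struct_code": c, "description": d}
            for n, c, d in fields
        ],
    }).encode("utf-8")
    # "<" selects little-endian, standard sizes, no padding
    row = struct.Struct("<" + "".join(code for _, code, _ in fields))
    body = b"".join(row.pack(*record) for record in records)
    # Layout: signature, 4-byte header length, JSON header, binary records
    return MAGIC + struct.pack("<I", len(header)) + header + body


def unpack_self_describing(blob):
    """Recover the header and records using only what the file itself says."""
    if blob[:4] != MAGIC:
        raise ValueError("not a toy self-describing file")
    (header_len,) = struct.unpack_from("<I", blob, 4)
    header = json.loads(blob[8:8 + header_len].decode("utf-8"))
    names = [f["name"] for f in header["fields"]]
    row = struct.Struct("<" + "".join(f["struct_code"] for f in header["fields"]))
    start = 8 + header_len
    rows = [
        dict(zip(names, row.unpack_from(blob, start + i * row.size)))
        for i in range(header["record_count"])
    ]
    return header, rows


# Example: two made-up sensor readings
example_fields = [
    ("station_id", "i", "integer station identifier"),
    ("temp_c", "d", "air temperature in degrees Celsius"),
]
example_blob = pack_self_describing(example_fields, [(1, 21.5), (2, -3.25)])
example_header, example_rows = unpack_self_describing(example_blob)
```

Even in this toy version, someone who finds the file decades later can read the JSON header in a text editor and learn what each field is and how it is encoded. Real self-describing formats go much further, but the principle is the same.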