The Irony of Writing Online About Digital Preservation – The Atlantic
When I started my research into news preservation, I thought there would be an easy technological solution. There isn’t. Every media company in the world grapples with the issue of digital archiving. Large legacy organizations, like The Atlantic or The New York Times or the BBC, do a better job than smaller companies, but nobody has a solution. From a software perspective, it is a legitimately difficult problem: unsolved, but probably not unsolvable. “The challenges of maintaining digital archives over long periods of time are as much social and institutional as technological,” reads a 2003 NSF and Library of Congress report. “Even the most ideal technological solutions will require management and support from institutions that in time go through changes in direction, purpose, management, and funding.”
Newsrooms need to manage workflow and content for print, audio, visuals, video, and code. Most software is built for companies that do only one of those things at a time; newsrooms do them all simultaneously. Every time a new technology is introduced, a newsroom needs a new content-management or workflow system to handle it. Ensuring interoperability between these systems and archival systems requires engineering, ingenuity, and regular attention.
The scale is different for newsrooms, too. Facebook only has to manage 11 years’ worth of data, all of which is digital and all of which is structured exactly the way it needs to be structured. A legacy media company might have to deal with more than a hundred years’ worth of data, only some of which is digital, all of which is potentially important to scholars, all of which has different licensing restrictions and preservation needs and is ambiguously structured. Remember when Macromedia Flash was the new hot thing in journalism? Most of those elaborate Flash projects have disappeared now. They’re probably archived on Jaz drives in a storage room somewhere, next to boxes of color slides and piles of floppy disks and other outdated media. Future historians will likely lament this loss.
The web only shows recent history. “Not one publication has a complete archive of its website,” my colleagues Kathleen Hansen and Nora Paul write in their Newspaper Research Journal article, “Newspaper Archives Reveal Major Gaps in Digital Age.” “Most can go back no earlier than 2008 … In every case, informants talked about the chaos of switching CMSes or servers, of shifting organizational homes for the website, of staffing changes and many other elements that have had an impact on the integrity of the website over time.”