GitHub expands open source archive program into three key libraries


Historians and future generations of developers will be able to unearth early lines of open source Linux, Ruby, or Python code buried 250 meters deep in the Arctic permafrost and, now, in three historic libraries in Oxford, Egypt, and California, thanks to GitHub’s expanding Archive Program.

Announced last year at the code management company’s Universe event, the program aims to preserve open source software in much the same way we do works of art, design, or literature. By printing historically relevant open source repositories onto reels of piqlFilm (digital photosensitive archival film), GitHub, which was acquired by Microsoft in 2018, hopes to preserve the open source software movement for future generations.

The program includes a code archive held in the Arctic World Archive in Svalbard, Norway, just one mile from the famous Global Seed Vault. The archive consists of 186 reels of piqlFilm and 21TB of repository data, stored this summer in a decommissioned coal mine 250 meters deep in the permafrost.

Run in partnership with the Long Now Foundation, the Internet Archive, the Software Heritage Foundation, Arctic World Archive, and Microsoft Research, the program looks to preserve both “warm” and “cold” versions of the code so that multiple copies and formats of the software survive, an approach archivists call “LOCKSS,” or Lots Of Copies Keep Stuff Safe.

GitHub. Photo: Glenn Wester

Now, the project is expanding by donating reels of hardened microfilm to the 400-year-old Bodleian Library at Oxford University in England, the Bibliotheca Alexandrina in Egypt, and the Stanford Libraries in California, as well as storing a copy in the library at GitHub’s headquarters in San Francisco.

Preserving the GitHub stars

GitHub is preserving its most popular repositories, ranked by the number of “stars” given by the community, including projects like Linux and Android and programming languages like Ruby and Go. The company is also preserving 5,000 repositories picked at random.
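
For readers curious how such a selection might work in practice, here is a minimal sketch, not GitHub’s actual process. It assumes a list of repositories with star counts, keeps the most-starred ones, and then draws a random sample from the remainder. The `Repo` type, the `select_for_archive` function, and all of its parameters are hypothetical names made up for illustration.

```python
import random
from typing import List, NamedTuple, Optional


class Repo(NamedTuple):
    """Hypothetical record for a repository; not a GitHub API type."""
    name: str
    stars: int


def select_for_archive(
    repos: List[Repo],
    top_n: int,
    random_n: int,
    seed: Optional[int] = None,
) -> List[Repo]:
    """Return the top_n most-starred repos plus random_n others chosen at random."""
    # Rank by star count, highest first.
    ranked = sorted(repos, key=lambda r: r.stars, reverse=True)
    top = ranked[:top_n]
    rest = ranked[top_n:]

    # Sample without replacement from the repos that did not make the top list.
    # A seed makes the draw reproducible; it is an assumption of this sketch.
    rng = random.Random(seed)
    sampled = rng.sample(rest, min(random_n, len(rest)))
    return top + sampled
```

Calling `select_for_archive(repos, top_n=100, random_n=5000)` on such a list would return the 100 most-starred entries followed by up to 5,000 randomly chosen others; the value 100 is an arbitrary placeholder, since the article does not say how many top repositories were kept.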


All archived code will also include technical guides to QR decoding, file formats, character encodings, and other critical metadata so that future developers can decode it. “Storage is not the same thing as preservation, you have to do other things,” said Richard Ovenden, Bodley’s Librarian at the Bodleian Libraries.

Copyright © 2020 IDG Communications, Inc.