Software Heritage provides a service for archiving and referencing historical and contemporary software, with a focus on human-readable source code. The site was unveiled in 2016 by Inria and is supported by UNESCO. The project itself is structured as a nonprofit multistakeholder initiative.


Overview

The stated mission of Software Heritage is to collect, preserve and share all software that is publicly available in source code form, with the goal of building a common, shared infrastructure at the service of industry, research, culture and society as a whole. Software source code is collected by crawling code hosting platforms, such as GitHub, GitLab.com or Bitbucket, and package archives, such as npm or PyPI, and is ingested into a special data structure, a Merkle DAG, that is the core of the archive. Each artifact in the archive is associated with an identifier called a SWHID. To increase the chances of preserving the Software Heritage archive over the long term, a mirror program was established in 2018, joined by ENEA and FossID as of October 2020.
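
As an illustration of how identifiers can be derived from content in a Merkle structure, the following minimal sketch computes a SWHID for a single file content. It assumes the `swh:1:cnt:` prefix and the Git-compatible blob hash (SHA-1 over a `blob <length>\0` header followed by the bytes) described in the public SWHID specification; neither detail is stated in this article, and the function name `content_swhid` is hypothetical.

```python
import hashlib


def content_swhid(data: bytes) -> str:
    """Return a SWHID for raw file content (hypothetical helper).

    Assumption: content identifiers use the same hash as a Git blob,
    i.e. SHA-1 over the header b"blob <length>\\0" followed by the data.
    """
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    digest = hashlib.sha1(header + data).hexdigest()
    # Assumed layout: scheme "swh", version 1, object type "cnt" (content).
    return f"swh:1:cnt:{digest}"


if __name__ == "__main__":
    # Identical bytes always yield the same identifier, so the same file
    # found in many repositories is stored only once.
    first = content_swhid(b"int main(void) { return 0; }\n")
    second = content_swhid(b"int main(void) { return 0; }\n")
    print(first)
    print(first == second)
```

Because the identifier depends only on the bytes, it can be recomputed and checked by anyone holding a copy of the file, without consulting the archive. Directories and revisions in a Merkle DAG are identified in the same spirit, by hashing a description that includes the identifiers of their children.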


History

Development of Software Heritage began at Inria under the direction of computer scientists Roberto Di Cosmo and Stefano Zacchiroli in early 2015, and the project was officially announced to the public on June 30, 2016. In 2017, Inria signed an agreement with UNESCO for the long-term preservation of software source code and for making it widely available, in particular through the Software Heritage initiative. In June 2018, the Software Heritage Archive was opened at UNESCO headquarters. On July 4, 2018, Software Heritage was included in the French National Plan for Open Science. In October 2018, the strategy and vision underlying the mission of Software Heritage were published in ''Communications of the ACM''. In November 2018, a group of forty international experts met at the invitation of Inria and UNESCO, which led to the publication in February 2019 of ''Paris Call: Software Source Code as Heritage for Sustainable Development''. In November 2019, Inria signed an agreement with GitHub to improve the archival process for GitHub-hosted projects in the Software Heritage archive. As of October 2020, Software Heritage’s repository held over 143 million software projects in an archive of over 9.1 billion unique source files.


Funding

Software Heritage is a non-profit organization, funded largely by donations from supporting sponsors, which include private companies, public bodies and academic institutions. Software Heritage also seeks funding for third parties interested in contributing to its mission. A grant from NLNet funded the work of Octobus and Tweag that led to the rescue of 250,000 Mercurial repositories phased out from Bitbucket. A grant from the Alfred P. Sloan Foundation funds experts to develop new connectors for expanding the coverage of the Software Heritage Archive.


Development and community

The Software Heritage infrastructure is built transparently and collaboratively. All the software developed in the process is released as free and open-source software. An ambassador program was announced in December 2020 with the stated goal of growing the community of users and contributors.


Awards

In 2016, Software Heritage received the best community project award at the Paris Open Source Summit. In 2019, it received the Academic Initiative award from the Pôle Systematic.

