UK Web Archive

The UK Web Archive is a consortium of the six UK legal deposit libraries which aims to collect all UK websites at least once each year.


History

In 2005, the British Library, The National Archives, Wellcome Trust, National Library of Scotland, National Library of Wales and JISC formed the UK Web Archiving Consortium (UKWAC), a project to archive websites. UKWAC archived selected websites by licence or permission, using PANDAS software developed by the National Library of Australia. During the project its members collected sites relevant to their interests: the Wellcome Library collected medical sites, and the national libraries collected sites that reflect life in contemporary Wales or Scotland. The British Library worked with a broad policy of collecting sites of cultural, historical and political importance to the UK. The Consortium wound up in 2010. The Archiving and Preservation Working Group took over UKWAC's co-ordinating role for web archiving in the UK. The Digital Preservation Coalition hosts the working group.


Web Archiving

The archive undertakes an annual crawl of .uk and other UK geographic top-level domains such as .scot, .cymru or .london. The crawl is archived in a shared infrastructure called the Digital Library System. Members of the public can nominate sites for preservation there through the UKWA website. The whole web archive is available to registered readers on library premises; where permission has been given, or licence conditions can be met, copies are also accessible through the website. The archive also gathers sites in response to events, building collections. These have preserved writing and imagery recording natural disasters, election campaigns since 2005 and the UK's blogosphere for research, among more than a hundred others.
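
As a rough illustration of what a domain-based crawl scope means, the sketch below checks whether a URL's host falls under one of the top-level domains named above. It is a minimal sketch only: the function name `in_uk_tld_scope` and the `UK_TLDS` tuple are hypothetical, and the archive's actual scoping rules are not described here and may cover more than this list.

```python
from urllib.parse import urlsplit

# Only the TLDs named in the text above; the real crawl scope
# is an assumption here and may include other domains.
UK_TLDS = (".uk", ".scot", ".cymru", ".london")


def in_uk_tld_scope(url: str) -> bool:
    """Return True if the URL's host ends in one of the listed UK TLDs."""
    host = urlsplit(url).hostname  # lower-cased by urlsplit; None if absent
    if host is None:
        return False
    # Drop a trailing dot from fully qualified names before matching.
    return host.rstrip(".").endswith(UK_TLDS)
```

For example, `in_uk_tld_scope("https://www.bl.uk/")` is true, while a host such as `example.co.uk.evil.com` is rejected because only the end of the host name is compared.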


SHINE

The UK Web Archive holds a collection of all the .uk websites that were archived by the Internet Archive until the end of March 2013. SHINE is a web interface which can be used to create repeatable lists of results of historical .uk pages. Trends, the occurrences of keywords on .uk pages in the data set over that time, are presented with concordance to show the keywords in context.
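
A concordance of the kind mentioned above lists each occurrence of a keyword together with the text around it. The following is a minimal sketch of that idea, not SHINE's own implementation: the function name `keyword_in_context`, the `width` parameter and the bracket formatting are all assumptions made for illustration.

```python
import re


def keyword_in_context(text: str, keyword: str, width: int = 30) -> list[str]:
    """Return each occurrence of keyword with up to `width` characters either side."""
    # Whole-word, case-insensitive match; the keyword is escaped so it is
    # treated as literal text rather than as a regular expression.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        start = max(match.start() - width, 0)
        end = min(match.end() + width, len(text))
        left = text[start:match.start()]
        right = text[match.end():end]
        # Mark the keyword with brackets so it stands out in its context.
        lines.append(f"{left}[{match.group(0)}]{right}")
    return lines
```

Counting the returned lines per crawl year would give a simple keyword trend over time, under the same assumptions.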


Mementos

Memento is a name for prior versions of web pages coined by the Memento Project. The UK Web Archive Memento interface allows Mementos to be found across web archives. The interface can be used to find a Memento by its date in a snapshot table, or to see how often a site appears across public web archives.
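
Finding a Memento by date can be sketched in two steps. The Memento protocol (RFC 7089) lets a client state the date it wants in an `Accept-Datetime` request header, and a snapshot table can be searched for the capture nearest that date. The helper names below are hypothetical, and the sketch makes no network requests and assumes nothing about the UK Web Archive's own endpoints.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def accept_datetime_header(when: datetime) -> dict[str, str]:
    """Build the Accept-Datetime request header defined by RFC 7089.

    `when` should be timezone-aware; it is converted to UTC and
    formatted as an HTTP date ending in "GMT".
    """
    utc = when.astimezone(timezone.utc)
    return {"Accept-Datetime": format_datetime(utc, usegmt=True)}


def closest_snapshot(snapshots: list[datetime], target: datetime) -> datetime:
    """Pick the snapshot whose capture time is nearest to the target date."""
    return min(snapshots, key=lambda captured: abs(captured - target))
```

A client would send the header to a Memento TimeGate for the page it wants; the second function mirrors what a reader does by eye when scanning a table of capture dates.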


Researching the archive

Research into the web as a reflection of society has helped develop access to the archive. Libraries have developed guides to the research skills needed to use web archives. These include using big data to see patterns or trends, and writing citations for archived copies of websites.


GLAM Workbench

GLAM Workbench is a project which looks at how researchers can use data preserved by galleries, libraries, archives and museums. It includes a collection of Jupyter notebooks which draw on Mementos and index data. The notebooks mix description and editable code to help researchers find evidence in web archives.


See also

* National Records of Scotland Web Continuity Service
* Public Record Office of Northern Ireland Web Archive
* UK Government Web Archive
* UK Parliament Web Archive
* Web Archiving Initiatives


External links


* UK Web Archive home page
* The UKWA blog
* Archived UK government websites, run by UK National Archives
* Digital Preservation Coalition - Web Archiving and Preservation Task Force