#CityLIS Events: Visit to the Internet Archive by Sophie Johnston

*** This post is by current MA/MSc Library Science student Sophie Johnston. Here she describes the background to the forthcoming CityLIS visit to the Internet Archive. Sophie is our student rep for Library Science for 16/17.***

I am now in the second term of my MA/MSc in Library Science and we’ve started a module called Digital Libraries. In this class we’ve been discussing the role of libraries in the digital age and the ways in which online content for libraries, archives, museums, and art galleries is now arguably blurring into the single category of ‘digital libraries’.

The relevance of digital content in libraries is hard to overstate: so much content is either born online or duplicated online that there is a real need for digital preservation. This is where organisations like the Internet Archive come in. Founded in 1996 and based in San Francisco, they are a non-profit organisation and rely on data donations from others. They believe in the importance of preserving cultural artefacts and, because there is a danger of webpages being deleted and their content never being recovered, their mission is to capture as much online content as possible. [1]

Despite having the word ‘archive’ in the organisation’s name, their website regularly refers to the organisation as a ‘library’; for example, ‘The Internet Archive is one of the world’s largest public digital libraries’. [2] This goes back to my earlier mention of the crossover between digital libraries and digital archives. As physical spaces these are two different places, and yet it can be difficult to define the distinction between their digital counterparts.

It therefore seemed appropriate to try to arrange a visit to the digitisation centre at the Internet Archive’s London offices. Chris Booth, Digitisation Manager, has kindly agreed to show our class around and talk to us about their current projects and processes. This is a fantastic opportunity and we’re all looking forward to it.

Access to the Internet Archive is via their website. The homepage has a search engine function for the whole site as well as a banner at the top of the page to search their ‘Wayback Machine’. This function allows you to search for a URL, or keywords, and look at archived webpages by date. A search for the BBC news website came back with 25,306 archived webpages saved between December 1, 1998 and February 19, 2017. Overall the Wayback Machine claims to have 279 billion webpages saved.
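
For anyone who would rather script this kind of date-based lookup than use the search banner, here is a minimal sketch in Python. It assumes the Wayback Machine's public availability endpoint at https://archive.org/wayback/available, which takes `url` and `timestamp` query parameters and returns JSON describing the closest snapshot. The helper names, the address `news.bbc.co.uk`, and the sample response below are my own illustrative assumptions, not something taken from the Internet Archive's documentation or from my search above.

```python
from urllib.parse import urlencode

# Assumed endpoint for the Wayback Machine availability lookup.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_availability_url(page_url, timestamp=None):
    """Build a lookup URL for the snapshot closest to `timestamp` (YYYYMMDD)."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(payload):
    """Return (snapshot_url, timestamp) from a decoded JSON response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest["url"], closest["timestamp"]


# Hypothetical response, shaped the way the endpoint is assumed to reply.
sample_payload = {
    "archived_snapshots": {
        "closest": {
            "available": True,
            "status": "200",
            "timestamp": "20170219000000",
            "url": "http://web.archive.org/web/20170219000000/http://news.bbc.co.uk/",
        }
    }
}

print(build_availability_url("news.bbc.co.uk", "20170219"))
print(closest_snapshot(sample_payload))
```

To query the live service, the built URL could be fetched with `urllib.request.urlopen` and the body decoded with `json.load` before passing it to `closest_snapshot`. I have left that step out so the sketch runs without a network connection.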


A nice feature of the homepage is that if you scroll down the page you are given a large selection of their top collections. These are in no specific order and it is a great way to explore the collection in the same serendipitous way you might browse a bookshop or physical library. Collections include ‘Russian Audiobooks’ (28,303 items), ‘Hip Hop Mixtapes’ (12,055 items), ‘Classic PC Games’ (9,923 items), and ‘Political Ads’ (3,475 items), to name a few. There is also the option to browse via media type, with options for images, software, audio, video, texts, and web.

The Internet Archive seems like a great tool for digital preservation, as well as a publicly accessible resource. It will be interesting to see how both the Library and Information Science community and others look back, in, say, 30 years, at organisations like this and how helpful they have proven to be in cultural preservation.


[1] https://archive.org/about/

[2] https://archive.org/projects/


This post first appeared on Sophie’s blog LibraryGoth on February 19th 2017. Sophie is also on Twitter as @sophieanna30

About lyn

Dr Lyn Robinson is Reader in Library & Information Science, and Head of Department at City, University of London. She established and directs the Library School, and co-directs the Centre for Information Science alongside Prof David Bawden. Contact: lyn@city.ac.uk