E-Collections: a talk organised by CPD25 – 14/11/2015

The talk covered the following areas:
• Key components in creating a Digital Library
• Insight into the building of eCollections:
• Web Archives, eBooks, eJournals
• Activities to support their development
• Collating statistics
• Good practice and available frameworks

Building a Digital Library:
Understanding the key components
The workshop began with an introduction from Adjoa K. Boateng of the British Library, who used the physical environment as a motif for types of data and their allocation, to highlight the scale involved (perhaps inspired by the cover of Law and the Internet by Edwards and Waelde), e.g.:

Earth – Print/digital applications
Clouds – Analogue data
Land – IT and infrastructure
Sea – Hard data
Attention was given to the Resource Discovery Network (RDN), user interfaces (the OPAC), the relevance of link resolvers, and the Integrated Library System (ILS, or LMS) and how it can operate as a cloud system. The talk was then handed over to one of the BL's UK Web Archive librarians.
The UK Web Archive – Building the resource: http://britishlibrary.typepad.co.uk/webarchive/

The British Library web archive is an open UK web archive that has been running since 2004 and holds more than 15,000 sites. It consists of selected content of cultural significance, covering subjects such as UK politics, the Fourth Plinth project and the Olympic Games logo. Its contributors include:

British Library
Joint Information Systems Committee
National Archives (UK)
National Library of Wales
National Library of Scotland
Wellcome Library.

It also contains a legal deposit web archive, and contributes to the Internet Archive, home of the Wayback Machine. The Internet Archive was founded in San Francisco in 1996 by Brewster Kahle, a digital librarian from MIT. https://archive.org/

Affiliated networks and platforms include:
The International Internet Preservation Consortium “a global network of experts archiving the Web for future generations” to which the BL contributes. http://www.netpreserve.org/

And the …
Digital Preservation Coalition (DPC)
The DPC is a non-profit consortium supporting the preservation of UK and international digital knowledge and memory, e.g.:
-Digital preservation
-Object identifiers
-Digital Preservation Europe
http://www.dpconline.org/
Essentially a national coalition, the Digital Preservation Coalition has support from numerous institutions, including the Universities of Oxford and Cambridge, the Bank of England, a number of national libraries and the United Nations Secretariat, as well as allied member organisations in, for example, Australia, Holland and Germany. It covers:
• Digital preservation
• Digital curation
• Digital object identification
• Conservation and restoration
• Digital Preservation Europe
A digital object identifier (DOI) is a unique alphanumeric string, registered through a registration agency (such as Crossref) operating under the International DOI Foundation, that identifies content and provides a persistent link to its location on the Internet. The publisher typically assigns a DOI when an article is published and made available electronically.
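A DOI consists of a prefix beginning "10." (identifying the registrant) and a suffix chosen by the registrant, separated by the first "/". The persistent link is formed by appending the DOI to a resolver address. The following minimal Python sketch assumes the public https://doi.org/ resolver. The function names are hypothetical, and 10.1000/182 (the DOI of the DOI Handbook) is used only as an example.

```python
from typing import Tuple
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"  # assumption: the public DOI resolver


def normalise_doi(doi: str) -> str:
    """Strip whitespace and an optional 'doi:' label, then sanity-check the form."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {doi!r}")
    return doi


def split_doi(doi: str) -> Tuple[str, str]:
    """Split into (prefix, suffix) at the first '/'; the suffix may itself contain '/'."""
    prefix, suffix = normalise_doi(doi).split("/", 1)
    return prefix, suffix


def doi_to_url(doi: str) -> str:
    """Build a persistent resolver link, percent-encoding URL-unsafe characters."""
    return DOI_RESOLVER + quote(normalise_doi(doi), safe="/")


print(doi_to_url("doi:10.1000/182"))  # https://doi.org/10.1000/182
print(split_doi("10.1000/182"))       # ('10.1000', '182')
```

Because the link points at the resolver and not at the publisher's server, the publisher can move the content and only needs to update the DOI record. The citation and the link stay the same.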

See also JIBS https://www.jibs.ac.uk/role.html
The talk then looked at web archives as a library collection, collating and using statistics, making the case for more funding, getting a return on investment, and the successful acquisition of e-books and e-journals. It was acknowledged that search engines, rights management and the like take up the majority of the time, with an emphasis on interoperability, authentication and compatibility.
The building of E-Collections
Web Archives – E-Books – E-Journals
Terms of reference:
DOI Digital object identifier
EDL Event Driven Language
EBRA Evidence Based Resource Acquisitions
MHTML MIME HTML: a web page archive format that uses Multipurpose Internet Mail Extensions (MIME) to bundle an HTML page and its linked resources (images, scripts etc.) into a single file for digital archiving; supported by browsers such as Internet Explorer and Opera, and also used in email

MHTML rendering Exporting output to the MHTML web page archive format, e.g. a reporting tool that “Renders a report in MHTML; requires a work flow management system” (www.telerik.com)

MM data info – repository – could relate to:
MediaMonkey a digital media player and media library application
Memory Manager Software for memory management in the kernel of the operating system

The main considerations for search engines are:
-Interoperability – authentication, provenance
-Subscriptions and rights to access
-Sustainability – compatibility, preservation
-User base – and how it might change
-Strategic provision

Benefits include:
-Making access easier and more economical
-Aiding service provision – short and long term

The British Library has two main digital archives:

The Legal deposit archive

And …

The Open UK web archive

There are three main sections …

1) Legal Deposit:
The introduction of the non-print legal deposit regulations in 2013 allowed for the archiving of all sites on the UK web domain. The legal deposit archive contains millions of webpages gathered through an annual crawl of all UK domain names. It tends not to archive film and A/V, and excludes sites considered private, including intranets and email. It has seen phenomenal growth in content over a short period: the 2013 domain crawl produced 3.8 million seeds, compared with 20 million seeds in 2014. Focused crawls centre on topics such as NHS reforms, WW1, the General Election and Magna Carta. It can only be accessed on reading room computers or on premises controlled by the BL. As well as the BL, it also represents the National Libraries of Scotland and Wales and the libraries of the Universities of Oxford and Cambridge.

2) The Open UK web Archive:
This is smaller than the legal deposit archive and was set up in 2003. It regularly saves UK web pages considered to be of historical or cultural significance, so it is more selective. It incorporates the JISC UK Web Domain Dataset (1996-2013) and can be accessed from anywhere in the world at any time.
Either collection may contain copies of the same website.

3) SHINE:
SHINE is an application that focuses on trend analysis and supports searches across some 3.5 billion items. It does not rank results by relevance, as there are no relevance metrics. Instead, it filters by content type, year, post-code etc., supports Boolean queries, and exports CSV files for use in spreadsheets. Its graph layout is a useful research tool for trend analysis.

SHINE was developed as part of the Big UK Domain Data for the Arts and Humanities project, funded by the AHRC. http://www.ahrc.ac.uk/
UK web domain dataset 17 billion – used for research online
(It has an NPLD (non-print legal deposit) tab, and goes through a licensing request process to obtain permission for open access.)
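Because SHINE exports results as CSV and is used for trend analysis, a spreadsheet-style trend can be rebuilt by counting results per year. The following minimal Python sketch assumes an export with a column named "year". That column name and the sample rows are hypothetical, not SHINE's actual export format.

```python
import csv
import io
from collections import Counter
from typing import Dict, Iterable


def hits_per_year(rows: Iterable[dict], year_field: str = "year") -> Dict[int, int]:
    """Count search results per year (hypothetical column name), sorted by year for graphing."""
    counts = Counter(int(row[year_field]) for row in rows if row.get(year_field))
    return dict(sorted(counts.items()))


# Hypothetical CSV export: three results, two from 2005 and one from 2007.
sample = io.StringIO(
    "url,year,content_type\n"
    "http://a.example,2005,text/html\n"
    "http://b.example,2005,application/pdf\n"
    "http://c.example,2007,text/html\n"
)
print(hits_per_year(csv.DictReader(sample)))  # {2005: 2, 2007: 1}
```

Raw counts are shown for simplicity. Comparing years fairly would also need the size of each year's crawl, since the archive grows over time.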


The second part of the talk looked at:
E-Books and E-Journals:
As some of the delegates had been discussing statistics during the break, this part of the talk began by acknowledging the use of statistics-gathering platforms …

JUSP: JUSP, the Journal Usage Statistics Portal, is free to JISC-funded universities. It assists the management of e-journal collections, provides a single point of access, and provides statistics on the best deals for the academic community. Data is gathered from 70 sites. http://jusp.mimas.ac.uk/
SUSHI: the Standardized Usage Statistics Harvesting Initiative, a machine-to-machine protocol for harvesting usage statistics, incorporating …

COUNTER (Counting Online Usage of Networked Electronic Resources)
COUNTER-compliant usage data goes back to 2009, and the following reports are provided with graphs:
• Journal level reports
• Summary reports
• Titles and deals reports
• Usage profiling reports (Benchmarking)
JR1 reports and trends-over-time reports show percentage usage over the year. The usage profiling comparator shows usage of titles in deals based on core titles, compares usage against the Russell Group, and shows your own usage and academic-year ratios. It can be used as part of a business case for funding. To support accurate reporting, a ‘Publishers Issues’ tab flags data problems such as over-collection and shifts in the data. Another part of its remit is to help others, such as academics and Exco, understand resource usage and its implications, since this informs collection allocation in line with user experience.
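In COUNTER Release 4, JR1 is the report of successful full-text article requests by month and journal. The "percentage usage over the year" view can be sketched as each month's share of the annual total. The following Python example is a minimal sketch: the function name is hypothetical, and the monthly figures are invented and shortened to three months for brevity. They are not JUSP data.

```python
from typing import Dict


def monthly_share(counts: Dict[str, int]) -> Dict[str, float]:
    """Return each month's full-text requests as a percentage of the period total."""
    total = sum(counts.values())
    if total == 0:
        # Avoid dividing by zero for a title with no recorded use.
        return {month: 0.0 for month in counts}
    return {month: round(100 * n / total, 1) for month, n in counts.items()}


# Hypothetical monthly JR1-style counts for a single journal title.
print(monthly_share({"Jan": 30, "Feb": 50, "Mar": 20}))
# {'Jan': 30.0, 'Feb': 50.0, 'Mar': 20.0}
```

Percentages make titles of very different sizes comparable, which suits spotting seasonal peaks such as term time. Absolute counts remain the better measure for value-for-money arguments.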
Areas discussed: …
-The use of E-Book aggregators
-Aggregators such as Dawsons are not publishers, and publishers can, and do, withdraw content.
-Textbooks can be a problem with regard to availability, mainly due to affordability
-Online access is not necessarily users' preferred choice, but it is a practical solution to this.

While some texts are increasingly being uploaded to Moodle, they are not technically considered e-books because they are embedded in the platform. Purchase price therefore remains a consideration when ordering an actual digital edition.

-Open access overrides copyright
-Linking to other content
-Digitised books from specialist publishers are accessed through Institutional repository
-Provision – Sourcing

-Individual/bulk: preferences, pros and cons
-Subscriptions: bundles
-Mix of journals, books and other content, i.e. law collections statutes/news/databases etc.
-Big deals, e.g. Springer's 2,000+ subscriptions delivered offline as PDFs
-e-journal archives, e.g. JSTOR open access journals – liaise with library director
-Changing nature of open access for budget holders

-Management costs
-Accounts payable
-Disability support team re appropriate formats – dedicated contact and training
-Distance learner implications
-Purchasing Consortia – Suppliers and other intermediaries
-Simplified administration
-Good negotiated price
-Link up with book suppliers and subscription agents
-Value lies in consolidating packages
Always ask WHY you are acquiring this content. Is it a patron-driven acquisition? Is it a rental? Or is it a purchase to build the collection?

Consider where you are keeping your records: the LMS (e.g. Sierra), shared file folders, email accounts? Try to avoid multiple spreadsheets.

The use of Alma, in toto: http://www.exlibrisgroup.com/category/AlmaOverview
-Helps with consolidating disparate library systems
-Cloud based infrastructure
-“Re-direct resources to focus on extending library services within and outside their institutions in direct support of teaching and research goals”

-Consulting with departments at year end
-Updating purchasing guidelines
-Setting up IP addresses with publishers
-Checking E-Book listings
It’s more about renewing subscriptions and licences than placing new orders
Think about:
-Financial management
-Licensing and copyright
-MarcEdit for e-book records
Summing up:
This was a worthwhile talk whether working directly with E-Collections or not.
If you are, it provided plenty of information and material with opportunities for interactive discussion around issues and solutions.

If you aren’t directly involved with e-collections, it was still an interesting talk. It offered insight into related areas likely to be encountered in collection development and management, e.g. licensing and access issues, the nature of ‘deals’ and useful research tools. It also covered the avoidance of duplication, both of resources and of management systems and the people controlling them.

Overall, the talk gave a broad introduction to the subject, enough to contextualise one’s own collection and to build on further as one’s level of involvement requires.

Held at the London Mathematical Society, London WC1 – 14th November 2015

