Student Perspectives: Mining For Data

The Student Perspectives category collects posts written by current CityLIS students.

Nina Byrom looks at the benefits and potential drawbacks of integrating data mining into library and archive catalogues from the perspective of the digital humanities. You can follow Nina on Twitter @ByromNina.


The discussion about the rise of text mining software in relation to the Digital Humanities caused me to reconsider what the future of library collections and archives could look like. One view is that integrating text and data mining software into collections and archive research could give researchers and collection managers the ability to work on large-scale projects and documentation efforts that would previously have been too labour intensive to undertake (Berry, 2019). However, the ability to extrapolate thematic data across entire collections, to practise distant reading on a wide scale, has the potential to affect how digitised library collections are presented in the future. If, for example, a collection was found to have multiple common elements, would there be a danger of it being marketed to users based on those elements alone, potentially excluding less frequent but still important elements?

There is arguably already a precedent for this in the library sector in the form of specialised collections, but even these contain a level of variance. The risk of text mining being used to summarise collections in the broadest possible terms could be balanced by the level of detail it can provide, giving curators a deeper understanding of a larger percentage of a collection. This could create a more effective curation cycle, as changes to the contents of the collection would be easier to compare.

Changes in digital documentation due to the Digital Humanities could also change the way that library users access and use information. This is supported by the integration of digital media into everyday life: Sabharwal (2015, p.125) argues that the GLAM sector has adapted to digital presentation and curation approaches so that a remote audience can access previously inaccessible information. However, Sabharwal (2015, p.125) also argues that public curation of collections can occur in the digital sphere, driven by the rise of social media. With many institutions already maintaining multimedia social media presences that allow for public critique (Sabharwal, 2015, p.127), there is a potential risk that institutions could use text mining to collect and study external public opinion en masse. However, because free-to-use text mining software is available, users could do the same to extrapolate meaning from an institution’s digital presence, creating a level of scrutiny that the sector may not be accustomed to receiving.

The availability of text mining also raises another interesting possibility for the library sector: that it could be built into digital catalogues, allowing researchers and users to feed their findings about catalogue contents back into the catalogue itself, using data and text mining in tandem to promote further development. Something similar has already occurred in research projects such as Curating Menus, which cleans and uses publicly sourced datasets about the New York Public Library’s menu collection (Rawson, 2016, p.59). It’s important to note that this relies upon users accessing and inputting information through a single data collection system. However, as Rawson (2016, p.60) points out, digital humanities research and the curation of documented data inform each other, which allows future users to access an increasing level of information from the collection, as ‘anyone can easily download the data set from NYPL’s website’ (Rawson, 2016, p.59). The model of publicly created datasets working in tandem with the developing Digital Humanities could potentially lead to majority publicly curated catalogues in the library sector, where user-generated datasets are monitored by data and text mining software in order to provide a better, more accessible service.


Berry, D. (2019), What are the Digital Humanities? The British Academy Blog [blog], 13 February. Available at: [Accessed 28 November 2020]

Rawson, K. (2016), Curating Menus: Digesting Data for Critical Humanistic Inquiry, In: White, J. and Gilbert, H. eds. Laying the Foundation: Digital Humanities in Academic Libraries, Indiana: Purdue University Press, pp. 59-72

Sabharwal, A. (2015), Digital Curation in the Digital Humanities: Preserving and Promoting Archival and Special Collections, Kidlington: Chandos Publishing

About Joseph Dunne-Howrie

I am artist in residence in the MA/MSc Library and Information Science department at City, University of London and module year coordinator for MA/MFA Performative Writing/Vade Mecum at Rose Bruford College of Theatre and Performance. My research interests include intermediality, live performance in digital culture, participatory and immersive theatre, performance documentation, archives, and performative writing.