New Paper: Text mining Mill: Computationally detecting influence in the writings of John Stuart Mill from library records

Image: London Library Issue Book Number 3, p. 529, a handwritten register showing John Stuart Mill's intensive borrowing record during 1845. The horizontal lines indicate the return of individual books; the vertical lines indicate that all the books listed on the page have been returned. Image reproduced with the kind permission of the London Library. © The London Library

How can computational methods illuminate the relationship between a leading intellectual and their lifetime library membership? I'm pleased to say that a paper derived primarily from the work Dr Helen O'Neill conducted for her PhD thesis in Information Studies at UCL, The Role of Data Analytics in Assessing Historical Library Impact: The Victorian Intelligentsia and the London Library (2019), which Anne Welsh and I supervised, has just been published. Interestingly, this paper started life as a tweet:

Does anyone know if folks have used Turnitin to detect plagiarism in historical texts? Would it work? ie stuff published 1800s?
— melissa terras (@melissaterras) March 6, 2013

Replies from David A. Smith (at Northeastern) and Glenn Roe (now at the Sorbonne), who both took the time to explain their previous work on detecting textual reuse, led to a collaboration. In O'Neill's doctoral work, we explored the interrelation between the reading record and the publications of the British philosopher and economist John Stuart Mill, focusing on his relationship with the London Library, an independent lending library of which Mill was a member for 32 years.

Building on her detailed archival research into the London Library's lending and book donation records, O'Neill constructed a corpus from the Internet Archive of the exact editions of the books Mill borrowed and donated, alongside the publications he produced. This allowed us to apply natural language processing approaches to detect textual reuse and similarity, and so to trace the relationship between Mill and the Library. With Smith and Roe's assistance, we used two different methods, TextPAIR and Passim, to detect and align similar passages between the books Mill borrowed from or donated to the London Library and his published output.
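For readers unfamiliar with text reuse detection, here is a minimal sketch of the general idea, in Python. It is not TextPAIR or Passim, and it is not the pipeline used in the paper: it simply seeds on word n-grams shared by a "source" text (say, a borrowed book) and a "target" text (say, one of Mill's publications), then extends each seed word by word into a longer shared passage. The function names (`tokenize`, `ngrams`, `shared_passages`), the n-gram size `n=4` and the minimum passage length `min_len=6` are all hypothetical choices for illustration. Real tools go much further, using fuzzy alignment to cope with OCR errors, spelling variation and paraphrase.

```python
import re


def tokenize(text):
    """Lowercase and split into word tokens (a deliberately crude normaliser)."""
    return re.findall(r"[a-z]+", text.lower())


def ngrams(tokens, n):
    """Map each word n-gram to the list of positions where it starts."""
    index = {}
    for i in range(len(tokens) - n + 1):
        index.setdefault(tuple(tokens[i:i + n]), []).append(i)
    return index


def shared_passages(source, target, n=4, min_len=6):
    """Return (source_start, target_start, length) for runs of shared words.

    Hypothetical toy detector: seeds on exact shared n-grams, then extends
    each seed forward while the words keep matching exactly.
    """
    src, tgt = tokenize(source), tokenize(target)
    src_index = ngrams(src, n)
    covered = set()  # (source_pos, target_pos) pairs already inside a match
    passages = []
    for j in range(len(tgt) - n + 1):
        for i in src_index.get(tuple(tgt[j:j + n]), []):
            if (i, j) in covered:
                continue  # this seed lies inside a match found earlier
            length = 0
            while (i + length < len(src) and j + length < len(tgt)
                   and src[i + length] == tgt[j + length]):
                covered.add((i + length, j + length))
                length += 1
            if length >= min_len:
                passages.append((i, j, length))
    return passages


# Toy strings, invented for illustration (not drawn from Mill or the corpus).
source = "the quick brown fox jumps over the lazy dog near the river bank"
target = "a reader noted that the quick brown fox jumps over the lazy dog every morning"
for s, t, length in shared_passages(source, target):
    print(s, t, length)  # prints "0 4 9"
```

Seeding on n-grams keeps the search cheap, because only positions that share at least `n` consecutive words are ever compared, which matters when comparing whole books against a lifetime of publications.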

So what did we show? The collections of the London Library influenced Mill's thought, transferred into his published oeuvre, and featured in his role as political commentator and public moralist. O'Neill's archival work had already uncovered how important the London Library was to Mill, and how often he used it; the text reuse analysis shows this in the texts he wrote as well, through the volume of references to London Library material, particularly around certain times and in certain publications.

What matters most here is that we have re-conceived archival library issue registers as data, to be triangulated against the growing body of digitised historical texts and the output of leading intellectual figures (historical turn-it-in, huh). This approach does, though, depend on having the resources and permissions to transcribe extant library registers, and on access to previously digitised sources. Because of the complexities of privacy regulations, and the limits copyright places on digitisation, it is most likely to succeed for other leading eighteenth- and nineteenth-century figures. Still cool, though.

On a personal note, this is the last paper I'll be publishing from work that I started while employed at UCL. It was important to me to see through the PhD supervisions I had committed to, and I was delighted when Helen O'Neill (who did her PhD part time while working full time!) passed her viva in 2019 with no corrections at all (yay!). A happy end to an era for me, and yes, it has taken a full three years to finish moving my research over to Edinburgh! But all good.

Here’s the reference to the paper proper:

O’Neill, H., Welsh, A., Smith, D.A., Roe, G. and Terras, M., 2021. Text mining Mill: Computationally detecting influence in the writings of John Stuart Mill from library records. Digital Scholarship in the Humanities. https://doi.org/10.1093/llc/fqab010

I’ve parked an open access version for you to read without the paywall. Enjoy!

Open access PDF: TextMiningMill_ONeill_Roe_Smith_Welsh_Terras