The House of Lords – Creative Future Inquiry

On 13th October 2022 I gave evidence in person at the House of Lords, to the Communications and Digital Select Committee’s Creative Future Inquiry. It was nerve-wracking, but I was prepared, thanks to excellent briefing materials and our previous written submission from team Creative Informatics. The transcript is available, and you can even watch it again: here’s a picture of me on that there Parliament TV.

The full report from the committee, At risk: Our Creative Future, dropped in January 2023. It stressed that, although

the UK has long been seen as a global leader in the creative sector, rapid technological advances are changing the nature of the creative industries, and international competition is rising… Unless the Government starts taking the sector more seriously, the fundamentals that underpin our success will deteriorate and our competitiveness will decline.

I’m really pleased that our evidence shines through the resulting report and recommendations. Something I said was even quoted in the body of the report, page 11:

Now let’s see if the government and funding agencies listen…

Millicent Garrett Fawcett: Selected Writings

I’m very pleased to have published, with the leading suffrage historian Elizabeth Crawford, Millicent Garrett Fawcett: Selected Writings – a new collection of writings by this leading UK suffragist and campaigner, available open access from UCL Press.

How does this relate to my own interests and trajectory? A blog post in the LSE Review of Books covers this:

Fawcett was one of the most famous women campaigners in the UK, but by the early 21st century, her achievements were, if not forgotten, then not entirely understood. The last scholarly book to be written about Fawcett was published in 1991, and no collection of her germinal speeches, pamphlets and newspaper columns has been attempted – until now.

There are various reasons for this. The UK copyright for published works authored by Fawcett expired in 1999, 70 years after her death, and so it would have only been legally possible to create a compilation of her writings – scattered across various international publications – relatively recently. However, there were also the issues of complexity and range, and of accessing often ephemeral sources… Only an ambitious (yet unfunded) plan to hunt within mass-digitised content and to undertake digitisation-on-demand for sources which were not yet digitally available allowed us to locate and synthesise all of Fawcett’s writings that we could, including discovering a good few that had never been noted before in her bibliography…

The work that went into this book – including its scale and close attention to detail – would not have been possible without the digitised information environment as well as access to both commercial and openly licensed digital cultural heritage. The timing of the compilation of this collection, too, was coincidental in only having access to digital resources: although we had luckily collected Fawcett’s writings before the COVID-19 pandemic, the write up and analysis happened at a time of social distancing, when physical access to most libraries and archives was impossible.

Millicent Garrett Fawcett: Selected Writings is therefore an example of how digital cultural heritage can be used to enhance the understanding of individuals who have not previously been centred in modern academic inquiry. Feminist digitisation practices in libraries and archives – those which centre women’s lives and their histories, taking ownership of information that has not been covered by institutionally supported digitisation or funded initiatives – can be a way to resurface feminist voices, biographies and contributions to society. The combination of using digitisation to find, source and gather material, and using mass-digitised content to understand and interrogate it, allows new, broader histories to be compiled. The digital information environment also allows these histories to be shared: we are pleased that Millicent Garrett Fawcett is available for free download via open access from UCL Press. Millicent Garrett Fawcett’s words changed a society: digital means were able to gather, understand and share them, so that her words can be read again, and understood, freely, by all.

New Chapter – Recorded performance as digital content: Perspectives from Fringe 2020

Among our major partners in Creative Informatics are the various Edinburgh Festivals, including the Fringe. We’ve worked closely with them throughout the pandemic, charting the sudden switch to digital and what happened (for example, see our report Learning from the 2020 Edinburgh Festival Fringe: Recommendations for Festivals and Performing Arts in Navigating Covid-19 and New Digital Contexts). I’m pleased to say that we have a new book chapter out – Recorded performance as digital content: Perspectives from Fringe 2020 – in the new Routledge book Performance in a Pandemic, edited by Laura Bissell and Lucy Weir. Here’s the abstract:

“Within days of performance venues being forced to close their doors in 2020, the National Theatre began broadcasting high-quality recordings of the best of London’s West End. Few other companies could dream of having such rich recorded archives to draw upon. Indeed, for many artists there is a clear tension in the very idea of recording work that is intended to be experienced live.

This essay reports on 20 in-depth interviews with performers and theatre-makers who had planned to bring shows to the 2020 Edinburgh Fringe Festival. This article reports on how performers responded to the prolonged closure of venues, and developed a series of strategies to generate value from recordings, even with limited production budgets. Crucially, very few opted to record whole live shows in empty theatres – instead they found specific uses and rationales for recording performance, while developing new expertise with sharing recorded media on digital platforms.

We argue that these digitally mediated performances are distinct from other forms of film or ‘live-to-digital’ theatre. Indeed, we suggest that this emerging genre of record will persist beyond the COVID-19 pandemic, and points to new opportunities in recording, broadcasting, and archiving performing arts as digital content.”

Citation for published version:
Elsden, C, Yu, D, Piccio, B, Helgason, I & Terras, M 2021, Recorded performance as digital content: Perspectives from Fringe 2020, in L Bissell & L Weir (eds), Performance in a Pandemic, 1st edn, Routledge, London. https://doi.org/10.4324/9781003165644

And here’s the author’s submitted copy of Recorded performance as digital content: Perspectives from Fringe 2020.

On Radio 3’s Between The Ears

I was delighted to be invited to contribute to a BBC Radio 3 programme, Between the Ears, with an episode called “The Virtual Symphony“, celebrating 30 years of the Internet and its impact on our lives. I was interviewed for over an hour by the producer, Laurence Grissell, reflecting on my use of the internet and how it has affected my professional and personal life, my memories of the early days of going online in the physics and computer labs, and my thoughts on how it is changing society. Kieran Brunt, the composer, wove four such interviews together with archive material and new musical pieces to produce a documentary that is also an artwork, showing how our relationship to and with the net is changing.

The Between the Ears logo, from BBC Radio 3.

It was first broadcast on Radio 3 on 18th July 2021 at 19.45; you can listen to it online, or here’s the MP3:

The official blurb goes like this! It would be great to hear what you think of it:

The joys and horrors of the internet, evoked by stories, sounds and an exciting new electronic and vocal work composed by Kieran Brunt. Opens with an introduction by the composer.

30 years ago, Tim Berners-Lee created the very first website. This powerful edition of Between the Ears explores how the internet has dramatically reshaped our lives over the following three decades.

In 1990s Glasgow, a young woman in a physics computer lab glimpses a different future for the world – and herself. In Luton, the web awakens a young man’s Sikh identity – a few years on, it will bring him riches. In 2001, a young mother in France finds escape through Wikipedia. Ten years later, an Austrian law student is horrified when he requests his personal data from Facebook…

Over four movements of music and personal stories, the Virtual Symphony moves from sunny optimism to deep disquiet, as our relationship to the internet shifts. Around these stories, composer Kieran Brunt weaves electronic and vocal elements in an exhilarating new musical work commissioned by BBC Radio 3.

Kieran Brunt and documentary producer Laurence Grissell worked in close collaboration to produce a unique evocation of the way in which the internet has fundamentally changed how we experience and understand the world.

Composer: Kieran Brunt

Producer: Laurence Grissell

Interviewees:

Melissa Terras, Harjit Lakhan, Florence Devouard and Max Schrems

Electronics performed by Kieran Brunt

Vocals performed by Kieran Brunt, Lucy Cronin, Kate Huggett, Oliver Martin-Smith and Augustus Perkins Ray of the vocal ensemble Shards

Programme mixed by: Donald MacDonald

Additional music production: Paul Corley

Additional engineering: Ben Andrewes

New Paper: Identifying the future direction of legal deposit in the United Kingdom: the Digital Library Futures approach

I’m delighted that a paper from the Digital Library Futures project has come out in the Journal of Documentation:

Gooding, P., Terras, M. and Berube, L. (2021) Identifying the future direction of legal deposit in the United Kingdom: the Digital Library Futures approach. Journal of Documentation (doi: 10.1108/JD-09-2020-0159)

Until this paper, there had been next to no research into how users are approaching and utilising the digital library collections now being amassed by our legal deposit libraries (colloquially known as “copyright libraries”) following the Legal Deposit Libraries (Non-Print Works) Regulations 2013, which enable and mandate them to collect digital copies of publications, as well as or instead of print. This paper addresses that gap by presenting key findings from the AHRC-funded Digital Library Futures project. Its purpose is to present a “user-centric” perspective on the potential future impact of the digital collections being created under electronic legal deposit regulations. Through our user study, we show that contemporary tensions between user behaviour and access protocols risk limiting the instrumental value of these digital library collections, which – although they have high perceived legacy value – are not being used in the way that they could be, due to access and legal restrictions.

I’ve stuck the authors’ last copy up here, so you can read it if you can’t get beyond the paywall:

Gooding, P., Terras, M. and Berube, L. (2021) Identifying the future direction of legal deposit in the United Kingdom: the Digital Library Futures approach (authors’ last copy, PDF).

Fully funded AHRC SGSAH CDA Studentship: “Slavery and Race in the Encyclopaedia Britannica (1768-1860): A Text Mining Approach”

I’m delighted to say I’ve been awarded a fully funded PhD studentship (open to international applicants!) with the National Library of Scotland, as an AHRC-funded Collaborative Doctoral Award, working with Professor Diana Paton (William Robertson Professor of History, University of Edinburgh), Dr Sarah Ames (Digital Scholarship Librarian, National Library of Scotland) and Robert Betteridge (Rare Books Curator (Eighteenth-Century Printed Collections), National Library of Scotland). Please do share this opportunity with recommended potential students, in History, Digital History, and/or Digital Humanities. An official advert will appear soon on UoE digital real estate, but I’m posting here first for expediency!

Fully funded AHRC SGSAH CDA Studentship: “Slavery and Race in the Encyclopaedia Britannica (1768-1860): A Text Mining Approach”

Application deadline – 5pm on Monday 17th May

Award – Annual stipend of £15,690 per year and tuition fees for 3.5 years (FTE). Open to Home and International students. (The successful candidate should reside within a reasonable distance of the University of Edinburgh during the course of their studies).
PhD – English Literature

The University of Edinburgh and the National Library of Scotland are seeking a doctoral student for an AHRC-funded Collaborative Doctoral Award, “Slavery and Race in the Encyclopaedia Britannica (1768-1860): A Text Mining Approach”. The project has been awarded funding by the Scottish Graduate School for Arts and Humanities (SGSAH) and will be supervised by Professor Melissa Terras (College of Arts, Humanities and Social Sciences, University of Edinburgh), Professor Diana Paton (William Robertson Professor of History, University of Edinburgh), Dr Sarah Ames (Digital Scholarship Librarian, National Library of Scotland) and Robert Betteridge (Rare Books Curator (Eighteenth-Century Printed Collections), National Library of Scotland).

The studentship will commence on 13th September 2021. We warmly encourage applications from candidates who have a grounding EITHER in text and data mining/Digital Humanities, with proven knowledge and understanding of the history of slavery and/or race, OR in UG/PG study of the history of slavery and/or race, while demonstrating good technical skills and an interest in Digital Humanities/Digital History methods. This is an extraordinary opportunity for a strong PhD student to explore their own research interests while working closely with a major cultural heritage organisation, on important issues regarding the legacy of slavery in our information environment.

The student will be based in the School of Literature, Languages and Cultures, at the George Square campus of the University of Edinburgh, but will also spend considerable time in the School of History, Classics and Archaeology at the University of Edinburgh, and at the National Library of Scotland. There will be a period of funded work placement at the National Library of Scotland, which will be co-determined with the student: for example, highlighting authors of articles relating to slavery and race in the Encyclopaedia Britannica, and exploring how these link to Library Collections in innovative ways.

The award will include a number of training opportunities offered by SGSAH, including their Core Leadership Programme and additional funding to cover travel between partner organisations and related events. This studentship will also benefit from training, support, and networking via the School of History, Classics and Archaeology, the Edinburgh Centre for Data, Culture and Society, and the Edinburgh Futures Institute. The student will be invited to join National Library PhD cohort activities.

Project Details

“Slavery and Race in the Encyclopaedia Britannica (1768-1860): A Text Mining Approach”

How are the impact and outcomes of Atlantic slavery represented or alluded to in historical information sources? What is the legacy of slavery in our printed information environment? What text-mining approaches can be used to identify, analyse, and visualise these diverse and problematic histories? This research will use advanced digital approaches to understand how race and slavery feature in the Encyclopaedia Britannica (EB). The first eight editions of the EB, published 1768–1860 – from the height of the UK’s involvement in the transatlantic slave trade, through the abolition of British slavery in 1838, to ongoing subsequent debates about slavery and race – contain rich content related to Atlantic slavery and to forms of racialisation that developed from it. Utilising data from the newly digitised 143 volumes of the EB from the National Library of Scotland’s Data Foundry (comprising 167m words), this research will both provide insight into the explicit and implicit representation of slavery, the slave trade and race in this key reference material, and develop a best-practice methodology for others wishing to use text mining to analyse race and slavery within other historical information sources.

This CDA will involve learning (well-established) text and data mining approaches and applying them to the EB: a unique corpus analysis that will need to consider the intellectual and cultural context in which eighteenth- and nineteenth-century encyclopaedias were produced and published, while also linking and cross-referencing to other information sources available within the National Library of Scotland collection. By searching, analysing, and visualising the ways in which terms related to slavery appear in this essential reference material – using a variety of methods including GIS, accurate geoparsing, and following concepts and their relationships diachronically – we will both understand more about how Atlantic slavery was understood or instantiated within our information sources, while also developing a methodology for research into other similar primary reference material, and the ideas that it disseminated.
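The kind of diachronic term search described above can be illustrated with a short sketch. This is not project code: the edition snippets and the term list below are invented for illustration, and a real analysis would run over the full OCR text of all 143 volumes (for instance, on a Spark cluster).

```python
import re
from collections import Counter

def term_frequencies(text, terms):
    """Count occurrences of each target term in a lower-cased, tokenised text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t in terms)

# Invented snippets standing in for the OCR text of whole EB editions.
editions = {
    1771: "Sugar and tobacco are the chief products; slavery is practised there.",
    1842: "Since abolition the sugar colonies have declined, and slavery is no more.",
}
terms = {"slavery", "sugar", "tobacco", "abolition"}

for year in sorted(editions):
    print(year, dict(term_frequencies(editions[year], terms)))
```

Counts like these, compared edition by edition, show how slavery-related vocabulary shifts between 1768 and 1860; geoparsing and diachronic concept-tracking would build on the same tokenised text.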

This is a timely topic, of significant relevance, given increasing interest in decolonising academic and cultural institutions. This project will have scholarly impact in Digital Humanities, History, and Library and Information Science, as we consider how to analyse, deconstruct and decolonialise historical information sources using computational methods, as well as contributing to discussions and policies at the National Library of Scotland on this topic.  

Eligibility

At the University of Edinburgh, to study at postgraduate level you must normally hold a degree in an appropriate subject, with an excellent or very good classification (equivalent to first or upper second class honours in the UK), plus meet the entry requirements for the specific degree programme.

In this case, applicants should offer a UK master’s degree, or its international equivalent, with a mark of at least 65% in a dissertation of at least 10,000 words.

The AHRC also expects that applicants to PhD programmes will hold, or be studying towards, a Masters qualification in a relevant discipline, or have relevant professional experience that provides evidence of their ability to undertake independent research. Please ensure you provide details of your academic and professional experience in your application letter.

Experience in the study of the history of slavery and/or race, prior experience of digital tools and methods, an understanding of digitisation and the digitised cultural heritage environment, and use of quantitative research methods including text and data mining of historical sources, will be of benefit to the project.

The AHRC requires that students reside within a reasonable distance to their HEI as a condition of funding, although Covid disruption could be taken into account in the short term. 

Application Process

The application will consist of a single Word file or PDF which includes:

– a brief cover note that includes your full contact details together with the names and contact details of two referees (1 page).

– a letter explaining your interest in the studentship and outlining your qualifications for it, as well as an indication of the specific areas of the project you would like to develop (2 pages).

– a curriculum vitae (2 pages).

– a sample of your writing – this might be an academic essay or another example of your writing style and ability.

Applications should be emailed to pgawards@ed.ac.uk no later than 5pm on Monday 17th May. Applicants will be notified if they are being invited to interview by Tuesday 25th May. Interviews will take place week commencing Monday 31st May via an online video meeting platform.

Queries

If you have any queries about the application process, please contact: pgawards@ed.ac.uk

Informal enquiries relating to the Collaborative Doctoral Award project can be made to Professor Melissa Terras, m.terras@ed.ac.uk and Professor Diana Paton, Diana.Paton@ed.ac.uk

Further Information

The early EB was produced and published amidst the development of colonisation, globalisation and the transatlantic slave trade, and from its first edition it contained entries on slavery. Although the EB’s early success was facilitated by London book trading networks, it had distinctively Scottish roots, appealing to national sentiment.  In this context, examination of the early EB offers the possibility of discerning contemporary Scottish attitudes to slavery. The EB’s eventual popularity provides a useful case study concerning the representation and dissemination of ideas about slavery (and its abolition), but also the implicit legacies of the slave trade, such as the transmission of knowledge, culture, and products, as well as people. 

There is, to date, a dearth of scholarship on the representation of chattel slavery in encyclopaedias. The limited studies that do exist amount to pieces of contextual evidence or small case studies that serve larger arguments. Much of the scholarship concerning the EB only examines it in terms of its publication history or epistemological approach. Studies of the early EB have omitted examination of change in particular entries across various editions. Investigation of the EB’s entry on slavery over time would in itself make a valuable historiographical addition. This doctoral project will go well beyond that, analysing the 167 million words contained in the 143 volumes of the first eight editions, using advanced Digital Humanities methods, particularly to look for implicit legacies of slavery, regarding products traded (eg cotton, sugar, tobacco, coffee), places mentioned (eg Haiti, Guyana, Saint Domingue, Calabar), individuals (eg Toussaint Louverture, William Wilberforce), or peoples (eg Igbo, Ashanti/Asante/Ashantee, Carib).

Vincent Brown has argued that the nature of the slavery archive – riddled with gaps and silences – demands that historians move away from an approach that seeks straightforward ‘historical recovery’ to one that focusses on ‘rigorous and responsible creativity.’ (Vincent Brown, ‘Mapping a Slave Revolt: Visualizing Spatial History through the Archives of Slavery’, Social Text 33 (2015), p.134). There are existing, innovative digital humanities (DH) approaches to the study of slavery. Projects have used computational methods to explore large-scale corpora of slavery-related literature, examining the size of the English lexicon, the evolution of grammar and the frequency with which certain words or phrases were used over time, or in the study of emotions in narratives written by enslaved people. There is a broader range of DH projects that examine slavery in the Atlantic world, which have made novel historiographical contributions, perhaps most notably the broad databases Slave Voyages (https://www.slavevoyages.org/) and Legacies of British Slaveownership (https://www.ucl.ac.uk/lbs/), recently brought together with other projects as Enslaved (enslaved.org) but also the more focused Runaway Slaves in Britain (https://www.runaways.gla.ac.uk/) and the Early Caribbean Digital Archive (https://ecda.northeastern.edu/home/about/decolonizing-the-archive/). What we describe is the utilisation of (well established) text and data mining approaches, applied to the EB, involving unique corpus analysis that would need to consider the intellectual and cultural context in which eighteenth and nineteenth-century encyclopaedias were produced and published, and also linking and cross-referencing to other information sources available within the National Library of Scotland collection. 

The University of Edinburgh is an ideal place to carry out this research. The Edinburgh Centre for Global History, which Paton directs, has Migration, Slavery and Diaspora studies as one of its three thematic hubs (https://www.ed.ac.uk/history-classics-archaeology/centre-global-history). The Centre for Data, Culture and Society has recently pushed to establish text and data mining as a core research interest, with training events and materials (https://www.cdcs.ed.ac.uk), aligned with support from the Edinburgh Parallel Computing Centre’s research software engineers (https://www.epcc.ed.ac.uk). We have already mounted the EB on EPCC systems and run preliminary searches on a selection of terms, as a pilot study to establish that there would be enough content upon which to build a PhD, in the analysis and visualisation of results. The candidate would be trained in both R and Python, and have access to our in-house text-mining-at-scale platform, defoe (see “defoe: A Spark-based Toolbox for Analysing Digital Historical Textual Data”, Filgueira Vicente, R et al, 2019, https://doi.org/10.1109/eScience.2019.00033).

This is a timely topic, of significant relevance, given the Black Lives Matter movement and increasing interest in decolonising academic and cultural institutions. The University of Edinburgh has recently established the Institute for Advanced Study in the Humanities Institute Project on Decoloniality (2021-24) (https://www.iash.ed.ac.uk/institute-project-decoloniality) and the candidate can engage with this. This project will have scholarly impact in Digital Humanities, History, and Library and Information Science, as we consider how to analyse, deconstruct and decolonialise historical information sources using computational methods.  

New article: The value of mass-digitised cultural heritage content in creative contexts

One of the projects I’m working on right now is Creative Informatics (2018–2023), which aims to enhance data-sharing and innovation across the creative sectors throughout the City of Edinburgh and local regions, to develop ground-breaking new products, businesses and experiences, as part of the Creative Industries Clusters Programme. I’m pleased to share our first team-effort paper, which has just come out in Big Data & Society, in its special edition on Heritage in a World of Big Data: re-thinking collecting practices, heritage values and activism, edited by Chiara Bonacchi (which is a fab set of papers, btw). Our paper is fully open access, so I’ll paste the abstract in here, and the full citation.

How can digitised assets of Galleries, Libraries, Archives and Museums be reused to unlock new value? What are the implications of viewing large-scale cultural heritage data as an economic resource, to build new products and services upon? Drawing upon valuation studies, we reflect on both the theory and practicalities of using mass-digitised heritage content as an economic driver, stressing the need to consider the complexity of commercial-based outcomes within the context of cultural and creative industries. However, we also problematise the act of considering such heritage content as a resource to be exploited for economic growth, in order to inform how we consider, develop, deliver and value mass-digitisation. Our research will be of interest to those wishing to understand a rapidly changing research and innovation landscape, those considering how to engage memory institutions in data-driven activities and those critically evaluating years of mass-digitisation across the heritage sector.

Terras, M., Coleman, S., Drost, S., Elsden, C., Helgason, I., Lechelt, S., Osborne, N., Panneels, I., Pegado, B., Schafer, B. and Smyth, M., 2021. The value of mass-digitised cultural heritage content in creative contexts. Big Data & Society, 8(1), p.20539517211006165.

It’s worth stressing – before people set the pitchforks upon us – that we problematise the act of considering such heritage content as a resource to be exploited for economic growth.

It was a great paper to write with the team, and I can recommend working with the BD&S editors and peer reviewers – this one had a few turns around the block, and it is all the better for it.

#DHGoesViral, a year on

I haven’t talked much on here about the pandemic. A year ago today, the #DHgoesViral twitter conference happened, swiftly organised by Agiati Benardou at the outbreak of Covid-19 across Europe. By then we were a few weeks into a rapid change in how we were all living, locked down at home with minimal contact with the outside world. Only a few weeks before – and the day before the UK lockdown started – I remember talking to a senior administrator who was convinced universities wouldn’t close. We were closed down 24 hours later. Everything was stress and uncertainty and a huge cognitive load to deal with.

DH in the time of Virus played out entirely over twitter. It saw Digital Humanities experts, both academics and practitioners, as well as Digital Research Infrastructures and Initiatives from across Europe, give their thoughts on what was happening to our field and our professional areas at the time of the sudden lockdowns. I was asked to give mine, and honestly, finding the mental capacity to concentrate on preparing those 10 tweets was hard – it took me nearly a day (when in normal life I could bash this out in 10 minutes or so, although what is normal anymore….?). I thought I would park them here, to think about what has changed – and what is the same – at the end of our second lockdown in the UK, and as central Europe goes into its third.

You can see the starting point for the other #DHgoesViral twitter stream “talks” on this blog. Here was mine. I can see now we’re not so panicked, but still restricted. We’re still depending on infrastructures that are under-resourced. There are still loads of people doing a tonne of work behind the scenes. And we’re dependent on digital, given that libraries and archives are (at the moment) still closed…

Look after yourselves, everyone.

New Paper: Text mining Mill: Computationally detecting influence in the writings of John Stuart Mill from library records

London Library Issue Book No. 3 showing John Stuart Mill’s intensive borrowing record during 1845, London Library Issue Book Number 3, p. 529. The horizontal lines indicate the return of individual books. The vertical lines indicate that all the books listed on the page have been returned. Image reproduced with the kind permission of the London Library. © The London Library

How can computational methods illuminate the relationship between a leading intellectual, and their lifetime library membership? I’m pleased to say that a paper, derived primarily from the work Dr Helen O’Neill conducted for her PhD thesis in Information Studies at UCL, on The Role of Data Analytics in Assessing Historical Library Impact: The Victorian Intelligentsia and the London Library (2019), supervised by myself and Anne Welsh, has just been published. The interesting thing about this paper is that it started life as a tweet:

Replies from both David A. Smith (at Northeastern), and Glenn Roe (now at the Sorbonne), who took the time to detail and explain their previous work in detecting textual reuse, led to a collaboration. In O’Neill’s doctoral work, we explored the interrelation between the reading record and the publications of the British philosopher and economist John Stuart Mill, focusing on his relationship with the London Library, an independent lending library of which Mill was a member for 32 years.

Building on O’Neill’s detailed archival research into the London Library’s lending and book donation records, she constructed a corpus of texts from the Internet Archive, comprising the exact editions of the books Mill borrowed and donated, alongside the publications he produced. This enabled natural language processing approaches to detect textual reuse and similarity, establishing the relationship between Mill and the Library. With Smith and Roe’s assistance, we used two different methods, TextPAIR and Passim, to detect and align similar passages between the books Mill borrowed from and donated to the London Library and his published outputs.
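For readers curious about the underlying idea, here is a deliberately toy sketch of text-reuse detection using word n-gram “shingling” and Jaccard similarity. To be clear, this is not how TextPAIR or Passim actually work – both use far more sophisticated alignment of local passages – and the example sentences are invented, not Mill’s; it just illustrates the family of technique.

```python
# Toy sketch: flag candidate text reuse between a "borrowed" text and a
# "published" text by comparing their sets of word n-grams (shingles).

def shingles(text, n=4):
    """Return the set of lowercase word n-grams in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a, b):
    """Jaccard similarity between two shingle sets (0.0 if both empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Invented example passages, for illustration only.
borrowed = "the principle of utility requires that happiness be the measure"
published = "the principle of utility requires that happiness be the test of conduct"

sim = jaccard(shingles(borrowed), shingles(published))
print(f"similarity: {sim:.2f}")  # a high score flags a candidate reuse pair
```

In practice, tools like Passim scale this kind of comparison to millions of page pairs and then align the matching passages word by word, which is what makes detection across whole corpora feasible.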

So what did we show? The collections of the London Library influenced Mill’s thought, transferred into his published oeuvre, and featured in his role as political commentator and public moralist. O’Neill’s work had already uncovered how important the London Library was to Mill, and how often he used it; we can now also see that influence in the texts he wrote, given the volume of reuse of London Library material, concentrated around particular periods and publications.

The important thing here is that we have re-conceived archival library issue registers as data, to be triangulated against the growing body of digitised historical texts and the output of leading intellectual figures (a historical Turnitin, huh?). This approach, though, depends on the resources and permissions to transcribe extant library registers, and on access to previously digitised sources. Because of complexities in privacy regulations, and the limitations copyright places on digitisation, it is most likely to succeed for other leading eighteenth- and nineteenth-century figures. Still cool, though.

On a personal note – this is the last paper I’ll be publishing from work that I started while employed at UCL. It was important to me to see through the PhD supervisions I had committed to, and I was delighted when Helen O’Neill (who did her PhD part time, while working full time!) passed her viva with no corrections at all (yay!) in 2019. Happy end to an era, for me, and yes, it has taken a full three years to finish up the transition of research to Edinburgh! But all good.

Here’s the reference to the paper proper:

O’Neill, H., Welsh, A., Smith, D.A., Roe, G. and Terras, M., 2021. Text mining Mill: Computationally detecting influence in the writings of John Stuart Mill from library records. Digital Scholarship in the Humanities. https://doi.org/10.1093/llc/fqab010

I’ve parked an open access version for you to read without the paywall. Enjoy!

New paper: Digital cultural colonialism: measuring bias in aggregated digitized content held in Google Arts and Culture

Map showing the dominance of content with origins in the USA, in Google Arts and Culture.

In February 2011, Google launched its Google Art Project, now known as Google Arts and Culture (GA&C), with an objective to make culture more accessible. The platform (and the content on its app) has grown dramatically since then, and currently hosts approximately six million high-resolution images of artworks from approximately 2,500 museums and galleries in almost every country featured in the UN member list. Although Google does not publish user statistics for the resource, it is understood that virtual visitor numbers increased dramatically during 2020, when many leading arts institutions had to close their doors because of the COVID-19 pandemic. There has, to date, been very little research published on the platform, and our newly published research in Digital Scholarship in the Humanities (which began as a Digital Humanities class project with Inna Kizhner’s students at Siberian Federal University) interrogates GA&C to understand its scope, scale, and coverage.

Our scraping of the site content (in summer 2019) shows that Google Arts and Culture is far from a balanced representation of the world’s cultures: 

  • The major proportion of the holdings feature content that resides in the USA;
  • Only 7.5% of the content is from institutions beyond the USA, UK, Netherlands and Italy; 
  • There are very few African cultural institutions who have contributed to the platform, and very little African culture present;
  • The culture of some countries (such as Kazakhstan) is represented entirely by pictures of American and Russian culture via the space programme;
  • Artworks from capital cities dominate the collections, while art from provinces is underrepresented;
  • There is a dominance of art from the 20th century.
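As a rough sketch of how proportions like these can be computed from scraped metadata – note that the records and field names below are invented for illustration, and are not GA&C’s actual schema, which the post doesn’t describe:

```python
from collections import Counter

# Hypothetical scraped records: one dict per item, tagged with the
# country of the holding institution (illustrative field names only).
items = [
    {"title": "Artwork A", "institution_country": "USA"},
    {"title": "Artwork B", "institution_country": "USA"},
    {"title": "Artwork C", "institution_country": "UK"},
    {"title": "Artwork D", "institution_country": "Netherlands"},
]

# Tally items per country, then convert counts into shares of the whole.
counts = Counter(item["institution_country"] for item in items)
total = sum(counts.values())
shares = {country: n / total for country, n in counts.items()}

# Report countries from most- to least-represented.
for country, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{country}: {share:.1%}")
```

The real analysis ran over millions of scraped items rather than four, but the aggregation step is essentially this: count items by country of origin and compare the resulting shares.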

Pie chart showing that 82% of individual items in Google Arts and Culture are presented by USA institutions.

This leads to some extreme examples in the platform. There are next to no entries in GA&C featuring content originating in large parts of the African continent. American culture dominates Canadian culture. The culture of Kazakhstan (in 2019) was represented entirely in GA&C by 4000 pictures of American and Russian astronauts, from the NASA archives: no Kazakhstani institution had uploaded content themselves. 

In 2016, Amit Sood, the Director of GA&C, claimed that it would introduce a new era of accessible art. However, we have shown that GA&C is a corpus in which a number of cultures are underrepresented or marginalized. It maintains dominant cultural systems, including promoting the cultural holdings of the United States of America above all others. Google Arts and Culture is therefore an example of “digital cultural colonialism”, which amplifies the cultural holdings of one particular country, and also reinforces the conventional traditions of art collection and interpretation that dominate museum displays in larger Western cities.


The biases we have discovered in GA&C may have long-lasting effects. If larger quantities of objects, images, and stories related to a particular idea or representation of selected knowledge are present in aggregators of cultural content, these ideas and concepts will be promoted, accessed, disseminated, and studied, becoming the foundation of a new digital canon: one that can be appropriated for Artificial Intelligence and machine learning. 

We argue that the choices that have gone into the content featured on GA&C should be made clear, as well as the biases contained within the platform. Our research is a first step towards understanding that this major platform is not neutral, and contains biases that will impact its large user base, as well as the data-led approaches that draw upon it, both now and in the future. We end with a challenge: what will GA&C do to make its processes for ingesting and showcasing ‘arts and culture’ transparent, and how will it deploy its resources to expand the reach and spread of the digital content it features?

For more, see: “Digital cultural colonialism: measuring bias in aggregated digitized content held in Google Arts and Culture” by Inna Kizhner, Melissa Terras, Maxim Rumyantsev, Valentina Khokhlova, Elisaveta Demeshkova, Ivan Rudov, Julia Afanasieva. Digital Scholarship in the Humanities, https://doi.org/10.1093/llc/fqaa055

Given that the copyright in DSH belongs to the authors, I’ve placed a copy of the journal article here on this blog, for easy and open access.

We are following up this research with a study to understand the GLAM sector’s views on Google Arts and Culture, including the process of becoming a partner in the platform. If you would like to give your anonymous opinion, our online survey is open until 15th February 2021: we would appreciate all insights.