#DHGoesViral, a year on

I haven’t talked much on here about the pandemic. A year ago today, the #DHgoesViral twitter conference happened, swiftly organised by Agiati Benardou at the outbreak of Covid-19 across Europe. By then we were a few weeks into a rapid change in how we were all living, and locked down at home with minimal contact with the outside world. Only a few weeks before – and the day before the UK lockdown started – I remember talking to a senior administrator, who was convinced universities wouldn’t close. We were closed down 24 hours later. Everything was stress and uncertainty and a huge cognitive load to deal with.

DH in the time of Virus played out entirely over twitter. It saw Digital Humanities experts, both academics and practitioners, as well as Digital Research Infrastructures and Initiatives from across Europe, give their thoughts on what was happening to our field and our professional areas at the time of the sudden lockdowns. I was asked to give mine, and honestly, finding the mental ability to concentrate on preparing these 10 tweets was hard: it took me nearly a day (when in normal life I could bash this out in 10 mins or so, although what is normal anymore…?). I thought I would park them here, to think about what has changed – and what is the same – at the end of our second lockdown in the UK, and as central Europe goes into its third.

You can see the starting point for the other #DHgoesViral twitter stream “talks” on this blog. Here was mine. I can see now we’re not so panicked, but still restricted. We’re still depending on infrastructures that are under-resourced. There are still loads of people doing a tonne of work behind the scenes. And we’re dependent on digital given the libraries and archives are (at the moment) still closed…

Look after yourselves, everyone.

New Paper: Text mining Mill: Computationally detecting influence in the writings of John Stuart Mill from library records

Image of handwritten library register
London Library Issue Book No. 3 showing John Stuart Mill’s intensive borrowing record during 1845, London Library Issue Book Number 3, p. 529. The horizontal lines indicate the return of individual books. The vertical lines indicate that all the books listed on the page have been returned. Image reproduced with the kind permission of the London Library. © The London Library

How can computational methods illuminate the relationship between a leading intellectual, and their lifetime library membership? I’m pleased to say that a paper, derived primarily from the work Dr Helen O’Neill conducted for her PhD thesis in Information Studies at UCL, on The Role of Data Analytics in Assessing Historical Library Impact: The Victorian Intelligentsia and the London Library (2019), supervised by myself and Anne Welsh, has just been published. The interesting thing about this paper is that it started life as a tweet:

Replies from both David A. Smith (at Northeastern), and Glenn Roe (now at the Sorbonne), who took the time to detail and explain their previous work in detecting textual reuse, led to a collaboration. In O’Neill’s doctoral work, we explored the interrelation between the reading record and the publications of the British philosopher and economist John Stuart Mill, focusing on his relationship with the London Library, an independent lending library of which Mill was a member for 32 years.

Building on her detailed archival research into the London Library’s lending and book donation records, O’Neill constructed a corpus of texts from the Internet Archive, comprising the exact editions of the books Mill borrowed and donated, and the publications he produced. This enabled natural language processing approaches to detect textual reuse and similarity, establishing the relationship between Mill and the Library. With Smith and Roe’s assistance, we used two different methods, TextPAIR and Passim, to detect and align similar passages in the books Mill borrowed from and donated to the London Library against his published outputs.
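For readers who have not met text reuse detection before, here is a minimal sketch of the general idea: look for runs of consecutive words that appear in both a borrowed book and a published work. This is not how TextPAIR or Passim are implemented (both are far more robust to OCR noise and small wording changes), and the function names and example sentences are invented purely for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def shingles(tokens, n):
    """Map each run of n consecutive tokens to the positions where it starts."""
    index = {}
    for i in range(len(tokens) - n + 1):
        index.setdefault(tuple(tokens[i:i + n]), []).append(i)
    return index


def shared_passages(source, target, n=4):
    """Return the distinct word n-grams found in both texts, in source order."""
    src = tokenize(source)
    target_index = shingles(tokenize(target), n)
    matches = []
    for i in range(len(src) - n + 1):
        gram = tuple(src[i:i + n])
        if gram in target_index and gram not in matches:
            matches.append(gram)
    return [" ".join(gram) for gram in matches]


# Invented sentences, NOT taken from Mill or from the London Library corpus.
borrowed = "The wealth of a nation depends upon the industry of its people."
written = "He argued that prosperity depends upon the industry of its people above all."

for passage in shared_passages(borrowed, written):
    print(passage)
```

Overlapping matches like these would then need to be merged into longer aligned passages and checked by a human reader, which is where the dedicated tools earn their keep.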

So what did we show? The collections of the London Library influenced Mill’s thought, transferred into his published oeuvre, and featured in his role as political commentator and public moralist. O’Neill’s work had already uncovered how important the London Library was to Mill, and how often he used it, but we can now also see this in the texts he wrote, given the volume of references to material in the London Library, particularly around certain times and publications.

The important thing about this really is that we have re-conceived archival library issue registers as data for triangulating against the growing body of digitized historical texts and the output of leading intellectual figures (historical turn-it-in, huh). This approach, though, is dependent on the resources and permissions to transcribe extant library registers, and on access to previously digitized sources. Because of complexities in privacy regulations, and the limitations placed on digitisation due to copyright, this is most likely to succeed for other leading eighteenth- and nineteenth-century figures. Still cool, though.

On a personal note – this is the last paper I’ll be publishing from work that I started while employed at UCL. It was important to me to see through the PhD supervisions I had committed to, and I was delighted when Helen O’Neill (who did her PhD part time, while working full time!) passed her viva with no corrections at all (yay!) in 2019. Happy end to an era, for me, and yes, it has taken a full three years to finish up the transition of research to Edinburgh! But all good.

Here’s the reference to the paper proper:

O’Neill, H., Welsh, A., Smith, D.A., Roe, G. and Terras, M., 2021. Text mining Mill: Computationally detecting influence in the writings of John Stuart Mill from library records. Digital Scholarship in the Humanities. https://doi.org/10.1093/llc/fqab010

I’ve parked an open access version for you to read without the paywall. Enjoy!

New paper: Digital cultural colonialism: measuring bias in aggregated digitized content held in Google Arts and Culture

Map showing the dominance of content with origins in the USA, in Google Arts and Culture.

In February 2011, Google launched its Google Art Project, now known as Google Arts and Culture (GA&C), with an objective to make culture more accessible. The platform (and the content on its app) has dramatically grown since then, and currently hosts approximately six million high-resolution images of artworks from approximately 2,500 museums and galleries in almost every country featured in the UN member list. Although Google does not publish user statistics for the resource, it is understood that virtual visitor numbers increased dramatically during 2020, when many leading arts institutions had to close their doors because of the COVID-19 pandemic. There has, to date, been very little research published on the platform, and our newly published research in the journal Digital Scholarship in the Humanities (which was a Digital Humanities class project with Inna Kizhner’s students at Siberian Federal University) interrogates GA&C to understand its scope, scale, and coverage.

Our scraping of the site content (in summer 2019) shows that Google Arts and Culture is far from a balanced representation of the world’s cultures: 

  • The major proportion of the holdings feature content that resides in the USA;
  • Only 7.5% of the content is from institutions beyond the USA, UK, Netherlands and Italy; 
  • There are very few African cultural institutions that have contributed to the platform, and very little African culture present;
  • The culture of some countries (such as Kazakhstan) is represented entirely by pictures of American and Russian culture via the space programme;
  • Artworks from capital cities dominate the collections, while art from provinces is underrepresented;
  • There is a dominance of art from the 20th century.
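As a rough illustration of the kind of counting that sits behind figures like these, the sketch below works out each country’s share of items in a list of scraped records. The record structure, the `country_shares` name and the toy data are all assumptions made for the example; they are not the study’s actual data or code.

```python
from collections import Counter


def country_shares(records):
    """Return (country, percentage of items) pairs, largest share first."""
    counts = Counter(record["country"] for record in records)
    total = sum(counts.values())
    return [
        (country, round(100 * count / total, 1))
        for country, count in counts.most_common()
    ]


# Toy records, invented for illustration only (not the scraped GA&C data).
sample = (
    [{"item": f"usa-{i}", "country": "USA"} for i in range(6)]
    + [{"item": f"uk-{i}", "country": "UK"} for i in range(3)]
    + [{"item": "ke-0", "country": "Kenya"}]
)

for country, share in country_shares(sample):
    print(f"{country}: {share}%")
```

The same grouping could be repeated by city, century or contributing institution to probe the other imbalances listed above.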

Pie chart showing that 82% of individual items in Google Arts and Culture are presented by USA institutions.

This leads to some extreme examples in the platform. There are next to no entries in GA&C featuring content originating in large parts of the African continent. American culture dominates Canadian culture. The culture of Kazakhstan (in 2019) was represented entirely in GA&C by 4000 pictures of American and Russian astronauts, from the NASA archives: no Kazakhstani institution had uploaded content themselves. 

In 2016, Amit Sood, the Director of GA&C, claimed that it would introduce a new era of accessible art. However, we have shown that GA&C is a corpus where a number of cultures are underrepresented or marginalized. It maintains dominant cultural systems, including promoting the cultural holdings of the United States of America above all others. Google Arts and Culture is therefore an example of “digital cultural colonialism”, which amplifies the cultural holdings of one particular country, and also reinforces the conventional traditions of art collection and interpretation that dominate museum displays in larger Western cities.

The biases we have discovered in GA&C may have long-ranging effects. If larger quantities of objects, images, and stories related to a particular idea or representation of selected knowledge are present in aggregators of cultural content, these ideas and concepts will be promoted, accessed, disseminated, and studied, becoming the foundation of the new digital canon: one that can be appropriated for Artificial Intelligence and machine learning.

We argue that the choices that have gone into the content featured on GA&C should be made clear, as well as the biases contained within the platform. Our research is the first step towards understanding that this major platform is not neutral, and contains biases that will impact on its large user base, as well as data-led approaches that draw upon it, both now, and in the future. We end with a challenge: what will GA&C do to make its processes for ingesting and showcasing ‘arts and culture’ transparent, and how will it deploy its resources to expand the reach and spread of the digital content it features?

For more, see: “Digital cultural colonialism: measuring bias in aggregated digitized content held in Google Arts and Culture” by Inna Kizhner, Melissa Terras, Maxim Rumyantsev, Valentina Khokhlova, Elisaveta Demeshkova, Ivan Rudov, Julia Afanasieva. Digital Scholarship in the Humanities, https://doi.org/10.1093/llc/fqaa055

Given that the copyright in DSH belongs to the authors, I’ve placed a copy of the journal article here on this blog, for easy and open access.

We are following up this research with a further study to understand the GLAM sector’s views on Google Arts and Culture, including the process of becoming a partner in the platform. If you would like to give your anonymous opinion, our online survey is open until 15th February 2021: we would appreciate all insights.

Arise Sir Generative: When AI Met the Queen

Example of one of Rudolf Ammann’s ImprovBot illustrations CC-BY-NC if you want to reuse…

Over the past few months, I’ve had a lot of fun with generative AI. Last summer, I put on ImprovBot (with my colleagues Rudolf Ammann and Gavin Inglis), which was the world’s first AI-generated Arts Festival Programme. Taking 2.5 million words of material from the past 8 years of Edinburgh Festival Fringe Society listings, we trained our neural network, The Bot, to generate “new” show blurbs, which we put out hourly over the expected period of the Fringe (which couldn’t happen in person last year), with custom generated imagery, and a live improv show every day from the Improverts, the Edinburgh University student improv society. We got fantastic coverage worldwide, including a 4 Star review from The Stage! The whole thing was a month-long elegy for a Fringe, and its associated industries, that were decimated this year. We hoped to walk the line of poignant, fun, creative, and everything-being-mediated-by-tech these days. Incidentally, you can see all the outputs of the ImprovBot text over at Zenodo, and if you are after any art work to represent AI, we’ve licensed reuse of 376 ImprovBot illustrations CC-BY-NC: fill your boots with ones like the one above. An easy to download selection, also available under a less restrictive licence that allows for commercial use and does not require attribution, is available over at Pixabay.

Just before Christmas, the Alan Turing Institute (of which I am a Fellow) was asked by Wired Magazine if there was anyone who would like to explore generating a Christmas Speech for the Queen, using AI? Why yes! David Beavan and I were delighted to hack together a response, which went out on Christmas Eve:

To train the system, he had to combine the two datasets, one of the Queen’s previous broadcasts and the second of WIRED Covid-19 stories, into a single document to ensure both were equally considered. “You give it [GPT-2] the beginning of a sentence and the idea is it guesses the next word,” he says. After examining the results, the temperature is dialled up or down.

The system churned out thousands of words, which were then passed to Terras to edit down. She started by taking out anything negative or controversial – “the computer put together some dark stuff,” Terras says, especially around race, the commonwealth and war – and then selected relevant passages, keeping the sentences whole but altering order and placement. Some AI systems can analyse documents for structure, but not this one, so a helping hand was required. “I took a box of tiles and put them in a mosaic,” she explains. “There’s a lot of human editing.”
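For anyone wondering what the “temperature” in that quote refers to, here is a minimal sketch of temperature-scaled sampling, the general technique used when a language model picks its next word. It is not the code we ran, and the candidate words and scores are invented: a real model such as GPT-2 scores tens of thousands of possible tokens at every step.

```python
import math
import random


def softmax_with_temperature(scores, temperature):
    """Turn raw scores into probabilities; a lower temperature sharpens them."""
    scaled = [score / temperature for score in scores]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


def sample_next_word(candidates, scores, temperature, rng=random):
    """Pick one candidate at random, weighted by the temperature-scaled scores."""
    probabilities = softmax_with_temperature(scores, temperature)
    return rng.choices(candidates, weights=probabilities, k=1)[0]


# Invented candidates and scores, for illustration only.
candidates = ["Christmas", "Commonwealth", "coronavirus"]
scores = [2.0, 1.0, 0.5]

for temperature in (0.5, 1.0, 2.0):
    probabilities = softmax_with_temperature(scores, temperature)
    print(temperature, [round(p, 2) for p in probabilities])

print(sample_next_word(candidates, scores, temperature=1.0))
```

Dialling the temperature down makes the most likely word win almost every time (safe but repetitive); dialling it up flattens the odds, which is where the stranger sentences come from.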

You can read the whole article over on Wired, and I’ll post our generated Queen’s Christmas Speech below. It was a lot of fun – but also raises issues of ethics, the amount of human interaction that is needed for the “softer” things (humour! power structures! respect!) and also the role of these low-hanging-fruit, quick win, playful things in the public engagement with AI and algorithmically generated content. Neither of these projects was technically doing anything new, but by pointing the power of generative text to a new, playful application – well, the results help us consider AI afresh, in a way which is explainable to others.

Enjoy the Windsor-o-Tron’s output!

Christmas is a time for reflection on the past and making new friends. On the first day of the year, however, things began to look a bit more grim. I remember meeting Joseph and Mary at the Inn in Sandringham. We were both looking forward to the future and looking forward to our visit to Oxford this autumn. I shall never forget the scene in Windsor, where the Covid-19 outbreak was reignited. In the first lockdown, all tourists were restored to normal, adults were ordered to stay at home and children under five were allowed to stay at home.

I have spent the last couple of weeks listening to some of your radio and television interviews, which has touched me deeply. I have thought to myself whether it is time to send you my best wishes for Christmas and the New Year. The NHS has faced a real and growing challenge in the years ahead. It’s been a difficult few months for many people living alone. But with so much to build on and many exciting opportunities to be found in the nature of our relationship, this year I think it is safe to say that we are all looking forward to a new year.

We are also living in a time of social distancing: the less we live together the more distanced we become. I am thinking of those now living with their parents or caring for them at home. These people are now their families. That motherly instinct has helped to shape my own views of the world, thoughts on life and my own beliefs. I remember the first time I was asked by a kindly visitor, a man of few words, what year was Jamaica.

The world has to face its challenges and confront its problems with courage, patience and fortitude. A vaccine for Covid-19 hinges on the delivery of a drug, so that new antibodies can be triggered. But the real power lies in the invisible hand that draws the world in. When invisible hands come to the task, it’s often the invisible workers at the machinery who are making that change. It is through their example and willingness to show the world that they deserve our respect that we can make a real difference.

One of the things that has remained constant throughout the Commonwealth, I believe, is the effort to reconcile the differences between nations and between countries. That spirit of brotherhood which has survived the most serious challenge of the present century can be best expressed in the British Commonwealth and the Commonwealth international formula. Every year I look forward to opening the letters, parcels and telegrams that come to me from the Commonwealth. I can think of no better time than now to say a big thank you to all the people who have given so much to this country and all around the Commonwealth. Every one of them has given so much to me.

This year I’ve spent a great deal of time and effort in various fashions and colours, some of which are familiar to many of you. Naturally I would like to draw attention to the fact that my family and myself have enjoyed a very happy and prosperous past year. We are fortunate to have a home and some children.

Like many other families, we gathered to watch the bubbling fountains of humanity rise above the evil. In the meantime, members of my own family are celebrating Christmas with their families and we shall see further developments as I set out to see which side of the Atlantic the peace will be in the coming year.

In January 2021, after we’ve all lined up patiently for our jabs and the threat of the virus has receded, we may finally start to count the damage the novel coronavirus has wrought on our lives. The Prince of Wales also saw first hand the remarkable resilience of the human spirit. Yes, there are many of you unhappy families, but there are also millions of ordinary people who are helping keep our country and our Commonwealth together. They are making a real contribution to our society. There may be small signs of recovery, but in the meantime, we must all keep an eye out for signs of a slowing or complete return to the days when King James was a political and economically powerful man. 

The real value of Christmas lies in the message and the spirit that it brings. Christmas is a very human offering, and it speaks to the needs of all people. So, as it passes through our thoughts are diverted to other planets, and to the struggles beyond our control. 

The Christmas story reminds us that it is not only about one man, but about many. We have a message for you all: hope, peace, brotherhood and a happy Christmas. Whether you are talking to a friend, or a relative, or a stranger, or a visitor from another world, the message of Christmas is ever more relevant than ever. I would like to see a message of encouragement, as I go about my business in the rain.

Our lives are shaped by our past, and as we live out our future together we should know each other best. It is difficult for us to know far into the future as our families gather round us, but it is better that we have some sense than that we have any sense at all. I wish you all, together with your children and grandchildren, a blessed Christmas.

New Book Out Now! Electronic Legal Deposit: Shaping the Library Collections of the Future

I’m delighted that my latest edited collection, with Paul Gooding, is out now with Facet Publishing: Electronic Legal Deposit: Shaping the Library Collections of the Future. Stemming from our Digital Library Futures AHRC funded project, which looked at the  impact of changes to electronic legal deposit legislation upon UK academic deposit libraries and their users, we’ve pulled together this collection from contributing experts worldwide to look at issues and successes in preserving our digital publishing environment.

For those who don’t know what electronic legal deposit legislation is, let’s back up a bit. It is of course related to legal deposit, and as we say in the introduction:

Legal deposit is the regulatory requirement that a person or group submit copies of their publications to a trusted repository. First introduced by France in the sixteenth century… legal deposit has since been adopted around the world: as of 2016, 62 out of 245 national and state libraries worldwide either benefited from legal deposit regulations or participated in legal deposit activities… Regulations permitting legal deposit of printed publications have played a vital role in supporting libraries to build comprehensive national collections for the public good… In the last two decades, the scope of legal deposit has grown to formally incorporate ‘electronic’ or ‘non-print’ publication; those published in digital and other non-print formats. (Gooding and Terras 2020, p.xxiv).

We believe that this is the first book to attempt to draw together an overview of contemporary activities in major organisations and institutions trying to preserve our digital publishing world, which of course includes the world wide web, and how it is archived. We do so from a user perspective, looking at the implications this will have for users of the collections, both now and in the future. And we poke a big stick at the intersection of copyright and legal deposit legislation which often conspire to make user access so limited and tricky to negotiate that end users are presented with a series of obstacles to even get basic access to electronic legal deposit content. You can find a breakdown of the chapters and contributors here, including those from the National Library of Sweden, Biblioteca Nacional de México, National Archives of Zimbabwe, etc etc!

We’ll be holding a book launch on 5th November 2020, online, for those who want to hear some excellent speakers on the topic, including from the National Library of Scotland, and Universidad Nacional Autónoma de México. And I’m particularly taken with the cover of this one, which is an art work created from an actual LiDAR scan of the National Library of Scotland stacks, by Edinburgh College of Art PhD student Asad Khan. I love it when a plan comes together.

For those who want a sneak peek of the content, under Facet’s Green Open Access rules, I’m allowed to share the author’s last copy of a single chapter from an edited collection. So here, from Paul and me, is our chapter on how the digital turn has affected legal deposit legislation, showing that “print era notions that influence the NPLD access and reuse regulations are increasingly out of step with broader developments in publishing, information technology, and broader socio-political trends in access to information”. Have at it, and enjoy.

Gooding, Paul and Terras, Melissa (2020). ‘An Ark to Save Learning from Deluge’? Reconceptualising Legal Deposit after the Digital Turn. In Gooding, Paul and Terras, Melissa (2020) (Eds). Electronic Legal Deposit: Shaping the Library Collections of the Future. Facet: London, 203-228.

New paper: Understanding multispectral imaging of cultural heritage: Determining best practice in MSI analysis of historical artefacts

What do people actually do when they undertake multispectral imaging of cultural heritage? I’m really pleased that our latest paper has been published: it helps set out the answer to this question, and provides a literature review of heritage digitisation work that has used multispectral imaging, comparing and contrasting methods. This formed part of Dr Cerys Jones’ PhD research, and I was really delighted to supervise this with Adam Gibson and Christina Duffy:

Jones, C, Terras, M, Duffy, C & Gibson, A 2020 “Understanding multispectral imaging of cultural heritage: Determining best practice in MSI analysis of historical artefacts”. Journal of Cultural Heritage.

You can see the journal version online here – but the link above will take you to the authors’ submitted copy.  Enjoy!

Fully Funded AHRC Studentship: “Adopting Transkribus in the National Library of Scotland: Understanding how Handwritten Text Recognition Will Change Management and Use of Digitised Manuscripts”

I’m pleased to say that we’ve won a Scottish Graduate School for Arts and Humanities (SGSAH) 3.5 year scholarship for a PhD student, looking at how we can embed Handwritten Text Recognition software into digitisation practices, whilst supporting users, working with Transkribus and the National Library of Scotland. The advert will go live soon on our official channels – but for now – here are the details and I’d appreciate folks sharing with any interested EU or UK Master’s students! Closing date of 22nd June. Thank you!

The University of Edinburgh, the National Library of Scotland, and the University of Glasgow, in conjunction with the READ-COOP, are seeking a doctoral student for an AHRC-funded Collaborative Doctoral Award, “Adopting Transkribus in the National Library of Scotland: Understanding How Handwritten Text Recognition Will Change Management and Use of Digitised Manuscripts”. The project has been awarded funding by the Scottish Graduate School for Arts and Humanities (SGSAH) and will be supervised by Professor Melissa Terras (College of Arts, Humanities and Social Sciences, University of Edinburgh), Dr Paul Gooding (Lecturer in Information Studies, University of Glasgow), Dr Sarah Ames (Digital Scholarship Librarian, National Library of Scotland) and Stephen Rigden (Digital Archivist, National Library of Scotland).

The studentship will commence on 14th September 2020. We warmly encourage applications from candidates with a background in digital humanities, information studies, library science, user experience and human computer interaction, history, manuscript studies, and/or palaeography. This is an extraordinary opportunity for a strong PhD student to explore their own research interests, while working closely with a major cultural heritage organisation, two world-leading universities, and the team behind Transkribus (https://transkribus.eu/Transkribus/), the machine learning platform for generating transcripts of historical manuscripts via Handwritten Text Recognition.

The student will be based in the School of Literature, Languages and Cultures, at the George Square campus of the University of Edinburgh, but will also spend considerable time at the National Library of Scotland, and liaising with the Transkribus team (based at the University of Innsbruck). Much of the research can be undertaken offsite.

The student stipend is approximately £15,285 per annum + tuition fees for 3.5 years. The award will include a number of training opportunities offered by SGSAH, including their Core Leadership Programme and additional funding to cover travel between partner organisations and related events. This studentship will also benefit from training, support, and networking via the Edinburgh Centre for Data, Culture and Society, and the Edinburgh Futures Institute. The student will be invited to join National Library PhD cohort activities.

Project Details

“Adopting Transkribus in the National Library of Scotland: Understanding how Handwritten Text Recognition Will Change Management and Use of Digitised Manuscripts”

Libraries are investing in mass digitisation of manuscript collections but until recently textual content has only been available to those who have the resources for manual transcription of digital images. This project will study the institutional reception of machine-learning processes to transcribe handwritten texts at scale. The use of Handwritten Text Recognition (HTR) to generate transcripts from digitised historical texts with machine learning approaches will transform access for researchers, institutions and the general public.

The PhD candidate will work with the National Library of Scotland and its user community to gain a holistic view of how HTR is changing access to the text contained within digitised images of manuscripts, from both an institutional and user context, at a time when the Library is scaling up its own mass digitisation practices. A student placement at the National Library of Scotland will link this to wider research questions. The candidate will learn how to use Transkribus at an expert level, work closely with the digital team at the National Library of Scotland to understand how best to apply HTR within a heritage digitisation context, and investigate how best to encourage and support the uptake of this technology with users of digitised content. This will result in a holistic, user focused analysis of the current provision of HTR, while also assisting the National Library of Scotland and other cultural heritage institutions in being able to understand how best to deploy this new technology in an effective manner, understanding implications for themselves, and their users, as well as contributing to the growth of the only HTR solution currently freely available to the heritage community.

This CDA therefore gives unique access to a rapidly growing community, and tool for historical research, which has not yet been studied from a user or institutional perspective. The outputs of this research will be of use to the National Library of Scotland, other institutions using HTR, those considering this approach, and the READ-COOP, who manage Transkribus.


At the University of Edinburgh, to study at postgraduate level you must normally hold a degree in an appropriate subject, with an excellent or very good classification (equivalent to first or upper second class honours in the UK), plus meet the entry requirements for the specific degree programme (https://www.ed.ac.uk/studying/postgraduate/degrees/index.php?r=site/view&edition=2020&id=254). In this case, applicants should offer a UK masters, or its international equivalent, with a mark of at least 65% in your dissertation of at least 10,000 words.

To be eligible to apply for the studentship you must meet the residency criteria set out by UKRI. For further details please see the UKRI Training Grant Guide document, p17.

The AHRC also expects that applicants to PhD programmes will hold, or be studying towards, a Masters qualification in a relevant discipline; or have relevant professional experience to provide evidence of your ability to undertake independent research. Please ensure you provide details of your academic and professional experience in your application letter.

Prior experience of digital tools and methods, an understanding of digitisation and the digitised cultural heritage environment, use of qualitative and quantitative research methods, and an experience of palaeography, history, or interest in historical manuscript material will be of benefit to the project. However, this is not a prerequisite so while preference may be given to candidates with prior experience in these areas, others are warmly encouraged to apply.

Application Process

The application will consist of a single Word file or PDF which includes:

  1. a brief cover note that includes your full contact details together with the names and contact details of two referees (1 page).
  2. a letter explaining your interest in the studentship and outlining your qualifications for it, as well as an indication of the specific areas of the project you would like to develop (2 pages).
  3. a curriculum vitae (2 pages).
  4. a sample of your writing – this might be an academic essay or another example of your writing style and ability.

Applications should be emailed to pgawards@ed.ac.uk no later than 5pm on Monday 22nd June. Applicants will be notified if they are being invited to interview by Thursday 2nd July. Interviews will take place on Thursday 16th July via an online video meeting platform.

Further information

If you have any queries about the application process, please contact: pgawards@ed.ac.uk. Informal enquiries relating to the Collaborative Doctoral Award project can be made to Professor Melissa Terras.

More Info:

Libraries and archives are investing in digitisation of manuscript collections at scale, but until recently transcriptions of digitised texts have only been available to those with the resources to manually transcribe individual passages. AI is now used within archives for a growing range of tasks: tagging of large image sets; detecting specific content types in digitised newspapers; discovering archival materials; and supporting appraisal, selection and sensitivity review. Successful machine learning approaches to transcribing images of historical papers by Handwritten Text Recognition (HTR) will transform access to our written past for the use of researchers, institutions and the general public.

This project will explore how Handwritten Text Recognition (HTR) can be embedded into digitisation workflows in a way that best benefits an institution’s users. Transkribus (https://transkribus.eu/Transkribus/), currently the only non-commercial HTR platform capable of generating transcriptions of up to 98% accuracy, will be used as the research foundation. Transkribus is the result of eight years of EU funded research into the automatic generation of transcripts from digitised images of historical text through the application of machine learning. There are now 25,000 Transkribus users, including individuals and major libraries, archives and museums worldwide (https://read.transkribus.eu/network/). Recently, a not for profit foundation (READ-COOP https://read.transkribus.eu/about/coop/) has been established to ensure that the Transkribus software will be sustained. While recent publications have considered HTR from the perspective of platform development (Muehlberger et al., 2019 https://www.emerald.com/insight/content/doi/10.1108/JD-07-2018-0114/full/html), there has been no research published to date on how user communities are using HTR, the effect this will have on scholarly workflows, and the potential HTR has for institutions.

The project will partner with the National Library of Scotland, with support from its Digital and Archives and Manuscript Divisions. This will enable the student to pursue relevant areas of interest, such as:

  • Experience and analyse the staged processes of digitisation workflows in context;
  • Apply an understanding of HTR to the delivery and presentation of transcribed material online;
  • Work with digitised cultural archival resources, using HTR to generate transcriptions;
  • Apply ethnographic approaches to understand how HTR relates to traditional palaeographic practice;
  • Identify and work with the Library’s user communities and undertake user experience testing with them to evaluate barriers and opportunities.

This will result in a holistic, user focused analysis of the current provision of HTR, while also helping the National Library of Scotland and other cultural heritage institutions to understand how best to deploy this new technology effectively, and what its implications are for themselves and their users.

The successful student is likely to have relevant experience and qualifications. This might include qualifications in Library and Information Studies, Computer Science, Human Computer Interaction, Digital Humanities or cognate disciplines. They are likely to have knowledge of the Library and Archival sector gained either through professional or academic engagement. Alternatively, an appropriately strong academic background in addition to professional experience of the library sector and/or software development for cultural heritage organisations could substitute for specific qualifications. The placement part of the PhD will be carefully tailored to complement the candidate’s existing skillset, and the National Library of Scotland will give the opportunity both to understand existing digitisation workflows and to contribute to discussions of future embedded use of HTR.

The University of Innsbruck is the home of Transkribus, and is coordinating the READ COOP. There will be opportunities within this studentship to visit Innsbruck, particularly for the Transkribus annual user conference, and to liaise with the team about developments with the software, including spending time with, and staying in regular contact with, the delivery team to understand how the Transkribus infrastructure operates. This will be done with full knowledge and support of the PhD supervisors.

New Paper: an examination of the implicit and explicit selection criteria that shape digital archives of historical newspapers

We’ve recently had a paper accepted to Archival Science journal, on work which has emerged from the Oceanic Exchanges project, which has been looking into reuse of mass digitised newspaper archives, as part of our Digging Into Data funded activities. The question is: how has the selection of historical newspapers for digitisation affected the type of text mining based research we can undertake – and how are institutions making decisions about what should be digitised? I’ve provided a link to the author’s accepted text, which we can share under the licensing for this journal:

Tessa Hauswedell, Julianne Nyhan, Melodee Beals, Melissa Terras, and Emily Bell (Forthcoming 2020). Of global reach yet of situated contexts: an examination of the implicit and explicit selection criteria that shape digital archives of historical newspapers. Accepted: Archival Science.


A large literature addresses the processes, circumstances and motivations that have given rise to archives. These questions are increasingly being asked of digital archives, too. Here, we examine the complex interplay of institutional, intellectual, economic, technical, practical and social factors that have shaped decisions about the inclusion and exclusion of digitised newspapers in and from online archives. We do so by undertaking and analysing a series of semi-structured interviews conducted with public and private providers of major newspaper digitisation programmes. Our findings contribute to emerging understandings of factors that are rarely foregrounded or highlighted yet fundamentally shape the depth and scope of digital cultural heritage archives and thus the questions that can be asked of them, now and in the future. Moreover, we draw attention to providers’ emphasis on meeting the needs of their end-users and how this is shaping the form and function of digital archives. The end user is not often emphasised in the wider literature on archival studies and we thus draw attention to the potential merit of this vector in future studies of digital archives.

Keywords: digitization; newspaper; selection rationale; cultural heritage; critical heritage

New paper – How Open is OpenGLAM? Identifying Barriers to Commercial and Non-Commercial Reuse of Digitised Art Images

I’m delighted to be a co-author on a new paper recently published in the Journal of Documentation: How Open is OpenGLAM: Identifying barriers to commercial and non-commercial reuse of digitised art images (PDF of accepted manuscript).

This results from Foteini Valeonti’s work on Useum.org, where she has built a “virtual museum that democratises art”, including (or at least, trying to include!) many openly licensed images of artworks, testing out the limits of open licensing for both commercial and non-commercial applications. Are they really that open? What barriers are in the way?

The full citation is:

Valeonti, F., Terras, M. and Hudson-Smith, A., 2019. How open is OpenGLAM? Identifying barriers to commercial and non-commercial reuse of digitised art images. Journal of Documentation. doi/10.1108.

The authors’ last uploaded version is available to download here. I’ll paste the abstract, below!



Purpose

In recent years, OpenGLAM and the broader open license movement have been gaining momentum in the cultural heritage sector. The purpose of this paper is to examine OpenGLAM from the perspective of end users, identifying barriers for commercial and non-commercial reuse of openly licensed art images.

Design/methodology/approach

Following a review of the literature, the authors scope out how end users can discover institutions participating in OpenGLAM, and use case studies to examine the process they must follow to find, obtain and reuse openly licensed images from three art museums.

Findings

Academic literature has so far focussed on examining the risks and benefits of participation from an institutional perspective, with little done to assess OpenGLAM from the end users’ standpoint. The authors reveal that end users have to overcome a series of barriers to find, obtain and reuse open images. The three main barriers relate to image quality, image tracking and the difficulty of distinguishing open images from those that are bound by copyright.

Research limitations/implications

This study focusses solely on the examination of art museums and galleries. Libraries, archives and also other types of OpenGLAM museums (e.g. archaeological) stretch beyond the scope of this paper.

Practical implications

The authors identify practical barriers of commercial and non-commercial reuse of open images, outlining areas of improvement for participant institutions.

Originality/value

The authors contribute to the understudied field of research examining OpenGLAM from the end users’ perspective, outlining recommendations for end users, as well as for museums and galleries.



New Book Chapter – On Virtual Auras: The Cultural Heritage Object in the Age of 3D Digital Reproduction

Still from the Lidar scan video of the Shipping Gallery at the Science Museum, London, showing the figurehead from HMS North Star. From Hindmarch (2015, p. 145) with acknowledgement to Scanlab.

We’re really pleased to see the release of a new book, The Routledge International Handbook of New Digital Practices in Galleries, Libraries, Archives, Museums and Heritage Sites, edited by Hannah Lewi, Wally Smith, Dirk vom Lehn and Steven Cooke (2019), which has a book chapter from me and my colleagues in it! Based on the PhD research of Dr John Hindmarch, which was supervised by me and Prof Stuart Robson, this chapter asks if digital heritage 3D objects have their own aura…

Hindmarch, J., Terras, M., and Robson, S. (2019). On Virtual Auras: The Cultural Heritage Object in the Age of 3D Digital Reproduction. In: H. Lewi; W. Smith; S. Cooke; D. vom Lehn (eds) (2019). The Routledge International Handbook of New Digital Practices in Galleries, Libraries, Archives, Museums and Heritage Sites. London: Routledge, pp. 243-256.

Making 3D models for public facing cultural heritage applications currently concentrates on creating digitised models that are as photo realistic as possible. The virtual model should have, if possible, the same informational content as its subject, in order to act as a ‘digital surrogate’. This is a reasonable approach, but due to the nature of the digitisation process and limitations of the technology, it is often very difficult, if not impossible.

However, museum objects themselves are not merely valued for their informational content; they serve purposes other than simply imparting information. In modern museums exhibits often appear as parts of a narrative, embedded within a wider context, and in addition, have physical properties that also retain information about their creation, ownership, use, and provenance. This ability for an object to tell a story is due to more than just the information it presents. Many cultural heritage objects have, to borrow an old term, aura: an affectual power to engender an emotional response in the viewer. Is it possible that a 3D digitised model can inherit some of this aura from the original object? Can a virtual object also have affectual power, and if so, fulfil the role of a museum object without necessarily being a ‘realistic’ representation?

In this chapter we will first examine the role of museums and museum exhibits, particularly as regards their public-facing remits, and what part aura plays. We will then ask if digitised objects can also have aura, and how they might help to fulfil the museums’ roles. We will see in the case of the Science Museum’s Shipping Gallery scan that a digitised resource can, potentially, exhibit affectual power, and that this ability depends as much on the presentation and context of the resource as on the information contained within it.

Under the licensing for this book, we are allowed to host the authors’ last version on our own websites, so you can download a PDF of the full chapter here. Tim Sherratt is also rounding up other authors’ last versions here, for other contents of the book!