Digitisation’s Most Wanted

15 May 2014 / 25 Jun 2015 ~ Melissa Terras ~ 10 Comments

What are the most commonly accessed digitised items from heritage organisations? Even asking the question leads to further understanding about the current digitisation landscape.

Have you seen this Dog? Last spotted on the Flickr account of the National Library of Wales. *Dog with a Pipe in Its Mouth*, Taken by P. B. Abery, 1940s.

Last month, at a meeting at the National Library of Scotland, an interesting fact flew by me. The NLS has hundreds of thousands of digitised items online, so what do you think is the most popular, and most regularly accessed and/or downloaded? (It is difficult to make the distinction between accessed and downloaded on most sites.) Is it the original Robert Burns material? The last letter of Mary Queen of Scots? Or any of the 86,000 maps held in this, one of the best map collections worldwide? No. It is “A grammar and dictionary of the Malay language : with a preliminary dissertation” by John Crawfurd, published in 1852. This is accessed by hundreds of people every month – mostly from Malaysia, partly because it is featured on many product pages providing definitions of Malaysian words – demonstrating the surprising reach and potential in digitising items and then making them freely available online, reaching out to a worldwide audience far beyond the geographical locale of the library itself. Wonderful.

This left me pondering… what are the other most downloaded items at major institutions in the UK? So I sent out some feelers, and here are the results, demonstrating both the hidden complexity of the question, and the relationship of digitised heritage content to the current online audience landscape.

At Cambridge University Library, the most accessed collection overall is the Newton Papers, which was the first major digitised collection launched by the Library in 2010, and promoted widely. Within that, there is one particular notebook (which Newton acquired while he was an undergraduate at Trinity College and used from about 1661 to 1665 for his lecture notes) which is the most popular, featuring heavily in the initial promotion of the collection, and also in an In Our Time special series hosted by Melvyn Bragg on Radio 4. But within that notebook there is one page that is accessed more than the others, with most of the traffic coming from Greece. Why? This page was picked up in the Greek press and pointed to on many websites, blogs, newspaper reports, and in social media as evidence that Newton knew Greek. The links that remain still direct thousands of users to view Newton’s jottings from his Greek lessons at the front of the book, showing the fascinating relationship between publicity, social media, linkage, and an item which reflects national pride, to a worldwide audience.

The most downloaded items at Cambridge also reflect the rapidly changing mentions of items on social media: in April 2014, an item downloaded/accessed more than 6000 times was the Breviary of Marie de Saint Pol, which went live this month. Why the sudden notice? On the 3rd of April, one of the Cambridge colleges with thousands of followers posted a link to it on Facebook followed by the Cambridge Digital Library Facebook and Twitter feed on the 4th of April. Retweeted a few times, these few postings led to the thousands of views of the document, demonstrating the growing importance of using social media to tell people about newly mounted digitised content.

Over at Trinity College Library, the most accessed item from their digital collection in general is the Book of Kells, which again was their first major digitised item, heavily promoted in the press, and attracting a level of viewing that is unique due to general tourism and cultural heritage interest. The second most accessed digitised item is the surprise: a book of lute music by William Ballet, from the 17th Century. There is much discussion of this item, and links to it online, posted by online communities of lute players, and those who blog about lutes worldwide. Interest in, and demand for, an item can therefore be encouraged if interested online communities hear about it, and share it with their membership.

A similar tale about the importance of publicity and social media emerges from the British Museum. There are popular items about the Viking exhibition which are linked from their home page at the moment given the current exhibition, but from the 1st January 2014 until now, the most popular item accessed in the digital collection (no, wait, go on, guess…. Rosetta stone? Vindolanda Tablets? …) is the Landscape Alphabet by Joseph Hullmandel (no? me neither). These were discovered and shared on social media by type enthusiasts on Twitter in mid February, and promoted by the cool-hunter the Laughing Squid who has almost half a million followers on Twitter, which caused a sudden spike (I can’t see the British Museum actually tweeting them out themselves on their timeline). However, the initial swell of tens of thousands of hits has since dwindled to nothing, showing the fickleness of attention that comes with the social media stream. In 2013, the single most viewed item at the British Museum was… (go on, guess!)… a lead sling bullet, viewed 42,156 times in total. Why? It was picked up on Reddit, due to the sarcastic inscription “some ancient sling bullets excavated from the city of Athens, Greece were inscribed with the word “ΔΕΞΑΙ” (dexai), which translates to “catch!”” which generated a lot of online LOLs (“Halt gentlemen. Do not yet partake of the feast before us, for I must capture the image of it with instagram whereupon I shalt bequeath it to my herald upon Facebook for all to see.” here) and this encouraged – and still encourages – visitors to the British Museum website: some forms of posting on social media generate the long tail of usage more than others.

Things start to get more complicated when various digital asset management systems (DAMS) come into place – often institutions have more than one database of digitised content, from different suppliers, with different licensing restrictions and requirements, and so ascertaining the most viewed single item is not a simple question. Organisations also post and share content in various different places. The National Library of Wales are looking through their DAMS to see which items are the most accessed, but immediately know that the most popular item they hold that has been posted to Flickr (with no known copyright restrictions, contributed to Flickr Commons) is the photograph at the top of this post, Dog with a Pipe in its Mouth, from the P. B. Abery Collection. Again, this is an image which has been mentioned regularly on blogs, social media, and internet chats, as well as being a featured image on the 2013 anniversary of Flickr Commons: the fact that it has no copyright restrictions encourages its reuse – and therefore traffic towards its host institution’s site, if those users point back to it – online.

The libraries at Oxford University, including the Bodleian, have been digitising items for over twenty years, and so it is difficult to say what the most accessed or popular items are, due to the way the systems have been designed, implemented and integrated over the past two decades. Their most downloaded or accessed digitised book, scanned in collaboration with Google, is probably the “History of the Scott Monument, to which is prefixed a biographical sketch of Sir Walter Scott” by James Colston (published 1881) – a freely downloadable version is available from its library record (ignore the resellers offering printed versions generated from this at considerable cost on Amazon and eBay!). As far as images are concerned, the most popular at Oxford are among those listed on Early Manuscripts at Oxford University, partly because many of them have been up continuously for twenty years (legacy data for the history of downloads of specific images are not available, which indicates how difficult it is to access long term data about this. Server logs get very big very quickly and so are generally periodically discarded, and it is only recently that reporting facilities such as Google Analytics have allowed a quick and easy overview of the usage of websites). Currently popular digitisation projects at the University of Oxford Libraries are the Polonsky Foundation Digitization Project, and the recently launched digitized First Folio of Shakespeare’s works, but there isn’t sufficient data available from all the digital collections to be able to say one way or the other which is the single most popular project, never mind item. It was also pointed out, though, that you would probably struggle just as much (if not more so) to identify which has been the most requested book in the Bodleian’s collections!

This trend of databases complicating the question continues at the British Library, where their digitisation outputs and projects are made available via multiple platforms and viewers, some managed by the British Library, and others by commercial partners, with some content available for free, other content via subscription, or paying a fee per image. These are only some of the most popular different sites: https://imagesonline.bl.uk, http://www.bl.uk/treasures/treasuresinfull.html, http://www.bl.uk/manuscripts/, www.sounds.bl.uk, https://www.flickr.com/photos/britishlibrary/, http://www.britishnewspaperarchive.co.uk/, http://find.galegroup.com/bncn/, http://gdc.gale.com/products/17th-and-18th-century-burney-collection-newspapers/ and the BL module on http://www.biblioboard.com/libraries.html. In addition, there are BL digitisation partnerships with other content providers, for example http://idp.bl.uk/ and http://eap.bl.uk/. Finding out the most accessed digitised item from within this is tricky (but not impossible – they tell me they are looking into it). The fact that they cannot say immediately demonstrates the complexity of running many large databases of digitised content.

These results, from very different institutions, invite discussions on shallow versus deep engagement with digital collections. Some examples of commonly accessed material are what we would think of as part of the Canon of Digitised Content: Shakespeare, Newton, Medieval Manuscripts. Some examples of commonly accessed material here can be taken as little more than clickbait – LOL! History! – or free reference material – it’s a free Malaysian Dictionary! Bonus! – but is getting people through the virtual door to digitised collections in this way, and through these items, such a bad thing? Come for the Dog with the pipe in its mouth! Stay for the genealogy, then the discussions on palaeographic method! One can also argue that some of the discussion surrounding these objects is exactly what we are trying to encourage – many of the hundreds of comments posted on the Reddit item about the British Museum sling shot bullet, although hilarious, show consideration of what it would mean to be human in the time of Ancient Greece, and relate their societal response to ours. Isn’t that the starting place (and in some cases, the ending place) of engagement with primary historical evidence?

Asking to see Digitisation’s most wanted opens up wider questions of public engagement, the impact of social networks on internet traffic to digitised collections (from highlights posted by the institution, to those identified and shared by others outside it, often quite unexpectedly), and the role of making images of primary historical sources open for others to discover, use and share. We also become aware of the complex and intertwined database systems which are in place in many large organisations undertaking digitisation and delivering digitised items to users, and the difficulties in reporting on individual items (be they physical or digital!) as a result. Digitisation’s most wanted is also a rapidly moving target, dependent on publicity, and changing interest and focus over time: social media can encourage large swings and changes in popular items very quickly. The act of posing this question has led to an interesting discussion on how we think about use of digitised content, and how we can build up evidence about usage. (I’d also like to thank the organisations listed above for responding to my query so promptly!)

Have you, or any organisation you work with, been affected by the discussion in this blog post? Do you have any evidence you can contribute to the investigation? Your help is needed to catch digitisation’s most wanted. Please do post your comments about your experiences below (comments are moderated so may take a few hours to appear), or email m dot terras at ucl.ac.uk for them to be integrated here. The internet is a place of busy traffic. Someone must have seen them…

Update 15/05/14: The British Library’s Endangered Archives’ most popular item is the St Helena Banns of Marriage, an item commonly pointed to on genealogy websites such as this and this.

Update 16/05/14:
– The National Library of Australia have a discussion of their 25 most viewed digitised newspapers, and why, here.
– The International Dunhuang Project at the British Library tell me that a redevelopment of their database and website is underway to improve reporting for them, their partners and users.
– Glasgow University Library Special Collections tell me that their most popular item is the Curious Case of Mary Toft, from 1726, who supposedly gave birth to a litter of rabbits. This was featured as a book of the month in 2009, but picked up by the social media site Mental Floss in January 2014, with that page being shared on facebook more than 4000 times, and garnering 30,000 hits in one day alone, and has since been posted on various other social media platforms, including Reddit. Glasgow also say that there is a difficulty in measuring access counts as the content is held on various different servers, and it can be difficult to interpret Google Analytics in this case. They also point out that, from their perspective, there is a lack of benchmarks to compare usage of their items to that of other special collections.
– The National Archives tell me they point to the popular items as part of their navigation and as a result, these “most popular items” remain the most popular, in a virtuous circle. A very popular item at the moment is The Security Service: Personal (PF Series) Files KV2 which hosts the records of spies such as Mata Hari. These were embargoed until Thursday 10 April 2014, then launched with an accompanying press release, which garnered significant press coverage worldwide, driving traffic to the site. The only frequently accessed item which is not in these lists is the muster roll of HMS Victory for the Battle of Trafalgar, which is commonly referred to in military and naval history websites (although interestingly few people link through directly to the page where it can be downloaded from, so those who read about it must come to TNA’s website and search themselves).

Update 19/05/14
– The Estonian Folklore Archives at the Estonian Literary Museum tell me that their most popular item is a leaflet from 1937 on how to preserve sealskins, although I can see no other webpages pointing to this item (perhaps because my Estonian search skills are weak!).
– UCLA Digital Library tell me their most viewed item is a Lyrical Map of the Concept of Los Angeles, a 23-foot long hand-drawn and hand-lettered map of Los Angeles, using the words and images of dozens of L.A. authors, which was on display in a museum in 2011, and was featured widely on blogs both at the time of the exhibit and since, which points people to the digital version now the display is no longer live in the museum space. Another popular item is the complete set of the 1582 Corpus Juris Canonici, the “Body of Canon Law,” particularly the table of contents, which is commonly linked to from those interested in Canon Law, such as this, thus driving subject specialists to the site.
– The History of Computing in Learning and Education Virtual Museum tells me the most viewed items are the writing competition and Historic Newsletters from the People’s Computer Company.
– A Hack day carried out at the Zurich Hackathon 2014 looked at image analytics from the US National Archives and Records Administration’s contributions to Flickr Commons, looking at 200 million hits in a 3 month period and identifying the most common images: a description of that hack is here, which also gives examples of the most commonly looked at images. “There is a spike on March 24. Further analysis shows that the biggest referral on that day is Dorothy Height. Turns out this lady was featured on a Google Doodle on that day.” Popular subjects (and referrer pages, generally from Wikipedia) were John F. Kennedy, World War II, Japanese American Internment, Vietnam War. A full list is available on the project page. This shows the importance of institutions linking their content from Wikipedia, and what can happen if you are featured by Google.
– There is also a useful tool in BaGLAMA which shows view counts for pages using Commons images in GLAM-related category trees.

Update 20/05/14
– The Bodleian also make the very good point that “With most browsers now defaulting to ‘do not track’ combined with the EU cookies legislation it is difficult to find any sort of data that one can ‘stand behind’ these days.”
– The Jüdisches Museum Berlin’s most accessed items are the Sammeldatensatz: Orden, Ehrenzeichen und Embleme von Julius Fliess (1876-1955), but they say that most accesses come from searches for “jewish emblems”, and so there is a need to add “emblem” as a synonym for “symbol” to the thesaurus, to help users find what they are looking for. In this way, looking at search terms can help develop user paths through the system so they can find what they actually want.
– The University of Iowa Digital Libraries say that based on Google Analytics for the last year, the most popular item is a Dada book, and the most popular collection is Iowa Maps, but the access numbers for different objects in the database themselves are hard to count, and they’ll get back to me on that. Based on recent web searches reported by the web master, a surprisingly high number of people find them via searches for Peter Rabbit: the digital book of which is linked through to their site from the Wikipedia page and various other websites featuring Peter Rabbit.
– The National Library of Wales tell me the most popular article on http://welshnewspapers.llgc.org.uk is a 1916 Cambria Daily Leader advert for ‘blouses’ and ‘hosiery’. To find out more about why may take some digging, though!
– Hamlet Depot and Museums tell me that their most popular items are genealogical records, including railroad employees lists, and seniority records, and also historic pictures.

Update 22/05/14
– The New Zealand Electronic Text Collection tell me that reference works are their most used, including A Grammar and Dictionary of the Samoan Language, with English and Samoan vocabulary (which is linked to from thousands of different sources about New Zealand culture, and discussions on translation), New Zealand in the First World War (which is linked to from various history and genealogy sites) and The Official History of New Zealand in the Second World War (which is also popularly linked to online, including in reminiscing personal postings from soldiers who served, talking about the war on social media).
– The University of Otago Library provided me with a very detailed overview of the issues they face (thanks!). They are in the process of developing a repository to manage all of their digital collections that they want to curate, and the pilot will be live by November, but for the moment, they have a variety of different sites on which you can see digitised material, showing again the complex relationship of databases and content which many institutions have. For example, they have OUR Heritage which is a window across some collections. Some records are pulled from OUR Heritage and displayed via Special Collections Online Exhibitions. There is also Hocken Collections, who had their reader access collection digitised and made available online. They track this via Google Analytics, and also by watching their own server stats: and these do not in any way match up. Google does not capture when someone goes directly to a file, so Analytics reports just a fraction of the more than a million hits in the past year that they can track on their server. They digitise on request, and respond to community demand, and are trying to prioritise the digitisation process. From Google Analytics, the most heavily used collections are the History of the University and Botanical charts (which belong to the Department of Botany at Otago and some are still used in the Labs. They digitised these, provided a copy for their use and deposited the originals in Hocken Collections.) The most popular items are “Key plan to Mr G.B. Shaw’s picture of Dunedin in 1851”, which is mentioned on various genealogical sites online; a painting, “Sangro, a rosary of olive trees, landscape of windswept manuka.”, which appears linked from some other major federated collections online; and a printed map of Rome, “Mappa della campagna Romana del 1547”, which is a commonly consulted map (there are various copies of it in libraries worldwide) so those searching online to see it must find the freely available copy here.
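Otago’s point about direct file requests can be made concrete with a small sketch. What follows is not anything Otago (or any institution above) actually runs: it assumes the web server writes an Apache-style common log format, the list of file extensions is a guess at what counts as a digitised asset, and the sample log lines are invented purely for illustration. It simply counts successful requests per file path, which is the kind of hit a page-tag analytics tool would not see when someone links straight to an image or PDF.

```python
import re
from collections import Counter

# Assumption: Apache/NGINX "common log format"; real server logs may differ.
LOG_LINE = re.compile(r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')

# Hypothetical set of extensions treated as digitised assets.
ASSET_EXTENSIONS = (".jpg", ".jpeg", ".png", ".tif", ".pdf")


def count_asset_hits(lines):
    """Count successful (2xx) requests per asset path, ignoring query strings."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match or not match.group("status").startswith("2"):
            continue
        path = match.group("path").split("?", 1)[0]
        if path.lower().endswith(ASSET_EXTENSIONS):
            hits[path] += 1
    return hits


# Invented sample lines, for illustration only.
SAMPLE_LOG = [
    '203.0.113.5 - - [22/May/2014:10:00:00 +1200] "GET /maps/rome_1547.jpg HTTP/1.1" 200 52344',
    '203.0.113.9 - - [22/May/2014:10:00:05 +1200] "GET /maps/rome_1547.jpg?size=full HTTP/1.1" 200 812344',
    '198.51.100.2 - - [22/May/2014:10:01:00 +1200] "GET /index.html HTTP/1.1" 200 1024',
    '198.51.100.7 - - [22/May/2014:10:02:00 +1200] "GET /charts/botany_12.pdf HTTP/1.1" 404 210',
]

if __name__ == "__main__":
    for path, count in count_asset_hits(SAMPLE_LOG).most_common():
        print(count, path)
```

Comparing a tally like this against the analytics report for the same period would show how much traffic arrives by direct links, although deciding what counts as a “hit” (thumbnails, crawlers, repeat visits) remains a judgement call for each institution.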

Inaugural Preparations

24 Apr 2014 / 14 Nov 2014 ~ Melissa Terras ~ Leave a comment

So, my inaugural lecture is coming up in a few weeks, and I’m starting to write it now, nervously… The event has already sold out, but will be streamed online live, and there is also another lecture theatre at UCL that it will be shown live in (The Terras Terrace?). Dr Rudolf Ammann, UCLDH’s designer at large, has kindly provided some visuals for me… here’s the promotional flyer.

I plan to write the lecture out long hand once it is done, and of course, you will be the first to know about it (after I’ve given it…)

Making it Free, Making it Open – Transcribe Bentham, publications, and unexpected benefits

27 Feb 2014 / 25 Jun 2015 ~ Melissa Terras ~ Leave a comment

A few years ago I made a commitment to Open Access – in an attempt to reach a wider audience for my academic work, and to tell people about research as it was happening (not three or four years later once it was locked behind a paywalled journal). I’m really pleased to have something new to talk about once again, and this time I can share it with you before it even comes out in print. Allied to this are a few spin offs from the project in question – Transcribe Bentham, which aims to make the work of the philosopher and reformer Jeremy Bentham (1748 – 1832) available via a:

double award-winning collaborative transcription initiative, which is digitising and making available digital images of Bentham’s unpublished manuscripts through a platform known as the ‘Transcription Desk‘. There, you can access the material and—just as importantly—transcribe the material, to help the work of UCL’s Bentham Project, and further improve access to, and searchability of, this enormously important collection of historical and philosophical material. [Link]

First, the article: a pre-publication version which will be published in April in a special issue of the International Journal of Humanities and Arts Computing, from Edinburgh University Press. In it, Tim Causer and I talk about crowdsourcing transcriptions of Bentham’s writings, the impact of Transcribe Bentham on the work of the Bentham Project, and the use of volunteers to help us with tasks traditionally associated with lone academic researchers. We give particular examples of new Bentham material transcribed by volunteers dealing with the subjects of political economy, animal welfare, and convict transportation and the history of early New South Wales, which has further clarified and widened our understanding of certain aspects of Bentham’s thought. You can go and get it here:

Causer, T. and Terras, M. M. (2014) “Crowdsourcing Bentham: beyond the traditional boundaries of academic history”. International Journal of Humanities and Arts Computing, 8 (1) (In press). Link to PDF version in UCL Repository.

I’m pleased it is up there quickly, and openly, and free for all to see. It’s one of the aims of the Transcribe Bentham project, of which I am only a small cog, to make Bentham’s writings more well known, accessible, and searchable, over the long term. Allied to that is the ethos of involving a wider group of society in contributing to the project – this is about “co-creation” (as it gets called in Gallery, Library, Archive, and Museum (GLAM) circles) rather than academic broadcast. It would make no sense for us to take the product of something developed in online crowdsourcing, and lock it back in the academic ivory tower, given we asked for help to understand and find the material in the first place. We’re finding our way with how to credit transcribers along the way (some of them are named in the article above, and we did ask their permission to do so) and to carry out crowdsourcing in as ethical a way as possible (something which is also of concern to others figuring out crowdsourcing in GLAM as we go). All in all, open access here is part of the Transcribe Bentham product: make it free, make it open.

And future doors line up ahead of us to walk through. This week we hit over 7000 manuscripts transcribed via the Transcription Desk, and a few months ago we passed the 3 million words of transcribed material mark. So we now have a body of digital material with which to work, and make available, and to a certain extent play with. We’re pursuing various research aims here – from both a Digital Humanities side, and a Bentham studies side, and a Library side, and Publishing side. We’re working on making canonical versions of all images and transcribed texts available online. Students in UCL Centre for Publishing are (quite literally) cooking up plans from what has been found in the previously untranscribed Bentham material, unearthed via Transcribe Bentham. What else can we do with this material?

And other doors open. I’ve talked before about reuse of the code behind Transcribe Bentham – in use by the Public Record Office of Victoria, and parts of it (the Transcription Desk bar, since you ask) have since been used in the Letters of 1916 transcription project, too. We’re also in talks with other collections who are thinking of doing crowdsourcing, and who may use the Transcription Desk: watch this space. Again, this is part of the same trajectory: make it free, make it available.

And other doors open. The development of systems to read handwritten material (more advanced than Optical Character Recognition, which to date really only has success on printed, clean material) depends on having datasets of images of handwritten texts, plus checked validated transcripts of their content in a useful format, to train and test systems and algorithms. Transcribe Bentham is pleased to be part of the Transcriptorium project (as am I!), looking into Handwritten Text Recognition (HTR) technologies, and a set of 433 pages of Bentham’s manuscripts plus the crowdsourced transcriptions are this year making up the “ICFHR 2014 Handwritten Text Recognition on the tranScriptorium Dataset” – to evaluate and test the current algorithms on Handwritten Text Recognition. How great is that. Did any of us sitting round the table first discussing crowdsourcing and Bentham back in 2009 ever expect we (and our transcribers) would be creating a benchmarked dataset in which to train handwriting recognition technologies? No. It is wonderful.

Create. Involve. Research. Make it available. Some of this by planning, some of this by happy accident. I now see the Open Access ethos underpinning all of this, and driving forward the direction of my research into the use of computing in culture, heritage, and the humanities. So, enjoy the article. We have access to and did and found out some cool stuff, you know – and we made it freely available.

Male, Mad and Muddleheaded: Academics in Children’s Picture Books

5 Feb 201425 Jun 2015 ~ Melissa Terras ~ 12 Comments

Academics in children’s picture books tend to be elderly, old men, who work in science, called Professor SomethingDumb. Why does this matter?

Like many academics, I love books. Like many book-loving parents, I’m keen to share that love with my young children. Two years ago, I chanced upon two different professors in children’s books, in quick succession. Wouldn’t it be a fun project, I thought, to see how academics, and universities, appear in children’s illustrated books? This would function both as an excuse to buy more books (we do live in a golden age of second hand books, cheaply delivered to your front door) and to explain to my kids – now five and a half, and twins of three – what Mummy Actually Does.

It turns out it’s hard to search just for children’s books, and picture books, in library catalogues, but I combed through various electronic library resources, as well as Amazon, eBay, LibraryThing, and Abe, to dig up source material. I began to obsessively search the bookshelves of kids’ books in friends’ houses, and doctors’ and dentists’ and hospital waiting rooms, whilst also keeping on the look out on our regular visits to our local library: often academics appear in books without being named in the title, so don’t turn up easily via electronic searches. I parked my finds on a devoted Tumblr which was shared on social media, and friends, family members, and total strangers tweeted, facebooked, and emailed me to suggest additions. People sidled up to me after invited guest lectures to whisper “I have a good professor for you…” Two years on, I’ve no doubt still not found all of the possible candidates, but new finds in my source material are becoming less frequent. 101 books (or individual books from a series*) and 108 academics, and a few specific mentions of university architecture and systems later, it’s time to look at what results from a survey of the representation of academics and academia in children’s picture books.

What are academics in children’s books like?

The 108 academics found consist of 76 Professors, 21 Academic Doctors, 2 Students, 2 Lecturers, 1 Assistant Professor, 1 Child, 1 Astronomer, 1 Geographer, 1 Medical Doctor who undertakes research, 1 researcher, and 1 lab assistant. In general, the Academic Doctors tend to be crazy mad evil egotists (“It’s Dr Frankensteiner – the maddest mad scientist on mercury!”), whilst the Professors tend to be kindly, but baffled, obsessive egg-heads who don’t quite function normally.

The academics are mostly (old, white) males. Out of the 108 found, only 9 are female: 90% of the identified academics are male, 8% are female, and 2% have no identifiable gender (there are therefore far fewer women in this cohort than in reality, where it is estimated that one third of senior research posts are occupied by women). They are also nearly all Caucasian: only two of those identified are people of colour, one Professor and one child who is so smart he is called The Prof. Both are male. This is scarily close to the recent statistic that only 0.4% of the UK professoriat are black. 43% of those found in this corpus are elderly men, and 33% are middle aged (comprising 27% male and 6% female; there are no elderly female professors, as they are all middle aged or younger). The women are so lacking that the denouement of one whodunnit/ solve the mystery/ choose your own adventure book for slightly older children is that the professor they have been talking about was actually a woman, and you didn’t see that coming, did you? Ha!
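For anyone who wants to check the sums, here is a minimal sketch of the arithmetic behind the figures above. The role tallies and the count of 9 women come straight from the text; the counts of 97 men and 2 academics with no identifiable gender are not stated anywhere and are only my inference from the quoted percentages, so treat them as assumptions.

```python
# Role tallies exactly as reported in the post.
ROLE_COUNTS = {
    "Professor": 76,
    "Academic Doctor": 21,
    "Student": 2,
    "Lecturer": 2,
    "Assistant Professor": 1,
    "Child": 1,
    "Astronomer": 1,
    "Geographer": 1,
    "Medical Doctor (research)": 1,
    "Researcher": 1,
    "Lab assistant": 1,
}

# Only the female count (9) is stated in the post. The other two counts are
# ASSUMED values, inferred from the quoted 90% / 2% figures.
GENDER_COUNTS = {"male": 97, "female": 9, "no identifiable gender": 2}


def percentages(counts):
    """Return each count as a whole-number percentage of the total."""
    total = sum(counts.values())
    return {label: round(100 * n / total) for label, n in counts.items()}


if __name__ == "__main__":
    print("Total academics:", sum(ROLE_COUNTS.values()))
    print("Gender split (%):", percentages(GENDER_COUNTS))
```

Under those assumed counts the rounded shares match the 90%, 8% and 2% quoted above, and the role tallies add up to the 108 academics found.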

The earliest published academic in a children’s book found was in 1922 (although it’s probable that the real craze for featuring baffled old men came after the success of Professor Branestawm, which was a major international bestseller, first published in 1933, and not out of print since). The first woman Professor found is the amazing Professor Puffendorf – billed as “the world’s greatest scientist” – published in 1992, 70 years after the first male professor appears in a children’s book. 70 years (although it is frustrating that the book really isn’t about her, but what her jealous, male lab assistant gets up to in her lab when she goes off to a conference. More Puffendorf next time, please). There is also a more recent phenomenon of using a Professor as a framing device to suggest some gravitas to a book’s subject, but the professor themselves does not appear in any way within the text, so it’s impossible to say if they are male or female. Male Professors in children’s books have appeared much more frequently over the past ten years: women not so much.

What areas do these fictional academics work in? (There is an entirely different genre of children’s books covering the lives of real academics – but that’s for another obsessive compulsive mini research project). Here we identify the subject areas of the 108 academics:

Most of the identified academics work in science, engineering and technology subjects. 31% work in some area of generic “science”, 10% work in biology, a few in maths, paleontology, geography, and zoology, and lone academics in rocket science, veterinary science, astronomy, computing, medical research and oceanography. There is one prof who is a homeopath, and I wasn’t sure whether to put them in STEM or Fiction, so I plumped for STEM as they seemed to be trying to see if homeopathy worked (I like to presume all the academics here have proper qualifications, but who knows if fictional characters can buy professorships online these days). Subjects classed as Fictional were serpentology, dragonology, and magic. Arts, Humanities and Social Science subjects identified are archaeology (6% of the total), and linguistics, psychology, arts and theatre. 27% of those with an academic title make no reference to what type of area they are supposed to work in: they are generally just trying to take over the world. Just out of interest, the female academics identify their subject areas as serpentology, maths, paleontology, ecology, and three generic scientists (with two further unknown subjects), so it’s not as if the women are doing the “soft” subjects in children’s books, when they actually appear.

Not all of these academics featured are humans: 74% are human, 19% are animals, 4% are aliens, 2% are unknown, and 1% are vegetable. There are no discernible trends regarding animals that are chosen to represent wisdom – it’s not like they are all owls – with three mice, three dogs, two toads, a kingfisher, a gorilla, a woodpecker, a pig, a crow, an owl, a dumbo octopus, a mole, a bumble bee, a shark, a cockroach, and a wooden bird. If you spot any defining similarities there, let me know.

There are some other fun trends to note. 46% of those humans featured are bald (higher than the average percentage?) – no women are bald. 35% had very big, messy hair, and it seems to be that if you are in academia, you should be a bit disheveled, in general. 45% have white hair – but none of the women have white hair. 13% had ginger hair (higher than the average percentage?). 37% had moustaches, and 16% had beards (higher than the average percentage?) – but no women had facial hair. What they wore is also interesting:

Labcoats, suits (but not if you are female!) or safari suits (but not if you are female!) are the academic uniform du jour.

The names given to the academics are telling, with the majority being less than complimentary: Professor Dinglebat, Professor P. Brain, Professor Blabbermouth, Professor Bumblebrain, Professor Muddlehead, Professor Hogwash, Professor Bumble, Professor Dumkopf, Professor Nutter, and two different Professor Potts. There is the odd professor with a name that alludes to intelligence: Professor I.Q, Professor Inkling, Professor Wiseman, but those are in the minority.

What types of book are they featured in? 82% of the 101 books are fiction stories, and the theme of the stories tends to be “academic is out of touch with how the world works, with hilarious consequences” in the case of professors, or “is evil and wants to take over the world, but is thwarted by our plucky hero (never heroine)” in the case of doctors. 7% of the books are factual, using a fictional academic to explain how science or experiments work, and 1% are cookbooks. The remainder, 10%, are a curious genre I have called “tall tales” – where the fictional academic character is brought in to bring gravitas and explain something, but the explanations are either fictional or bordering on fiction. It’s a curious blend of science and fiction: they are not traditional stories, but work in a way which subverts the traditional children’s science books, injecting fiction into the process (not very successfully, in most cases).

What can we draw from this? If you are going to be a fictional human academic in a children’s book, you are most likely to be an elderly man, with big white hair, who wears a lab coat, has facial hair, works in science, and is called Professor SomethingDumb or Dr CrazyPants, featuring in a story about how you bumble around causing some type of chaos. Close your eyes and think of a Professor. Is this what you see? Or this? (One wonders how much well-circulated images of Einstein have percolated into the subconscious of writers and publishers to emerge as the obvious representation of an academic in children’s illustrated books).

Universities in Children’s Picture Books

What about the universities themselves? They don’t feature as often as the academics associated with them – children’s books seldom focus on an institution whose effect on the reader lies so far in the future, although some characters plan well in advance. Lectures, when depicted, are obviously very boring and impenetrable. University buildings are like castle schools for grown ups or the site of secret underground lairs or the best holiday park ever. There are a couple of sweet kids’ books from the USA that attempt to describe the university campus and rituals of specific actual colleges – Baylor University and Boston College. But in general, the children’s books revolve around the characters, rather than the fact they are in a university, per se.

Why is this relevant?

Obviously, this has been a bit of a fun project. Given the lengths gone to to gather this corpus of children’s books, it is unlikely that any individual child would happen across all of the books noted. It’s actually interesting to think how few children’s picture or illustrated books feature academics or academia (at time of writing, Amazon lists 1.3 million books in its children’s section, and 101* different books (or book series) were identified in this project). While no doubt there are other books out there not on the list, this has been a darn good crack at finding as many as possible, not only in the English Language. Professors and academic Doctors in children’s books are a useful device on occasion, but really are not terribly frequent in the scheme of things.

That said, the difference in gender, and how women and men are represented, and the underrepresentation of those who are anything but white in children’s books about academia, is shocking, especially given that almost all scientific fields are still dominated by men, women are frequently discriminated against, and although 46% of all PhD graduates in the EU are female, only 1/3 of senior research posts are occupied by women. At a time when researchers are asking if available toys can influence later career choice, can the same be said about books? At a time when it is becoming the parents’ job to encourage girls into science and technology – and to educate all children about science and engineering careers – does the lack of anything but white, old men as academics in children’s books reinforce the impossibility of anyone other than those making a contribution? At a time when the leaky pipe of academia shows that women are leaving in droves at every level of the academic ladder, should we be worried that there are no female academics in children’s books above middle age? Laugh at this analysis if you will, but sociological analysis of other children’s books has shown that

there is a hidden language or code inscribed in children’s books, which teaches kids to view inequalities within the division of labor as a “natural” fact of life – that is, as a reflection of the inherent characteristics of the workers themselves. Young readers learn (without realizing it, of course) that some… are simply better equipped to hold manual or service jobs, while other[s]… ought to be professionals. Once this code is acquired by pre-school children… it becomes exceedingly difficult to unlearn. As adults, then, we are already predisposed to accept the hierarchical, caste-based system of labor that characterizes the… workplace. [link]

Another analysis of 6000 children’s books published between 1900 and 2000 suggests the gender disparity, and the lack of women characters, sends children a message that “women and girls occupy a less important role in society than men or boys”:

The messages conveyed through representation of males and females in books contribute to children’s ideas of what it means to be a boy, girl, man, or woman. The disparities we find point to the symbolic annihilation of women and girls, and particularly female animals, in 20th-century children’s literature, suggesting to children that these characters are less important than their male counterparts… The disproportionate numbers of males in central roles may encourage children to accept the invisibility of women and girls and to believe they are less important than men and boys, thereby reinforcing the gender system. [link]

As for the diversity issue – in general, children’s books have been shown to be stubbornly white, even though “children of all ethnicities and races need role models of all ethnicities and races. That breeds normalcy and acceptance, and it’s good for everybody. [link]” What we are seeing here in this corpus, then, is a microcosm of what is happening in children’s literature in general, although played out alongside an ongoing debate about the involvement of women and minorities in the academy. That doesn’t make it ok, mind.

There are wider nuances, though, that don’t just involve headcounts of men and women, black and white. Children’s perceptions of scientists have been shown to be based on various stereotypes, and the stereotypes of academics presented and promulgated in these books are the product of writers and publishers who, taken together, quite clearly don’t think academics are much cop, which will percolate back to those who read the books, or have the books read to them. Academics are routinely shown as individuals obsessed with one topic who are either baffled and harmless and ineffectual, or malicious, vindictive and psychotic, and although these can be affectionate sketches (“bless! look at the clueless/psychopathic genius!”) academics routinely come across as out of touch weirdos – and what is that teaching kids about universities? In this age of proving academic “impact”, it might not be so bad for us to be able to show we were relevant to society? That there is more to academia than science? Or for the kids’ books I show my kids to have more positive and integrated representations of professors and academics? Perhaps this is not the role of kids’ books though, and I should just be telling my kids my own tales of academic derring-do.

I mean, who would spend two years gathering a corpus of kids’ lit for fun, and then count how many beards the people in the books had? Weirdo. Weirdos, the lot of them.

Top Children’s Picture Books Featuring Academia

Out of all of the books found in this project, there are some which have been read and read again by my boys, and some which got tossed aside as soon as they arrived. There is also one I adore, but the boys are not so interested in. If you wanted to read some children’s books which feature academics and universities, you could do worse than start with the following:

1. Dr. Dog, by Babette Cole, Red Fox, London, 1994. Dr Dog is a medical Doctor, but one who also does research. It has one fantastic page where Dr Dog goes to a conference in Brazil to give a talk about bone marrow, and that one page has explained where Mummy goes when she goes in the airplane, on many occasions. Very useful. For age 2+

2. Professor Puffendorf’s Secret Potions. Robin Tzannes, Korky Paul, Oxford University Press, 1992. The most read story in our house about a Professor. Prof P goes off to a conference and her lazy lab assistant wants to steal her secret potions for himself… (I would have preferred to see more about her, though). 2+

3. Mahalia Mouse Goes to College, by John Lithgow, illustrated by Igor Oleynikov, Simon and Schuster, 2007, New York. Mahalia is a brave little mouse who wants to go to Harvard and study maths, and succeeds. Uplifting. 3+

4. The Rooftop Rocket Party, by Roland Chambers, Andersen Press, 2002. Doctor Gass is a rocket scientist, who doesn’t believe a little boy that the water coolers on top of the New York skyline are capable of going to the moon… Delightful. 3+

5. Professor Astro Cat’s Frontiers of Space, by Dominic Walliman and Ben Newman. Flying Eye Books, 2013. This is a lovely, well illustrated, detailed and well written kids’ introduction to astronomy, which is explained by Professor Astro Cat. Nice paper too, bibliophiles. For age 5+.

6. Professor Wormbog in Search for the Zipperump-a-Zoo (Mercer Mayer Classic Collectible: Little Monsters), by Mercer Mayer, Golden Pr. 1976. Professor Wormbog is searching for the only thing he hasn’t got in his zoology collection… perhaps it has been right under his nose all along? 2+

7. Mungo and the Spiders from Space. By Timothy Knapman, illustrated by Adam Stower. 2007, Puffin, London. A rollicking space adventure about a little boy who gets an old book about an evil doctor… and steps into the book… 4+

8. Any of the Octonaut books, by Meomi. Now a popular TV programme, the Octonauts started off as a book series. Professor Inkling shows how he can work with others to, ya know, deliver impact in the field etc etc. The Meomi books (Harper Collins) are delightful, with lots of detail that demands rereading – start off with the Octonauts Explore the Great Big Ocean (but steer clear of the TV spin off books published by Simon and Schuster – they aren’t a patch on the illustrated books by Meomi). Much, much loved in our house. 2+

9. The Dr Xargle and Professor Xargle books (he gets promoted at some point, evidently). By Jeanne Willis and Tony Ross, different publishers. Xargle explains various things about human society, or science, to his university class of aliens, with hilarious consequences. 3+

10. Professor Twill’s Travels, written and illustrated by Bob Gumpertz. Ward Lock Limited, London, 1968. A sweet tale of Professor Twill, travelling the world to collect animals. The illustrations in this book are very much of the era – it’s just beautiful. A forgotten classic. 1+

And one just for the adults: Jack Dawe and The Professors, Bedtime Stories for Technically Inclined Little Ones, 1964, illustrated by Brian Green. (By “Uncle B”, no press listed). An Oxford Professor wrote down and vanity published the tales of academia he told his nieces and nephews. They are absolutely hilarious.

Happy reading. And if you find any more academics or universities that I don’t know about in children’s picture books… do let me know!

*There were a few characters that appear in series of books, for example Professor Branestawm, Dr Xargle, and Professor Inkling in Octonauts. Only one book from each series was counted: if all the books from each series were included, there would be over 140 books in total. Please note, none of the spin offs from the children’s film Monsters University were included in this analysis, as we’re dealing here with things that started as books, rather than spin offs, and it would take over the corpus, and, hmmm, that deserves an analysis of its very own… uh-oh…

I’m not going to edit your £10,000 pay-to-open-access-publish monograph series for you

27 Nov 2013 / 8 Jul 2015 ~ Melissa Terras ~ 17 Comments

Over the last three or four months, I’ve been talking with an academic publisher – one of the big names that most people have heard of – who approached me to talk about launching a series in Digital Humanities. Now, Digital Humanities is quite fashionable at the moment, with many presses launching books and series about digital arts, culture, humanities and heritage, but goodness knows there is a need for a series that would publish only academic monographs in the area, rather than text books like this and this. I’ve been enjoying talking through the issues of publication with the press in question, and I asked Bethany Nowviskie to join me as co-editor, hoping to work together and thinking about how we could do something that suits our academic neck of the woods: offering good digital as well as print content, and tackling the open access monograph issue in as brave a way as possible, committing to delivering a high quality print publication that would also be available in open access too.

Last week they emailed me with their new company policy on open access. They are fully committed to offering high quality open access versions of their high quality academic books. But to produce open access versions, authors would be required to pay £10,000 (with applicable taxes added on top) to cover the “number of costs” that are involved to “produce” these titles.

I believe – at a time when rumours are flying that the next Research Excellence Framework will require all submissions to be available in open access, including monographs (although, please see later update at the end of the post about this) – that placing a £10,000 cost-to-publish fee onto monographs is iniquitous and will exclude many, if not most, early career scholars in the humanities from publishing their books in open access, as well as excluding any academic who is not at a very rich institution that has the resources to meet this publisher’s ~~ransom~~ demand. (There’s an excellent blog post by Mercedes Bunz which demonstrates this very point). This will have deleterious effects on humanities academic career progression, as the monograph is still seen to be the proof of academic excellence (even if “just print” will no longer “count”.) I believe that this stance by publishers to place the costs of publishing open access monographs onto humanities academics (in particular) is perfidious, and the only way we can counteract it is to stop engaging with presses who behave in this manner, refusing to submit manuscripts to them, but also, refusing to peer review manuscripts for them, and refusing to edit manuscripts – or a series of manuscripts – for them.

So I’m not going to edit their £10,000 pay-to-open-access-publish monograph series. And here is my reply to them. I’m not sure about the legalities of talking about this, so I have stripped out any identifying information regarding the individual publisher to safeguard myself. I would very much welcome your comments.

Dear (doesn’t matter which particular publisher, this could be directed to the whole shower of those who are asking for £10,000 for pay to open access publish a humanities monograph).

I understand that you are operating in a world where traditional publishing mechanisms and relationships have been turned on their heads. I understand that you have revenues to make to cover your costs, and profits to report to shareholders. I understand that, given a lot of authors from now on will have to provide open access versions of their research, you see this as an opportunity to further extend your profits. But I cannot understand the maths involved in calculating that it will cost £10,000 to turn a ready-for-print PDF proof into an ebook (seriously, I’ve been round the block a few times in book production, and that’s some hourly rate those folks are charging you). You have looked at the £2,000 per academic journal paper model for open access in the sciences, and simply multiplied it and stuck it onto what you think is a humanities equivalent: a monograph equals about five journal papers, right? It would be more honest for you to say: we are charging £10,000 to offset the open access copy against loss of potential revenue for book sales. I understand that this is a concern for you, of course I do, and it would be better to say this up front.

But even with this concern, I do not see that humanities authors are the people you should be targeting to make a profit.

The £10,000 cost for open access is not a commitment to open access at all. It is a shield behind which you can keep open access away from those who might harm your profit margin. But think of the poor humanities academic who *has* to publish their work in open access. What are they going to do? Turn to their institution? Only the best ranked institutions in the world will be able to cover their costs: are you seriously saying that only those in the top universities worldwide are welcome to publish open access with you? Even within those institutions, only the top ranked individuals with prior grant income would have such a request entertained: here’s a secret which you probably haven’t figured out: most humanities faculties aren’t rolling in money. So should aspiring book writers get the £10k from grant income? But you are applying a model from the sciences that doesn’t apply to the humanities: in the UK the average Russell Group humanities academic brings around your cost for open access in grant income a year, and funding councils who have had their own incomes slashed cannot expect to prop up the publishing industry. Some have suggested that the £10,000 is seen as an “investment in self” where individuals would seriously pony up the £10,000 from their own meagre funds (read: credit cards), in the hope that they would recoup this through promotion, tenure, etc. It’s a huge gamble to take, at a time when many – including most early career scholars – are exhausted from carrying the student debt albatross round their necks. As a result, the numbers publishing open access with you will be few and far between. With your “commitment” to open access, you will still be able to publish print editions for those who do not care about securing an open access copy.
There’s your open access commitment right there – you are more likely to never, ever have to publish an open access volume, even though you have a “policy”, as it is just not achievable for all but the independently wealthy. And academic success for all just moves that step further away again. Hurrah for building the pristine ring-fenced arena that no-one can ever use, unless they bring their own polo horse! *snort*. It’s just odious.

I know that my list of suggestions for pursuing an open access monograph series in Digital Humanities was not usual (just to recap, I asked for: the print book for sale, with full contents available for free in an open access digital version, with a creative commons license to be agreed with each individual author (some of them might allow commercial reuse, such as CC-BY, some of them might be more conservative going for ND). This would be Diamond Open Access – so full peer review process, item available free in digital form, but no “author pays” model, and the resulting book should be published in various ebook formats, with no digital rights management (DRM). The author should retain copyright. Ideas for offsetting costs and potential lost revenue include lowering the level of royalty payments, or increasing the point at which the publication will start to recoup costs, depending on a realistic cost model, which we could help work out.) I’ll also point out that I have never once asked for payment in any of this (and just for math’s sake: what proportion of that £10,000 per open access book will go to the series editors? Oh that’s right, none). So you expect to use my contacts, and to use my time, and for me to help feed into an exclusionary model that keeps your wheels turning, that takes money from institutions, or grants, or individuals, and to do that for you without even listening to anything I have been saying about the need for open access in the humanities, particularly within our community, or what we can do to fix – or at least experiment with – the existing model to be in everyone’s favour?

The open access agenda is a huge issue in Digital Humanities. It is at the heart of the discipline: doing things in the open, experimenting, being the voice for the humanities in the digital age, showing people how it is done. Digital Humanities is big business at the moment, as can be witnessed by the explosion of Digital Humanities titles published in the past year alone (which is why you are talking to me, after all). Goodness knows we need more research monographs to come out that give people the space to seriously consider and present their research ideas amongst all these textbooks. But this can only be done by operating within the research modes of the community. We could have committed to doing a trial of, say, 5 or 10 books that would be printed with diamond open access too, and being absolutely open and honest about the costs and the revenues and the potential losses and gains, and really led the way in a discussion about where open access monograph publishing goes, and what works, and what doesn’t, and what the realistic costs of producing open access research to a high standard are. We would have been famous, we would have sold books, we would have attracted the best and brightest minds with the most brilliant texts, no matter what their bank balance was. As it is, your £10,000 (plus taxes) seems entirely one-size-only-fits-you, jumping on the bandwagon of a scared publishing industry whose fear is contagious, copying an approach which doesn’t work for anyone, but allows you to have a policy that will never actually have to be exercised. I’m sad, as I see this as a missed opportunity for us to work together.

I am at a stage in my career where I do not have to take on anything that I do not want to do, or do not agree with. I am at a stage in my career where I should be sticking up for what I think is right, and also looking out for early career scholars coming up behind me. I am uncomfortable in putting my name to a Digital Humanities series that touts a £10,000 pay to publish open access policy as fair or egalitarian. I’m not going to edit your £10,000 pay-to-open-access-publish monograph series. I doubt that any leading figure in our field would, but I wish you well in finding the person to take this book series forward.

I hope your book series in Digital Humanities is a success, I really do. It’s been a pleasure scoping out what a book series could have looked like, especially with the challenges that face us in the digital environment. But I am left frustrated that we could have done so much together. Please do get in touch in the future, when this £10,000 open access model doesn’t work for you, when you may like to – or have to – be braver.

Update 28/11/2013: Since posting last night, this has gone a little… viral. With the result that, ring-ring! that’s HEFCE calling (via a tweet from Ben Johnson, thanks Ben) to point out the current state of affairs on the requirements for open access in the next REF. This is sketched out in paragraphs 46-50 of this policy document. So there won’t be a requirement in the next REF for open access, but their view is “that open access publication for monographs and books is likely to be achievable in the long term”.

Obviously, if I had been able to find this (rather than the rumours) this would have tempered a couple of sentences in my blog post above, but only a couple, so I’m not going to retool it. The fact remains that open access monographs are on the horizon, and that publishers are attempting to profiteer from this without any adequate costing model as to how to achieve them. I’m not happy about being any part of that, and will not give up my time, advice, and hard work to support a model which excludes many from taking part in making their work available via open access.

For Ada Lovelace Day – Father Busa’s Female Punch Card Operatives

15 Oct 2013 / 25 Jun 2015 ~ Melissa Terras ~ 5 Comments

15th October 2013 is Ada Lovelace Day – the annual celebration of women in science, technology, engineering and maths, named after Ada Lovelace, the first computer programmer. Working with Charles Babbage in the early 1840s, Lovelace understood the significance of his Analytical Engine (a machine that can conduct a number of different functions, such as addition, subtraction, multiplication and division) and its implications for computational method. She saw that via the punched card input device the Analytical Engine opened up a whole new opportunity for designing machines that could manipulate symbols rather than just numbers. Lovelace attempted to draw together romanticism and rationality to create a ‘poetical science’ that allowed mathematics and computing to explore the world around us, recognizing the potential for a move away from pure calculation to computation, and possessing a vision that foretold how computing could be used in creative areas such as music and literature.

It seems apposite on Ada Lovelace Day to look at some female punchcard operators from the very first days of available electronic computation, working on one of the first “poetical science” projects in “Humanities Computing”. From 1949, an Italian Jesuit priest called Father Roberto Busa (November 13, 1913 – August 9, 2011) pioneered the use of computing for linguistic and literary analysis, teaming up with IBM to produce an index of the works of St Thomas Aquinas. Thomas Aquinas wrote some 9 million words of medieval Latin, and so Busa’s project to index his works via computational methods took over 30 years, being one of the earliest and most ambitious projects in the field which is now called Digital Humanities.

To produce an index, the works of St Thomas Aquinas had to be encoded onto punchcards, and Marco Passarotti, from the CIRCSE Research Centre, Università Cattolica del Sacro Cuore, Milan, Italy (where the Index Thomisticus Treebank project is hosted), explains how this happened:

Once, I was told by father Busa that he was used to choose young women for punching cards on purpose, because they were more careful than men. Further, he chose women who did not know Latin, because the quality of their work was higher than that of those who knew it (the latter felt more secure while typing the texts of Thomas Aquinas and, so, less careful). These women were working on the Index Thomisticus, punching the texts on cards provided by IBM. Busa had created a kind of “school for punching cards” in Gallarate. That work experience gave these women a professionally transferable and documented skill attested to by Father Busa himself.

Update! (23/10/13): We now know the name of the woman top left: Livia Canestraro. She also appears in many of the pictures below.

Livia Canestraro

Livia Canestraro, above and below.

Update! (23/10/13): We now know the name of the woman back left: Rosetta Rossi Bertolli. Livia Canestraro is bottom right, and below.

Update! (23/10/13): We now know the name of the woman second from the left: Gisa Crosta.

These previously unpublished images come from the archive of Father Busa and date from the late 1950s and early 1960s. Taken in Gallarate, Italy, they show the ranks of women involved in encoding and checking the punchcard content of Thomas Aquinas’ works. The women can also be seen demonstrating the technologies to visiting dignitaries, and overseeing the loading of the punchcards into the mainframe.

We don’t know the names of these women: further research and enquiries are ongoing to try to establish their identities, and their role in the project. However, it shouldn’t be that surprising to us that women were so important in Father Busa’s pioneering computing project: in the early 1960s computer programmers were commonly women. It’s pleasing to show on Ada Lovelace Day how important women were to one of the first projects in my academic field – look at the scale of the operation! – although further research is needed to uncover the role and responsibilities of women in this project: the majority of them seen here are doing data entry, albeit in a skilled and new format. The project certainly could not have happened without their input.

The images shown here are kindly made available under a Creative Commons CC-BY-NC license by permission of CIRCSE Research Centre, Università Cattolica del Sacro Cuore, Milan, Italy. For further information, or to request permission for reuse, please contact Marco Passarotti, on marco.passarotti AT unicatt.it, or by post: Largo Gemelli 1, 20123 Milan, Italy. This year is the 100th anniversary of the birth of Father Busa, which will be celebrated with a workshop in Sofia on the Annotation of Corpora for Research in the Humanities.

Digital Humanities in works of literature?

23 Jul 2013 ~ Melissa Terras ~ 4 Comments

This post finds me jetlagged and happily worn out after my trip over the pond to the Social, Digital, Scholarly Editing conference in Saskatoon, followed in quick succession by Digital Humanities 2013 in Lincoln, Nebraska. 10 days, 6 flights, 2 countries, 2 conferences, 2 papers, 1 panel session, 2 chaired meetings and 3 posters later, I made my way home yesterday and decided not to work on the plane home (shock! horror!) but to treat myself to a nice novel. I picked up “Her Fearful Symmetry” by Audrey Niffenegger, and happily battered through it whilst airborne – laughing to myself when the following paragraphs emerged…

Martin shook his head… “I used to work at the British Museum, translating ancient and classical languages. But now I work from home”.

Julia smiled. “So they bring the Rosetta Stone and all that here to you?”…

“No, no. I don’t often need the actual objects. They take photographs and make drawings – I use those. It’s all become so much easier now everything is digital. I suppose someday they’ll just wave the objects over the computer and it will sing the translation in Gregorian chant. But in the meantime they still need somebody like me to work it out.” Martin paused, then said, rather shyly, “Do you like crossword puzzles?” (Niffenegger, A. (2009). Her Fearful Symmetry, p. 129. Scribner, New York.)

Later on in the book – set in and around Highgate Cemetery in London – the following is also said:

“Perhaps we ought to make another sign to post at the gate,” said James. “All uncertain grave owners please present yourselves during office hours when the staff can attend to your very time-consuming requests”.

“We want to help them,” said Jessica. “But they must call ahead. These people who pitch up on the cemetery’s doorstep wanting us to do a grave search while they wait – it’s beyond anything.”

“They think the records are digitised,” Robert said.

Jessica laughed. “Ten years from now, perhaps. Evelyn and Paul are typing in the burial records as fast as their fingers can fly, but with one hundred and sixty-nine thousand entries –”

“I know.”

It’s not the first time I’ve seen digital humanities/digitisation creep into fiction – I remember some ludicrous database in Dan Brown’s Da Vinci Code* – but it did make me think: people are starting to notice the kind of things we’ve been working on for (in my case) over a decade. It’s great to see something that so relates to my doctoral work and published texts pop up in a work of fiction. Heck, the people at US immigration who ask you what you do when you say you are going to a conference might even understand what “Digital Humanities” means next! Maybe not.

Anyone else stumble across mentions of computing, culture, humanities and heritage in fiction? If so, I might feel another Tumblr coming on. Uh-oh…

* I don’t have a copy of the Da Vinci Code, but the internet has provided an illegal online version, so I copy the scene here. First one to send me a cease and desist and ask me to take it down wins.

She glanced at her guests. “What is this? Some kind of Harvard scavenger hunt?” Langdon’s laugh sounded forced. “Yeah, something like that.” Gettum paused, feeling she was not getting the whole story. Nonetheless, she felt intrigued and found herself pondering the verse carefully. “According to this rhyme, a knight did something that incurred displeasure with God, and yet a Pope was kind enough to bury him in London.”

Langdon nodded. “Does it ring any bells?”

Gettum moved toward one of the workstations. “Not offhand, but let’s see what we can pull up in the database.”

Over the past two decades, King’s College Research Institute in Systematic Theology had used optical character recognition software in unison with linguistic translation devices to digitize and catalog an enormous collection of texts – encyclopedias of religion, religious biographies, sacred scriptures in dozens of languages, histories, Vatican letters, diaries of clerics, anything at all that qualified as writings on human spirituality. Because the massive collection was now in the form of bits and bytes rather than physical pages, the data was infinitely more accessible.

Settling into one of the workstations, Gettum eyed the slip of paper and began typing. “To begin, we’ll run a straight Boolean with a few obvious keywords and see what happens.”

“Thank you.”

Gettum typed in a few words:

LONDON, KNIGHT, POPE

As she clicked the SEARCH button, she could feel the hum of the massive mainframe downstairs scanning data at a rate of 500 MB/sec. “I’m asking the system to show us any documents whose complete text contains all three of these keywords. We’ll get more hits than we want, but it’s a good place to start.”

The screen was already showing the first of the hits now.

Painting the Pope. The Collected Portraits of Sir Joshua Reynolds. London University Press.

Gettum shook her head. “Obviously not what you’re looking for.” She scrolled to the next hit.

The London Writings of Alexander Pope by G. Wilson Knight.

Again she shook her head.

As the system churned on, the hits came up more quickly than usual. Dozens of texts appeared, many of them referencing the eighteenth-century British writer Alexander Pope, whose counter religious, mock-epic poetry apparently contained plenty of references to knights and London.

Gettum shot a quick glance to the numeric field at the bottom of the screen. This computer, by calculating the current number of hits and multiplying by the percentage of the database left to search, provided a rough guess of how much information would be found. This particular search looked like it was going to return an obscenely large amount of data.

Estimated number of total hits: 2,692

“We need to refine the parameters further,” Gettum said, stopping the search. “Is this all the information you have regarding the tomb? There’s nothing else to go on?”

Langdon glanced at Sophie Neveu, looking uncertain.

This is no scavenger hunt, Gettum sensed. She had heard the whisperings of Robert Langdon’s experience in Rome last year. This American had been granted access to the most secure library on earth – the Vatican Secret Archives. She wondered what kinds of secrets Langdon might have learned inside and if his current desperate hunt for a mysterious London tomb might relate to information he had gained within the Vatican. Gettum had been a librarian long enough to know the most common reason people came to London to look for knights. The Grail.

Gettum smiled and adjusted her glasses. “You are friends with Leigh Teabing, you are in England, and you are looking for a knight.” She folded her hands. “I can only assume you are on a Grail quest.”

Langdon and Sophie exchanged startled looks.

Gettum laughed. “My friends, this library is a base camp for Grail seekers. Leigh Teabing among them. I wish I had a shilling for every time I’d run searches for the Rose, Mary Magdalene, Sangreal, Merovingian, Priory of Sion, et cetera, et cetera. Everyone loves a conspiracy.” She took off her glasses and eyed them. “I need more information.”

In the silence, Gettum sensed her guests’ desire for discretion was quickly being outweighed by their eagerness for a fast result.

“Here,” Sophie Neveu blurted. “This is everything we know.” Borrowing a pen from Langdon, she wrote two more lines on the slip of paper and handed it to Gettum.

You seek the orb that ought be on his tomb. It speaks of Rosy flesh and seeded womb.

Gettum gave an inward smile. The Grail indeed, she thought, noting the references to the Rose and her seeded womb. “I can help you,” she said, looking up from the slip of paper. “Might I ask where this verse came from? And why you are seeking an orb?”

“You might ask,” Langdon said, with a friendly smile, “but it’s a long story and we have very little time.”

“Sounds like a polite way of saying ‘mind your own business.’”

“We would be forever in your debt, Pamela,” Langdon said, “if you could find out who this knight is and where he is buried.”

“Very well,” Gettum said, typing again. “I’ll play along. If this is a Grail-related issue, we should cross-reference against Grail keywords. I’ll add a proximity parameter and remove the title weighting. That will limit our hits only to those instances of textual keywords that occur near a Grail-related word.”

Search for: KNIGHT, LONDON, POPE, TOMB

Within 100 word proximity of: GRAIL, ROSE, SANGREAL, CHALICE

“How long will this take?” Sophie asked.

“A few hundred terabytes with multiple cross-referencing fields?” Gettum’s eyes glimmered as she clicked the SEARCH key. “A mere fifteen minutes.”

Langdon and Sophie said nothing, but Gettum sensed this sounded like an eternity to them.

“Tea?” Gettum asked, standing and walking toward the pot she had made earlier. “Leigh always loves my tea.”

On Changing the Rules of Digital Humanities from the Inside

27 May 2013 / 25 Jun 2015 ~ Melissa Terras ~ 7 Comments

There has been a lot of talk recently about how my field – Digital Humanities – has to change. We are too insular. We’re excluding those who want to partake in it. The structures that have been built within the discipline preclude the type and means of research which we claim to do. Issues of gender, race, ethnicity, and class raise their heads. There are a few online resources that exist which sum up these feelings: see the “Toward an Open Digital Humanities” google discussion document and, more recently, the Open Thread on “The Digital Humanities as a Historical ‘Refuge’ from Race/Class/Gender/Sexuality/Disability?” over at Postcolonial Digital Humanities.

I’m not denying that there are issues in Digital Humanities. One need only look at the recently published program for DH2013 and cast an eye over the authorship of the accepted papers to see that this year’s Digital Humanities presenting cohort is around 65% male, 35% female. But what I would say, speaking on a personal level and not representing any authority here, is an obvious point which I don’t often hear voiced. Most people “within” Digital Humanities – that is, those within the ADHO committee structures, those helping to run the conferences, those helping to allocate student bursaries and prizes, those helping to review papers and manuscripts, and heck, even the cool kids on twitter – are people who want Digital Humanities to be as open and as great as possible. This whole field has been built on the hard work of many academics who have given up their free time to try and entrench the use of computing in humanistic study into an academic field of enquiry, and it wouldn’t exist without them, even if the form it exists in is currently imperfect. I would say, from where I sit on various committees, that people want to keep DH growing, and growing healthily. So if there are things wrong with DH, then do give concrete examples, or propose concrete solutions, so they can be taken forward. They’re listening – we’re listening.

There are things that have really frustrated me within DH, and it is only recently that I’ve started to actively question and pursue them, to get them to be changed. For example, in 2006 I first noticed that the TEI guidelines encouraged the use of ISO 5218:2004 to record the sex of persons in a document (with attributes being given as 1 for male, 2 for female, 9 for not applicable, and 0 for unknown). I find this an outmoded and problematic representation of sexuality, which in particular formally assigns women to be secondary to men, and so, in one of the core guidelines in Digital Humanities, we allow and indeed encourage sexist structures to be encoded. I was shocked to hear this – and have often brought it up when discussing entrenched issues in DH about gender balance. In a recent conversation on twitter about this topic, Stephen Ramsay summed up the issue:

James Cummings responded to our tweets, asking why, if it bothered me (and others) so much, hadn’t anyone submitted a feature request to TEI about it? And you know, it had never occurred to me that there would be an easy route to question this sort of stuff. He pointed me to where to submit a request, which I did here. The discussion which follows is really very interesting – look out for the “you can’t possibly be offended!” argument, or the “but we’ve always done it this way!” response. Also look out for very vocal support from Gabriel Bodard, in particular, who helped steer the discussion forward to ensure that at

“the TEI Council meeting in Brown, 2013-04, we agreed to change the datatype of person/@sex, personGrp/@sex and sex/@value from ISO 5218 to data.word, so as to allow the use of locally defined values or alternative published standards to be used in these attributes.”

Women are secondary in the TEI rules no more! Hurrah! – and all it needed for that to happen was for someone to raise the issue in the correct forum, and explain the issue to those who did not understand it, until they finally did.
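As a rough illustration of what that datatype change means in practice, here is a minimal Python sketch. It is not TEI tooling: it assumes the four ISO 5218 codes listed above, and it approximates TEI's data.word as "a single token with no whitespace", which is a simplification of the real definition. The function names and sample values are hypothetical.

```python
import re

# The four ISO 5218 codes, as listed in the post.
ISO_5218 = {"0": "unknown", "1": "male", "2": "female", "9": "not applicable"}

# Simplified stand-in for TEI's data.word (assumption): one token, no whitespace.
WORD = re.compile(r"\S+")


def valid_before(value: str) -> bool:
    """Old rule: only the four ISO 5218 codes are accepted."""
    return value in ISO_5218


def valid_after(value: str) -> bool:
    """New rule: any single whitespace-free token is accepted."""
    return WORD.fullmatch(value) is not None


# Compare how each rule treats a few candidate attribute values.
for value in ["1", "2", "F", "nonbinary", "not known"]:
    print(f"{value!r}: before={valid_before(value)}, after={valid_after(value)}")
```

Under these assumptions, existing documents that use the numeric codes stay valid, while projects are free to define their own vocabulary of values instead.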

I’m Program Chair for DH2014 and issues of diversity and equality are currently on my mind as we discuss and choose plenary speakers for the Lausanne conference. It was recently pointed out to me, though, that the ADHO conference protocols don’t allow issues of diversity to be taken into consideration when choosing plenary speakers, originally saying

“Keynote speakers are decided by the International Program Committee in consultation with the Local Organiser, and should ideally represent a range of disciplines, interests, and geography.”

This isn’t good enough, as it means that you can’t say “We’ve got a man to be one of the speakers, how about having a woman for the other one?” without being at risk of being accused of breaching protocol. I’ve recently chased an amendment round the ADHO committee structures, which means the ADHO conference protocols, since last week, state:

“Keynote speakers are decided by the International Program Committee in consultation with the Local Organiser, and should ideally represent a range of complementary disciplines, interests, and geography, with consideration given to issues of gender equality, and economic, ethnic, cultural, and linguistic diversity.”

Perhaps a small deal, focussing on the choice of a couple of speakers a year at our international conference, but pointing to the fact that the ADHO constitution needs to be looked over, to see where we can enshrine issues of gender equality, and other issues of diversity, within our communities. We need to make the rules that people have to abide by. We can make the rules, and we can change the rules. What rules would it help to have?

Of course, changing rules and guidelines won’t make everything change overnight, and I wouldn’t like to naively claim they will solve everything, but they are a start. I guess what I’m saying here is that, in general, folks “within” Digital Humanities are doing their best, and open to discussion and improvement, and are not willfully obstructive to those of a different gender, race, or economic class, etc. Criticism is helpful, and if there are things that need changing, or unconscious biases that need rectifying, then point them out, tell us. Tell us about the concrete things that we can act upon. We all want Digital Humanities to be the best it possibly can be, and I, for one, don’t mind changing the rules from the inside, in the time that I remain there.

28/05/13 Addendum to the original post: for an ADHO-led initiative on diversity see GO::DH. I’d also like to encourage anyone who is interested in discussing change to consider standing for election to one of the ADHO organisations – we always need volunteers who want to roll up their sleeves!

On Throwing Your Klout Around

26 May 2013 / 24 Jun 2015 ~ Melissa Terras ~ 1 Comment

I am @melissaterras. I have just shy of 4500 followers on twitter, a blog which garnered 100,000 readers last year, and a current Klout score of 64. I tend to take this kind of thing with a grain of salt: I hang out on social media because I enjoy it and it has also proved useful and beneficial to my career. I’m aware I’m not Justin Bieber and that my stats – while above average – are not particularly big shakes. But over the past few weeks a few things have happened which have made me think about digital identity, responsibility, and where academic use of social media crosses into the “real life” arena.

Case 1. I travel a lot with work, usually using Opodo to book tickets. A few weeks ago I found myself locked out of “My Opodo” and couldn’t access it to check itineraries, tickets, or print boarding passes, etc. I tried getting in touch with customer services, spending hours on the phone, emailing, tweeting and asking for help. Nothing. With an upcoming trip, and growing frustration (spending an hour on hold to Opodo is never in the plan of my day) I posted a few disgruntled tweets about their shocking customer service, which, retweeted by some followers, had the potential to reach over 10,000 users within a matter of minutes. My mobile rang. Opodo – a firm renowned for not answering customer complaints in a timely fashion – had phoned me to help resolve the problem.

I’ve seen it reported that Klout scores and twitter follower counts are now being paid attention to by customer services, but while I can provide various concrete examples of why having a digital profile has helped my academic career, this is the first time I can point to something which has actually helped resolve an issue I have had with a commercial entity. I’m simultaneously aghast that it would take an above average twitter following to help you get on a departing flight, and relieved that it helped me to get an increasingly pressing travel issue sorted out. What about those not-so-valued customers that didn’t manage to get the issue resolved in time?

Case 2 is where I now am aware that writing something online could cost a local business tens of thousands of pounds in business. I’m not happy with the project management company who looked after a build at our home, as the ceiling is now leaking, and they are ignoring any enquiries we are making to help have this sorted. It would be easy for me to name them here, linking to their website, and within a couple of days if you googled for them my blog post would appear above their own website in the rankings, due to the fact that my blog is tapped into more existing networks than theirs.

It would seem that, at the moment, the easiest tool at my disposal is my digital identity. Indeed, it is probably the only leverage I have to stop the growing discolouration of our new dining room ceiling. But that makes me uneasy, as I know how difficult it would be for them to claw back a negative customer comment once it has been broadcast online, and we are happy in general with our build and are sure this is a minor issue to resolve. Should I be throwing my klout around, if it will negatively affect others in the long term?

I’m left thinking of the increasingly intertwined nature of customer service, digital presence, and moral responsibility. Whilst I was playing at this, this stuff got real.

What People Study When They Study Twitter

9 May 2013 ~ Melissa Terras ~ 1 Comment

So, keeping good on my Open Access promises: my latest paper is hot off the presses, just as it goes up in preprint behind a paywall on the journal pages! It will be out in print sometime this year in the Journal of Documentation, and is jointly authored with Shirley Williams, from the University of Reading, and Claire Warwick, from UCLDIS. And here it is:

Williams, S and Terras, M and Warwick, C (2013) “What people study when they study Twitter: Classifying Twitter related academic papers”. Journal of Documentation, 69(3). Free PDF Download From UCL repository.

In this paper, we identify the 1161 academic papers that were published about Twitter between 2007 (when the first papers on Twitter appeared) and the close of 2011. We then analyse method, subject, and approach, to show what people are doing (or have been publishing!) on the use of Twitter in academic studies, providing a framework within which researchers studying the development and use of twitter as a source of data will be able to position their work. Oh, we also provide the list of the papers we found, so you can have a look-see yourself.

And the story behind this one? Shirley was introduced to Claire and myself by the late (and much missed) Prof. Mark Baker at Reading, when we undertook the Linksphere project. Now, I’ve written about Linksphere elsewhere – it was an ambitious project which really didn’t take off due to a variety of factors – but the good things to come out of it were our RA, Claire Ross, and meeting Shirley. We published a paper on the use of twitter by academics at conferences when the Linksphere project was going. A year or so after the project finished, Shirley was granted a research sabbatical, and asked Claire and me if we would be interested in carrying on that work with her. Kicking around a few ideas, we wondered whether it would be possible to round up all the published work on Twitter – what are people using it for? And then to analyse it, to see if we can classify how people are using it, what the datasets are, what the methods are, and what the domains are. Wouldn’t it be nice to have a bibliography on the use of twitter in research papers? And so away Shirley went, working with Claire and me, and building up this nice framework in which we can look at twitter-based research.

The paper was accepted into the Journal of Documentation last summer, and this month went up in preprint at the Journal of Documentation website, and is now out in Open Access from UCL’s research repository, before it even hits the Library shelves. Which is how it should be, non?

Professor Melissa Terras MBE FReng

Adventures in Digital Cultural Heritage

Author: Melissa Terras