So, when I announced the Bentham Transcription Initiative (which will soon have its own website; we are working on things behind the scenes) I said it was a “highly innovative and novel attempt to aid in the transcription of Bentham’s work”. I firmly believe that: I don’t know of any other large-scale attempt to transcribe correspondence that is opening things up to crowdsourcing, and our project has a broad remit, producing an open source tool whilst undertaking user studies on the use of crowdsourcing in cultural heritage applications.
But that is not to say that there are not other crowdsourcing projects out there (and I’m sorry if I implied that!). I have had very interesting exchanges with quite a few people, and so I thought I’d draw a few other projects to your attention, in case you are interested in community-based online cultural heritage projects (and beyond).
- There is huge amateur interest in genealogy, and the Free Births, Marriages and Deaths (FreeBMD) register has been transcribing the Civil Registration index of births, marriages and deaths for England and Wales, in order to provide free Internet access to the transcribed records.
- “Small and Special” has been using volunteer effort to create a database relating to the early years of The Hospital for Sick Children at Great Ormond Street, including patient admission records and articles.
- The New Zealand Electronic Text Centre have an interest in transcription of cultural material, and they’ve been doing some very exploratory work in crowd-sourcing transcription.
- The National Library of Australia’s Australian Newspapers is using crowdsourcing to correct the OCR of digitised Australian newspapers, with some contributors correcting hundreds of thousands of lines of text.
- The USGS North American Bird Phenology program encouraged volunteers to submit bird sightings across North America from the 1880s through the 1970s. The resulting observation cards are now being transcribed into a database for analysis of changes in migratory patterns and what they imply about climate change.
Then there is the idea that building an online tool to help transcribe manuscripts is novel. There are a good few things out there, it turns out.
- Ben Brumfield was kind enough to point out his blog, Collaborative Manuscript Transcription, which links to projects and tools and also considers the types of things one has to keep in mind when designing an online tool for transcribing texts. Ben has also developed his own system, http://beta.fromthepage.com/, software that allows volunteers to transcribe handwritten documents online. We’ll be looking at it closely.
- The MediaWiki ProofreadPage plugin has been developed for many print transcription projects and a few manuscript projects. Current English-language projects using the plugin are listed there (and there are quite a few).
- The BYU Historic Journals Project has developed an online transcription tool. The server seems to be down for maintenance (http://journals.byu.edu/) but there is a video online which demonstrates how they have been using their online tool for both searching and creating information.
- The Worcester Polytechnic Institute Emergent Transcriptions/Transcription Assistant software system has also been pointed out; you can see more at E-Scripts@ WPI and Uscript.org.
- The New Zealand Electronic Text Centre have produced a tool called OpenScribe (Online Volunteer Transcription Service), which is based on a slightly modified Drupal installation; the source code is hosted in svn on Google Code. They have developed another tool, Remote Writer, which provides a web-based word-processor GUI that lets someone easily mark up text as XHTML, which can then be translated to TEI using stylesheets. This is how they have enabled non-technical contributors to create the content found at sites such as Turbine literary journal and Best New Zealand Poems.
- The SCRATCH (SCRipt Analysis Tools for the Cultural Heritage) project is exploring methods for automated information retrieval and analysis in large collections of scanned handwritten-document images. That’s a slightly different focus from the rest of the projects named here, but I include it as it may be of interest.
So that’s the round-up so far. Richard Davis, the developer from ULCC who is working on the Bentham project with us, has also posted an overview of who he has been chatting to. Once we get the project name, domain name, and website sorted out, we’ll be posting lots of updates about our development of the tool, keeping the project as open as possible, in all kinds of ways.
If you know of any other cultural heritage projects using crowdsourcing, in particular for manuscript material, then please do get in touch. And if you hear about any other online manuscript environments we need to be aware of, drop us a line too! We won’t be getting properly stuck into the Bentham Transcription project until RAs are appointed (closing date for applications March 8th…) but it is good to learn what else is out there.
Update: I forgot to mention the “International Amateur Scanning League”, which is a crowdsourcing project to digitise material from the USA’s National Archives and Records Administration. It’s a different focus (digitisation rather than transcription) but what a great name! They have a badge, and everything!