The academic community I’m most heavily involved in – Digital Humanities – is fairly invested in Twitter. At all times of the day there are major figures, students, and newbies in the field on there, just hanging out, debating topics, and forwarding links to events, job postings, interesting research and cool things they have stumbled upon. People have studied this – graphing and charting the discussions, especially around the DH conference – and heck, even I have co-authored a paper on the subject.
I’m currently working on a book/project called Defining Digital Humanities and I thought, wouldn’t it be fun to get all – and I mean all – the tweets that contain the hashtag #digitalhumanities? What fun could be had charting the growth of the discipline, the geolocation of tweets, the networks that exist, the sentiments surrounding it, and so on. Now, hindsight is a grand thing – I should have thought to start scraping these back in 2006 – but surely it must be possible to get access to this for research? So I asked.
The first approach was to Gnip – who have “full historical access to the twitter firehose available exclusively”. They were really very helpful, and we got into a conversation about my needs, their licensing, and – of course – costs. The upshot is that if you want a hashtag, you can get it for a price, with the text delivered in JSON format. I was quoted between $15,000 and $25,000 for the full historical set (depending on the exact volume of the data; they are now looking into it to give me the final figure, as neither they nor I yet know how many tweets contain this hashtag).
The second place I asked was Datasift – “the leading platform for building applications with insights derived from the most popular social networks and news sources”. They do have access to the historical Twitter firehose, but they don’t do one-off searches, and licensing will start at $3000 per month to get access to it (on a yearly contract). They will be launching a pay-as-you-go service at some point, they tell me. By the way, you can get $10 worth of free credit for processing if you sign up and play around with some current searches: I set up a search for #digitalhumanities and had run out of credit within a few hours. (I find the user interface very obfuscating – I’m still wrangling with it to see what that data actually is!)
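To put the two quotes side by side, here is a rough back-of-the-envelope sketch in Python. It uses only the figures quoted to me above, and it assumes that "starting at $3000 per month on a yearly contract" means a minimum commitment of twelve monthly payments. That is my reading, not a published price list, and the function and constant names are just made up for illustration.

```python
# Back-of-the-envelope comparison of the two quotes mentioned in the post.
# All figures are US dollars and come from the quotes described above;
# the 12-month minimum commitment is an assumption about what
# "on a yearly contract" implies.

GNIP_ONE_OFF_LOW = 15_000      # lower end of the Gnip quote
GNIP_ONE_OFF_HIGH = 25_000     # upper end of the Gnip quote
DATASIFT_MONTHLY_START = 3_000 # Datasift licensing "starts at" this per month
CONTRACT_MONTHS = 12           # assumed length of the yearly contract


def minimum_yearly_commitment(monthly_fee: int, months: int = CONTRACT_MONTHS) -> int:
    """Smallest total outlay for a subscription billed monthly over a fixed term."""
    return monthly_fee * months


if __name__ == "__main__":
    datasift_total = minimum_yearly_commitment(DATASIFT_MONTHLY_START)
    print(f"Gnip one-off quote: ${GNIP_ONE_OFF_LOW:,} to ${GNIP_ONE_OFF_HIGH:,}")
    print(f"Datasift minimum over {CONTRACT_MONTHS} months: ${datasift_total:,}")
    print(f"Difference from the top Gnip quote: ${datasift_total - GNIP_ONE_OFF_HIGH:,}")
```

On that reading, the subscription route costs at least as much as the most expensive one-off quote, before any processing charges are counted.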
Now, these costs are very little compared to the costs to access the full firehose, and let’s face it – a free service like Twitter has to make its money somewhere. These were not vexatious enquiries: I’d really like to do this study. But now I have to find $25k down the back of the sofa to get access to this data (and incidentally, if I do, I won’t be allowed to quote it, only to show the stats that emerge from the analysis). $25k is a fair whack of money in academia-land. It will also take around 6 months (at least) to write it into a grant proposal to raise the money – and how do you persuade academic funders that buying this dataset is a good use of their money? Frankly, I’m not sure that will fly in the arts and humanities, where complete grant costings can come in under £100k for a one-year project.
Thinking caps are now on to see how we can get funding put together to get access to the data of the community I – goddammit – helped (in some small way) to create. I love Twitter with a passion and it continues to inform and aid my teaching and research. But when we invest so much in a free service, we are selling ourselves. It’s interesting to see how much #digitalhumanities is “worth” to others. Anyone got a free $25k?
8 thoughts on “What Price a Hashtag? The cost of #digitalhumanities”
My Digging Into Data project was trying to do the exact same thing, recently, and came to the same monetary issues. Perhaps it would be worth it (possible?) for many of these projects to pool their money and share the data between them? In the meantime, our stopgap was to just begin collecting now and continue for the foreseeable future…
#DigitalHumanities Kickstarter? 🙂
Sounds like a perfect #digitalhumanities kickstarter. 🙂
This project seems to be really exciting, and I would like to read the results either in the book or here, but to be frank, I cannot donate the sum of money you need. But when reading your post it occurred to me that the Library of Congress archives tweets, and makes them available for research purposes, though only on site. Having a friend there might come in handy, and would be a cost-effective solution, wouldn't it?
Or pressure Twitter for an academic licensing scheme? They could manage an application procedure etc…
My first two thoughts are already suggested: LoC and crowdfunding, although I'm not sure about the second. I suspect, correct me if I'm wrong, that if you buy the data from these companies, they can tie you to a contract not allowing you to share the data. I can't contribute much, but would give something if in the end the data could be open for everyone.
I would like to add that I can't agree when you say this is the cost of using a free service. If Twitter were a paid service, you wouldn't have a guarantee that you could access the data. I believe the problem here is the fact that Twitter is a closed, proprietary service. Actually, if Twitter were free software, you probably wouldn't have this problem.
As a side note, I tried to make this comment in the mobile version of your blog and it didn't let me; I had to change to the desktop version. I'm using Firefox on Android.
This sounds like a great project, and I hope you can figure out how to deal with these financial issues. I've posted some other thoughts in response here.
What Tim said! Sounds like a great project.