Tuesday 9 June 2015 – Writing a Big Data History of Music

Please join us for our next seminar.

Presenter:  Stephen Rose (RHUL)

Title:  Writing a Big Data History of Music

Date:  9 June 2015

Time:  5:15 PM (BST)

Venue:  John S Cohen Room 203, 2nd floor, IHR, North block, Senate House or live online via the Digital History Seminar blog.

Live Stream

Abstract: This seminar introduces the project A Big Data History of Music, which aimed to unlock the musical-bibliographical data held by libraries in order to create new research opportunities. The project cleaned and enhanced aspects of the British Library catalogues of printed and manuscript music, which are now available as open data from www.bl.uk/bibliographic/download.html. Analyses and visualisations of these datasets exposed previously uncharted patterns in the history of music, for instance involving the rise and fall of music printing in 16th- and 17th-century Europe, or the rise of nationalist colourings in music of the late 18th and early 19th centuries. The detection of these long-term trends permits new ways of linking music history to wider histories of culture, economics, society and politics.

Seminars are normally streamed live online on this blog and on YouTube. To keep in touch, follow us on Twitter (@IHRDigHist) or at the hashtag #dhist.

Posted in Events

Tuesday 26 May – Virtual Rome: a digital reconstruction of the ancient city

Presenter:  Matthew Nicholls (Reading)

Title:  Virtual Rome: a digital reconstruction of the ancient city

Date:  26 May 2015

Time:  5:15 PM (BST)

Venue:  John S Cohen Room 203, 2nd floor, IHR, North block, Senate House or live online via the Digital History Seminar blog.

Live Stream

Because a fire alarm interrupted the seminar partway through, the live stream of this event was split into two videos. These have now been merged and will be displayed here until the final edited version of the video is available in a few weeks' time.

Abstract: 

Dr Matthew Nicholls of the Department of Classics at the University of Reading has made a detailed digital reconstruction of the city of Rome as it appeared c. AD 315. In this talk he will introduce the model and discuss some of the tools and methodology involved in its creation, including questions about date, level of detail, and conjecture. He will then talk about the pedagogical uses of digital modelling and the digital Rome model's potential as a research tool: current work includes investigating illumination at specific times of day and year, and sightlines within the ancient city to, from, and between major monuments.

Profile:

Matthew Nicholls read Literae Humaniores at St John's College, Oxford, and was a Junior Research Fellow at the Queen's College before taking up a lectureship in Classics at Reading, where his work includes running an MA in the City of Rome. His research includes the study of ancient books and libraries, including a newly discovered text by the 2nd-century AD medical writer Galen. He is also interested in the digital reconstruction of ancient buildings and places, initially for teaching and outreach work and increasingly for research. His work in this area won the 2014 Guardian/Higher Education Academy national Teaching Excellence award, and he currently holds a British Academy Rising Star Engagement Award for work on digital visualisation in the humanities. As part of this scheme he will be running an introductory workshop on software skills for digital visualisation and welcomes enquiries about participation.

Posted in Events

Tuesday 24 March – Text Mining the History of Medicine

Please join us for our next seminar.

Presenter:  Sophia Ananiadou (Manchester University)

Title:  Text Mining the History of Medicine

Date:  24 March 2015

Time:  5:15 PM (GMT)

Venue:  John S Cohen Room 203, 2nd floor, IHR, North block, Senate House or live online via the Digital History Seminar blog.

Live Stream

Slide Show 

Abstract: I will present the results of a collaborative and interdisciplinary project between the National Centre for Text Mining (NaCTeM) and the Centre for the History of Science, Technology and Medicine (CHSTM) at the University of Manchester, demonstrating the capabilities of innovative text mining tools to allow the automatic extraction of information from two historical archives: the British Medical Journal (BMJ) (1840–present) and the London-area Medical Officer of Health (MOH) reports (1848–1972). NaCTeM's text mining tools have automatically enriched these historical archives with semantic metadata by extracting terms, named entities and events. The project also developed a semantic search system focused on understanding historical changes in lung diseases since 1840.

Posted in Events

Will historians of the future be able to study Twitter?

Over the last year or so, our seminar has become increasingly web-focussed. Last week we had an excellent paper from Jack Grieve of Aston University on tracking newly emerging words as they appear in large corpora of tweets from the UK and the US. By amassing very large tweet datasets, he and his colleagues are able to observe the early traces of newly emerging words and also (when those tweets were submitted from devices that attach geo-references) to see where those new words first appear and how they spread. Jack and his colleagues are finding that, in the US, words quite often emerge first in the east and south-east (or in California) and then spread towards the centre of the continent. They don't necessarily spread in even waves across space, nor do they necessarily jump first between urban centres and then out to rural areas (as would have been my uneducated guess). Read more at the project site, treets.net, or watch the paper.

This kind of approach is impossible without the very large-scale natural language data that social media afford. This is particularly so because most words are (perhaps counter-intuitively) rather rare. In the corpus in question, the majority of the 67,000 most common words appear only once in every 25 million words. Given this, datasets of billions of tweets are the minimum size necessary to see the patterns.
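A back-of-the-envelope sketch may make the scale argument concrete. Assuming a word that occurs about once in every 25 million words (the rate quoted above), its expected count grows in proportion to corpus size, so only a corpus of billions of words yields enough occurrences to map across time and space. The corpus sizes below are illustrative assumptions, not figures from Jack's project.

```python
# Illustrative arithmetic only: the rate (one occurrence per 25 million words)
# comes from the post; the corpus sizes below are hypothetical examples.
WORDS_PER_OCCURRENCE = 25_000_000


def expected_occurrences(corpus_size_words: int) -> float:
    """Expected count of a word that appears once per 25 million words."""
    return corpus_size_words / WORDS_PER_OCCURRENCE


for size in (25_000_000, 1_000_000_000, 5_000_000_000):
    print(f"{size:>13,} words -> about {expected_occurrences(size):,.0f} occurrences")
```

On these assumptions, a 25-million-word corpus would contain such a word about once, while a billion-word corpus would contain it about 40 times: still a thin basis for mapping its spread across a whole continent.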

It was interesting to me as a convenor to see the rather different spread of people who came to this paper, compared with the audience for the more usual digital history work the seminar showcases. Jack focussed on tweets posted since 2013, a time span that even the most contemporary of historians would struggle to call their own, and so fewer historians came along; but we had perhaps our first mathematician instead. This was a shame, as Jack's paper was a fascinating glimpse into the way that historical linguistics, and indeed other types of historical enquiry, might look in a couple of decades' time.

But there is a caveat, beyond the scope of Jack's paper, concerning how this data will be made accessible to scholars working in (say) 2044 who want to study 2014. Jack and his colleagues work directly from the so-called Twitter “firehose”: they harvest every tweet coming from the Twitter API and, on their own hardware, process each tweet and discard those that are not geo-coded to within the study area. This kind of work involves considerable local computing firepower and, more importantly, is concerned with the now. It creates data in real time to answer questions about the very recent past.
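The filtering step described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the Tweet record, its lon/lat fields and the bounding box (a rough rectangle around the contiguous United States) are all hypothetical simplifications, and real Twitter API payloads are structured differently.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class Tweet:
    """Hypothetical simplified tweet record; real API payloads differ."""
    text: str
    lon: Optional[float] = None  # longitude, present only if geo-referenced
    lat: Optional[float] = None  # latitude


# Assumed study area: a rough bounding box around the contiguous United States.
# The post does not say how the project actually defines its study area.
MIN_LON, MAX_LON = -125.0, -66.9
MIN_LAT, MAX_LAT = 24.4, 49.4


def in_study_area(t: Tweet) -> bool:
    """True only if the tweet carries coordinates inside the bounding box."""
    if t.lon is None or t.lat is None:
        return False  # not geo-coded, so it is discarded
    return MIN_LON <= t.lon <= MAX_LON and MIN_LAT <= t.lat <= MAX_LAT


def filter_stream(stream: Iterable[Tweet]) -> Iterator[Tweet]:
    """Examine each incoming tweet once and keep those inside the study area."""
    for t in stream:
        if in_study_area(t):
            yield t


incoming = [
    Tweet("hello from Boston", lon=-71.06, lat=42.36),
    Tweet("no location attached"),
    Tweet("hello from London", lon=-0.13, lat=51.51),
]
kept = list(filter_stream(incoming))
print([t.text for t in kept])  # ['hello from Boston']
```

A real pipeline would need a proper study-area boundary rather than a rectangle, but the shape of the work is the same: every tweet has to be examined as it arrives, which is why the approach demands so much local computing power, and what survives is only what the questions of the moment required.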

Researchers working in 2044 and interested in 2014 may well be able to re-use this particular bespoke dataset (assuming it is preserved, which is a separate matter of research data management for another post). However, they may equally well want to ask completely different questions, and so need data prepared in quite a different way. Right now the future of the vast ocean of past tweets is uncertain, and so it is not clear whether the scholar of 2044 will be able to create their own bespoke subset of data from the archive. The Library of Congress, to be sure, is receiving an archive of data from Twitter, but the arrangements for access to this data are unclear and, at present, there is no access at all. So, in the same way that historians need to take some ownership of the future of the archived web, we need to become much more concerned about the future of social media: the primary sources that our graduate students, and their graduate students in turn, will need to work with two generations down the line.

Certainly, historians have always been used to working around and across the gaps in the historical record; dealing with the fragmentary survival of the record is part of the basic skillset. But right now there is a moment in which major strategic decisions are to be made about that survival, and historians need to make themselves heard.

This post was written by Peter Webster, who can also be found on his own blog, Webstory.

Posted in Postscript

Tuesday 10 March – Lost Visions: retrieving the visual element of printed books

The IHR Seminar in Digital History would like to welcome you to its second seminar of 2015.

Presenters:  Julia Thomas, Nicky Lloyd and Ian Harvey (Cardiff)

Title:  Lost Visions: retrieving the visual element of printed books

Date:  10 March 2015

Time:  5:15 PM (GMT)

Venue:  John S Cohen Room 203, 2nd floor, IHR, North block, Senate House or live online via the Digital History Seminar blog.

Live Stream

The live stream for this session did not work properly. Please check back for the edited version of the video in the postscript section of the blog. Thank you.

Slide Show

Abstract: Despite the mass digitization of books, illustrations have remained more or less invisible. As an aesthetic form, illustration is conventionally positioned at the bottom of a hierarchy that places painting and sculpture at the top. The hybridity or bimediality of illustration is also problematic, the genre having fallen between the cracks of literary studies and art history. In a digital context, illustration has fared no better: new technologies can aid the editing of a literary text far more successfully than they can deal with the images that accompany it.

This paper focuses on the challenges and the implications of an AHRC-funded Big Data project that will make searchable online over a million book illustrations from the British Library’s collections. The images span the late eighteenth to the early twentieth century, cover a variety of reproductive techniques (including etching, wood engraving, lithography and photography), and are taken from around 68,000 works of literature, history, geography and philosophy.

The paper identifies issues relating to the improvement of bibliographic metadata and the analysis of the iconographic features of the images, which impact on our understanding of ‘the image’ in Digital Humanities and the negotiation of Big Data more generally. The work undertaken as part of the Lost Visions project allows for the further development of Illustration Studies, repositioning visual culture in the largely text-based process of digitisation and problematising modes of textual production.

Posted in Events

Tuesday 24 February – Tracking the Emergence of New Words across Time and Space

The IHR Seminar in Digital History would like to welcome you to its first seminar of 2015.

Presenter:  Jack Grieve (Aston)

Title:  Tracking the Emergence of New Words across Time and Space

Date:  24 February 2015

Time:  5:15 PM (GMT)

Venue:  John S Cohen Room 203, 2nd floor, IHR, North block, Senate House or live online via the Digital History Seminar blog.

Live Stream

Download Slide Show here

Abstract: Very little is known about how new words spread in language. New words are regularly identified by lexicographers, linguists, and the news media, but until recently we have not had access to sufficiently large geo-coded and time-stamped datasets that would allow for the detailed analysis of the geographical diffusion of lexical items in real time. However, with the rise of social media and smartphones, it is now possible to compile very large corpora that meet these requirements, allowing new words to be identified and mapped across time and space for the first time. In this presentation, I identify numerous newly emerging words based on a multi-billion-word corpus of American tweets from 2013–2014 and map their geographical spread across the United States.

Posted in Events

Citizen history and its discontents: Postscript

By Matt Phillpott

An increasing number of crowdsourcing projects claim to be ‘citizen history’. Old Weather, one of the more successful crowdsourcing projects of recent years, has started to use the term, and Zooniverse (the platform behind it) has this year used the same infrastructure for a World War One project called Operation War Diary. Then there is Children of the Lodz Ghetto, a project in which volunteers undertake actual research tasks, helping to track down the names and lives of schoolchildren who fell victim to the Holocaust. By its nature this research is often complex, as names vary and change, and sources come in a variety of languages.

Citizen history is the current ‘buzzword’, and using it is a claim to be moving beyond crowdsourcing: to be offering, in addition, an opportunity to learn and master the skills of a historian collaboratively and co-operatively.

In the third talk of this year's Digital History seminar, Mia Ridge from the Open University shared her research into crowdsourcing and citizen history projects and asked whether they really help people to become historians or whether they in fact overstate their contribution. As Mia herself put it, ‘can citizen history projects succeed without communities of experts and peers to nurture sparks of historical curiosity and support novice historians in learning the skills of the discipline?’

The role of the ‘expert’?

Mia was careful to stress that emphasising the importance of involving ‘expert’ historians at the beginning of, and throughout, a project is not to suggest that the grassroots communities these projects hope to build cannot deal with complex historical data and interpretation on their own, or do not already do so.

When citizen history projects work well, the forums, wikis and other online spaces become hives of co-operative discussion, collaborative learning and training. However, these communities are built upon learning about sources and their interpretation in a collaborative environment, and there are times when professional historians can help: offering advice where the sources are difficult or no other answer is forthcoming, or picking up on and highlighting details of wider historical significance. Generally, people who take to citizen history projects are there to discover the past and to learn how to use the sources, and the input of professional historians is valued as part of that process.

Often, however, the role of the professional or ‘expert’ historian is largely hidden. Mia noted that professional historians often take an active role in the forums near the beginning of a project to help get things started, but later on, while they continue to check the forums, their input declines as teaching, research and funding applications necessarily take precedence. Ideally this shouldn't happen, but there are very real obstacles that limit the time and effort professional historians can give to citizen history projects. How we overcome this difficulty is not an easy question to answer.

What makes citizen history a success?

For a citizen history project to succeed not just in building a resource of research materials through crowdsourcing but also in developing historians, it is essential to build a critical mass of discussion and use, and to expose people to potentially interesting historical materials. It is also important to include expert input, as this can transform the process.

In essence, some citizen history projects are really crowdsourcing projects and are perhaps misusing the term, while others fail to reach their goals for one reason or another; still others are highly successful. Yet these projects run the risk that citizen historians will come to be seen as faux historians with limited skills and abilities, when in reality citizen historians range from those just beginning the process to those who have built up the skills and knowledge required of any other historian.

Mia ended her talk with a call for the organisers of crowdsourcing and citizen history projects to be more careful with the terminology they use. Signing up to a project and doing a bit of transcription does not make someone a historian, although that can be where it eventually leads. Projects need to be clear about what they are offering and asking, and what exactly is required to become a citizen historian rather than, perhaps, a citizen transcriber.

Posted in Postscript

Digital Humanities Project, ‘Mapping Eighteenth-Century Tourism in the English Lakes’

On Wednesday 26 November 2014, the Digital History seminar is co-hosting a seminar with the British History in the Long Eighteenth Century seminar. Here are the details:

Title: Mapping Eighteenth-Century Tourism in the English Lakes

Speakers: Ian Gregory and Chris Donaldson (Lancaster)

Location: Wolfson Room NB01, Basement IHR, North Block, Senate House

Time: Wednesday 26 November 2014, 5.15pm

Posted in Events

Tuesday 18 November – Citizen History and its discontents

The IHR Seminar in Digital History would like to welcome you to its third seminar of the 2014 autumn term.

Presenter:  Mia Ridge (Open University)

Title:  Citizen History and its discontents

Date:  18 November 2014

Time:  5:15 PM (GMT)

Venue:  John S Cohen Room 203, 2nd floor, IHR, North block, Senate House or live online via the Digital History Seminar blog.

Live Stream

Slide Show

Abstract: An increasing number of crowdsourcing projects are making claims about ‘citizen history’ – but are they really helping people become historians, or are they overstating their contribution? Can citizen history projects succeed without communities of experts and peers to nurture sparks of historical curiosity and support novice historians in learning the skills of the discipline? Through a series of case studies this paper offers a critical examination of claims around citizen history.

Posted in Events

Interrogating the Archived UK Web – postscript

By Adam Crymble

The second talk of our 2014 Autumn programme took on the challenge of a new type of source for historians: the Internet. Not online sources and databases, but the Internet itself. The first archived copies of the UK web have started to find their way into scholarly hands. Historians now have the ability to look at webpages as sources in themselves, just as we have previously read manuscripts as a window into the past. The web is a corpus rich in details about what we were like and what we thought was important, not that long ago. For a cultural or social historian, it’s a dream.

Peter Webster introduced the UK Web Archive, which is hosted by the British Library and contains snapshots of the UK web (.uk sites) dating back to the 1990s. A team of historians has been given access to see what they can make of this new (and huge) resource. I want to emphasise the experimental aspect of this project, because in many respects I think we learned more about what these scholars couldn't achieve than about what they did achieve.

That's not a reflection on the quality of the scholars themselves. They did exactly what we could hope for from them: they tested the limits of the historian's method on a large, messy, digital archive. They've done us a great service in finding some of those limits. The question now before us is what we're going to do about it.

Two of the scholars were on hand to share their experiences. Gareth Millward's project explored hyperlinking behaviour towards the website of the Royal National Institute of the Blind (RNIB) in the early days of the web, and tried to uncover why people were creating those hyperlinks.

Richard Deswarte used the archive to explore manifestations of Europhobia online, looking particularly for indicators that people in Britain were using the web to express dissatisfaction with the country's continued role in the EU.

The projects themselves took on interesting questions, appropriate to the type of source. Most interesting for me, and a significant part of both presentations, was the discussion of where they had problems using the corpus. Both scholars complained of noise that made it difficult to identify unique or meaningful mentions. In Millward's case the noise came in the form of an advertisement in the Guardian for a talking watch endorsed by the RNIB. The ad appeared on hundreds of pages, though for Millward's purposes it really represented only a single match. Deswarte too had trouble with a rotating banner on a newspaper website that dramatically inflated the apparent number of meaningful links to an article about Europhobia.
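One simple way to reduce this kind of noise, offered here only as a hypothetical sketch and not as anything the speakers described, is to group hits by the text surrounding each match, so that a snippet repeated across many pages (an advert, a rotating banner) counts as one distinct match. The URLs and snippets below are invented for illustration.

```python
from collections import defaultdict


def collapse_repeated_snippets(hits):
    """Group (url, snippet) hits by normalised snippet text.

    Returns {normalised_snippet: [urls]}, so a snippet that recurs on many
    pages is counted once rather than once per page.
    """
    groups = defaultdict(list)
    for url, snippet in hits:
        key = " ".join(snippet.lower().split())  # normalise case and whitespace
        groups[key].append(url)
    return dict(groups)


# Hypothetical search hits: (page URL, text around the match).
hits = [
    ("site-a/page1", "RNIB-endorsed talking watch: order now"),
    ("site-a/page2", "RNIB-endorsed  talking watch: order now"),
    ("site-b/page9", "RNIB-endorsed talking watch: ORDER NOW"),
    ("site-c/essay", "The RNIB campaigned for accessible websites"),
]
groups = collapse_repeated_snippets(hits)
print(len(hits), "page hits ->", len(groups), "distinct matches")
# 4 page hits -> 2 distinct matches
```

Exact matching after normalising case and whitespace only catches verbatim repeats; adverts that vary slightly from page to page would need fuzzier techniques such as near-duplicate detection.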

Both also noted the sheer number of hits they were getting, and Millward in particular emphasised his attempts to reduce the list to a size where he could conduct a close reading. He had not managed to do so, and was still left with a collection of 39,000 hits. However, both he and Deswarte reflected on that failure, invoking the social scientists' language of representative sampling, which they felt would be appropriate if they were given the opportunity to tackle this challenge again. That reflection is significant, because it shows that both Millward and Deswarte recognised the limits of the historian's skillset for a project such as this.
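The kind of representative sampling Millward and Deswarte had in mind might, in its simplest form, look like the sketch below: a simple random sample drawn from the full list of hits, with a fixed seed so that the sample can be reproduced. The hit identifiers and the sample size of 400 are hypothetical choices for illustration.

```python
import random


def simple_random_sample(hits, n, seed=None):
    """Draw n hits uniformly at random, without replacement."""
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    return rng.sample(hits, n)


# Hypothetical stand-ins for a list of 39,000 search hits.
hits = [f"hit-{i:05d}" for i in range(39_000)]
sample = simple_random_sample(hits, 400, seed=2014)
print(len(sample), len(set(sample)))  # 400 400
```

By the standard formula, a simple random sample of 400 estimates a proportion (say, the share of hits that are genuinely about the RNIB) to within roughly ±5 percentage points at 95% confidence, while remaining small enough for close reading. Stratifying by domain or by year would be a natural refinement.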

However, I think we can push those limits further. The very notion that we would do a close reading of the Internet is one that I think only historians would suggest. It shows how deeply the profession values close reading, even where it proves entirely inappropriate. We need to move on from that belief: that you can only know something if you've read it carefully. If we hold on to this mentality, we're going to lose our chance to discover anything at scale. We'll be unable to pursue the longue durée that Guldi advocated in our previous seminar.

Sitting in the audience, I couldn't help but think that the solution wasn't in sampling and close reading. It was in corpus linguistics, data manipulation, clustering algorithms and distant reading. These skills are rarely taught in our history programmes, but this experiment made clear that they need to become part of our disciplinary toolkit. And if not our toolkit, then we need to ingrain the value of collaboration. If you can't do it, find someone who can and who wants to work with you.

The days of the lone scholar intent on close reading are numbered. The UK Web Archive has shown us that. So what are we going to do about it?

Adam Crymble is a convenor of the Digital History seminar at the IHR and a lecturer in digital history at the University of Hertfordshire. The UK Web Archive is available to search now. In addition, there are a variety of related research projects, such as the Big UK Domain Data for the Arts and Humanities (BUDDAH) project. Analysis of the sustainability of the dataset can be found on the website for the Analytical Access to the Domain Dark Archive (AADDA) project, and an examination of the potential value of the UK Web Domain dataset can be found on the Big Data: Demonstrating the Value of the UK Web Domain Dataset for Social Science Research website.

Posted in Postscript