[Humanist] 30.206 events: the toolbox; big data analysis

Humanist Discussion Group willard.mccarty at mccarty.org.uk
Tue Jul 26 07:42:21 CEST 2016

                 Humanist Discussion Group, Vol. 30, No. 206.
            Department of Digital Humanities, King's College London
                Submit to: humanist at lists.digitalhumanities.org

  [1]   From:    "Smithies, James" <james.smithies at kcl.ac.uk>              (27)
        Subject: Big data analysis for the humanities and social sciences

  [2]   From:    "Seaward, Louise" <louise.seaward at ucl.ac.uk>              (14)
        Subject: Conference Announcement: What should be in your Digital

        Date: Mon, 25 Jul 2016 08:18:51 +0000
        From: "Smithies, James" <james.smithies at kcl.ac.uk>
        Subject: Big data analysis for the humanities and social sciences

Dear Willard,

King’s Digital Lab are holding a workshop on August 26th that might be of interest to Humanist readers. A limited number of tickets are available at <https://www.eventbrite.co.uk/e/big-data-analysis-for-the-humanities-and-social-sciences-tickets-26708754604>:

‘Big Data Analysis for the Humanities and Social Sciences’, August 26th, King’s College London.

This event is hosted by King's Digital Lab.

The workshop will be led by Raaz Sainudiin. Raaz completed a PhD in Statistics at Cornell University in 2005 and was a Research Fellow of the Royal Commission for the Exhibition of 1851 at the Statistics Department of Oxford University until 2007. He is currently a Senior Lecturer in the School of Mathematics and Statistics at the University of Canterbury, Christchurch, NZ. His recent excursions into scalable data science are funded by the Databricks Academic Partners Program. His CV can be found here.

This workshop will introduce elements of Scalable Data Science for humanities and social science researchers using Apache Spark over a Databricks shard. It will guide attendees through hands-on analysis of US State of the Union addresses, Wikipedia click-streams, live Tweets, and the Old Bailey Online dataset.

What we'll do
The workshop will introduce the basics of the in-memory distributed computing framework Apache Spark in four parts: basic map-reduce operations via Spark's resilient distributed datasets (RDDs), applied to a word count of US State of the Union addresses (first 40 minutes); data exploration via no-sql queries using Spark's dataframes for Wiki click-streams (30-40 minutes); Spark Streaming for filtering live tweets and finding their top hashtags (30-40 minutes); and finally the loading, XML parsing and initial exploration of the Old Bailey Online dataset (40 minutes, including discussions). There will be a 20-minute break during the workshop.
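
For readers unfamiliar with the map-reduce pattern behind the word-count exercise, the following is a minimal sketch in plain Python. It is not the workshop's material and does not use Spark; the function names and the two sample sentences are invented for illustration. The three steps only mimic, on a single machine, what Spark's RDD operations flatMap, map and reduceByKey would do across a cluster.

```python
import re
from collections import defaultdict
from functools import reduce


def tokenize(line):
    """Split a line into lowercase word tokens (hypothetical helper)."""
    return re.findall(r"[a-z']+", line.lower())


def word_count(lines):
    """Count words with explicit map and reduce steps."""
    # Step 1 (like flatMap): turn each line into individual words.
    words = [word for line in lines for word in tokenize(line)]
    # Step 2 (like map): pair every word with the count 1.
    pairs = [(word, 1) for word in words]
    # Step 3 (like reduceByKey): group the pairs by word, then sum each group.
    grouped = defaultdict(list)
    for word, n in pairs:
        grouped[word].append(n)
    return {word: reduce(lambda a, b: a + b, ns) for word, ns in grouped.items()}


# Invented sample text, standing in for the real addresses.
sample = [
    "The state of the union is strong",
    "The union endures",
]
counts = word_count(sample)

# Most frequent words first; ties are broken alphabetically.
top = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:3]
print(top)
```

In Spark the same three steps would run in parallel over partitions of the text, which is what lets the approach scale to much larger collections.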

Who should attend?
Researchers in the humanities and social sciences who would like an introduction to big data analysis using industry-standard tools. The workshop will be technical and is best suited to people with a good grasp of programming. More advanced users will be able to extend themselves. Non-programmers interested in seeing 'under the hood' of data analysis, perhaps in order to collaborate more effectively with technical colleagues, are also welcome.

What you need to bring and do
Bring a laptop if you have one. Access to eduroam and The Cloud will be available. Ideally you will have signed up for a Databricks Community Edition account before the day so you can follow along.

Please get on the waiting list for Databricks Community Edition as soon as possible: https://databricks.com/try-databricks.

Friday, 26 August 2016 from 09:00 to 12:00 (BST) 

Virginia Woolf Building room 1.34 - 22 Kingsway, London, WC2B 6LE

Dr. James Smithies
Director | King’s Digital Lab
Virginia Woolf Building Room 2.50 | King's College London
DDI +44 (0) 207 848 7552 | MOB +44 7543 632076
james.smithies at kcl.ac.uk | jamessmithies.org | @jamessmithies

        Date: Mon, 25 Jul 2016 10:55:10 +0000
        From: "Seaward, Louise" <louise.seaward at ucl.ac.uk>
        Subject: Conference Announcement: What should be in your Digital Toolbox?

Conference Announcement: What should be in your Digital Toolbox?

The Linnean Society of London <https://www.linnean.org/>, in collaboration with the Transcribe Bentham <http://blogs.ucl.ac.uk/transcribe-bentham/> initiative at University College London (UCL), is hosting a one-day conference on 10 October 2016 to showcase how innovative technology is being applied to the humanities and natural sciences. The "Digital Toolbox" conference will demonstrate how researchers, curators and enthusiasts can use digital tools to explore historical and scientific material in new ways.

An example is the EU-funded READ <http://read.transkribus.eu/> project, which seeks to unlock complex handwritten material in archival collections, to automatically index digital images of text, and to teach computers how to transcribe handwritten text. Cutting-edge transcription technology developed as part of the READ project will be demonstrated and discussed.

The conference will be a platform to share ideas on the best means of exploiting complex research data and opening it up to a wider audience. We are delighted to welcome Melissa Terras, Professor of Digital Humanities at UCL, as keynote speaker.

More details on the full programme will be available soon.

There will be a small registration fee of £15 for the event.  This will cover tea/coffee, lunch and a wine reception.  Please find the registration form here: https://www.linnean.org/meetings-and-events/events/what-should-be-in-your-digital-toolbox

Dr. Louise Seaward
Research Associate
Bentham Project, Faculty of Laws, University College London, Bidborough House, 38-50 Bidborough Street, London, WC1H 9BT

Email: louise.seaward at ucl.ac.uk
Tel: 020 3108 8397
Web: Transcribe Bentham <http://blogs.ucl.ac.uk/transcribe-bentham/>; Recognition and Enrichment of Archival Documents (READ) <http://read.transkribus.eu/>
