[Humanist] 29.180 new on WWW: Hansard corpus 1803-2005

Humanist Discussion Group willard.mccarty at mccarty.org.uk
Tue Jul 21 01:14:25 CEST 2015

                 Humanist Discussion Group, Vol. 29, No. 180.
            Department of Digital Humanities, King's College London
                Submit to: humanist at lists.digitalhumanities.org

        Date: Mon, 20 Jul 2015 18:14:51 +0000
        From: Mark Davies <Mark_Davies at byu.edu>
        Subject: 1.6 billion word Hansard Corpus (British Parliament), 1803-2005
        In-Reply-To: <20150719203416.218DC65D3 at digitalhumanities.org>

We are pleased to announce the release of the 1.6 billion word Hansard Corpus (http://www.hansard-corpus.org). The corpus is part of the SAMUELS project (http://www.glasgow.ac.uk/samuels/) and has been funded by the AHRC (UK).

The Hansard Corpus contains 1.6 billion words from 7.6 million speeches in the British Parliament from 1803 to 2005. The corpus is semantically tagged, which allows for powerful meaning-based searches. In addition, users can create "virtual corpora" by speaker, time period, House of Parliament, and party in power, and then compare these corpora with one another.

As with all of the other BYU corpora (http://corpus.byu.edu), the corpus allows queries by part of speech, lemma, synonym, and customized word lists, as well as by section of the corpus (e.g. which words or phrases appear much more often in one time period than in another). In terms of visualization, it allows users to view frequency listings (matching words and phrases), chart displays (overall frequency by time period), collocates (including comparisons between the collocates of contrasting node words), and re-sortable concordance lines.

The end result is a corpus that will be of value not only to linguists (as the largest structured corpus of historical British English from the 1800s and 1900s), but hopefully also to historians, political scientists, and others.

Mark Davies
Professor of Linguistics / Brigham Young University

** Corpus design and use // Linguistic databases **
** Historical linguistics // Language variation **
** English, Spanish, and Portuguese **
