[Humanist] 28.801 pubs: historical American English; participatory edn of Ulysses

Humanist Discussion Group willard.mccarty at mccarty.org.uk
Tue Mar 10 09:59:55 CET 2015

                 Humanist Discussion Group, Vol. 28, No. 801.
            Department of Digital Humanities, King's College London
                Submit to: humanist at lists.digitalhumanities.org

  [1]   From:    Mark Davies <Mark_Davies at byu.edu>                         (15)
        Subject: COHA: Downloadable full-text data (385 million words,
                115,000 texts)

  [2]   From:    Amanda Visconti <amandavisconti at gmail.com>                (37)
        Subject: Invitation for the Open Beta of the Infinite Ulysses
                Participatory Digital Edition

        Date: Mon, 9 Mar 2015 14:10:27 +0000
        From: Mark Davies <Mark_Davies at byu.edu>
        Subject: COHA: Downloadable full-text data (385 million words, 115,000 texts)

This announcement is for those who are interested in historical corpora and who may want a large dataset to work with on their own machine. This is a real corpus, rather than just n-grams (as with the Google Books n-grams; see a comparison at http://googlebooks.byu.edu/compare-googleBooks.asp).


We are pleased to announce that the Corpus of Historical American English (COHA) <http://corpus.byu.edu/coha/> is now available in downloadable full-text format <http://corpus.byu.edu/full-text/> for use on your own computer. COHA joins COCA <http://corpus.byu.edu/coca/> and GloWbE <http://corpus.byu.edu/glowbe/>, which have been available in downloadable full-text format since March 2014.

The downloadable version of COHA contains 385 million words of text in more than 115,000 separate texts <http://corpus.byu.edu/full-text/coha_full_text.asp>, covering fiction, popular magazines, newspaper articles, and non-fiction books from the 1810s to the 2000s.

At 385 million words, the downloadable COHA corpus is much larger than any other structured historical corpus of English. With this much data, you can carry out many types of research <http://corpus.byu.edu/coha/files/davies_corpora_2011.pdf> that would not be possible <http://corpus.byu.edu/compare-smallCorpora.asp> with much smaller historical corpora of English (5-10 million words).

The corpus is available in several formats: sentence/paragraph, PoS-tagged and lemmatized (one word per line), and for input into a relational database. Samples <http://corpus.byu.edu/full-text/samples.asp> of each format (3.6 million words each) are available at the full-text website <http://corpus.byu.edu/full-text/>.

We hope that this new resource is of value to you in your research and teaching.

Mark Davies
Professor of Linguistics / Brigham Young University

** Corpus design and use // Linguistic databases **
** Historical linguistics // Language variation **
** English, Spanish, and Portuguese **

        Date: Mon, 9 Mar 2015 11:55:11 -0400
        From: Amanda Visconti <amandavisconti at gmail.com>
        Subject: Invitation for the Open Beta of the Infinite Ulysses Participatory Digital Edition

Dear colleagues,

Today I launched a social digital edition of James Joyce's Ulysses
<http://www.InfiniteUlysses.com> as part of my doctoral dissertation. I'd
like to invite you to explore the site and share any feedback you might
have about your experience or how you might want to use such a text in the

Infinite Ulysses <http://www.InfiniteUlysses.com> is
a "participatory" digital edition: it uses an authoritative text (the
Modernist Version Project's transcription of the 1922 Shakespeare and Co.
first printing), but allows readers of all backgrounds to highlight the
text and add annotations (interpretations, comments, and questions) with
the goal of creating a shared space of scholars and public enthusiasts
discussing the novel. A variety of filters let you customize the
annotations you see to your needs (e.g. don't show spoilers or translations
of Latin; do show definitions, instances of intertextuality or mentions of
Hamlet, and questions from other readers).

The edition is useful in the classroom, whether as a reading supplement, an
assignment ("add x annotations to the first episode of the novel"), or a
way to prep for class (e.g. remind yourself of the kinds of questions
first-time readers will have about the novel).

You may also be interested in the site as part of a digital humanities
dissertation with a unique format and methodology: design, code, user
testing, research blogging, and a final whitepaper discussing project
outcomes. I've blogged the dissertation over the course of the project at
LiteratureGeek.com <http://www.LiteratureGeek.com>.

Happy to hear any feedback or answer any questions via
infiniteulysses at gmail.com!

Amanda Visconti
infiniteulysses at gmail.com
@Literature_Geek <http://www.twitter.com/literature_geek> and
LiteratureGeek.com <http://literaturegeek.com/> (research blog)
Maryland Institute for Technology in the Humanities (MITH) Winnemore
Digital Dissertation Fellow
Ph.D. Candidate, University of Maryland English Department
M.S.I. (Digital Humanities HCI Specialization), University of Michigan
School of Information
