[Humanist] 29.97 n-grams: a swarm of uses & discussions

Humanist Discussion Group willard.mccarty at mccarty.org.uk
Mon Jun 15 22:32:56 CEST 2015


                  Humanist Discussion Group, Vol. 29, No. 97.
            Department of Digital Humanities, King's College London
                       www.digitalhumanities.org/humanist
                Submit to: humanist at lists.digitalhumanities.org

  [1]   From:    Andrew Prescott <Andrew.Prescott at glasgow.ac.uk>           (36)
        Subject: Re:  29.94 uses of n-grams?

  [2]   From:    Martin Mueller <martinmueller at northwestern.edu>           (57)
        Subject: Re:  29.94 uses of n-grams?

  [3]   From:    maurizio lana <maurizio.lana at gmail.com>                   (36)
        Subject: Re:  29.94 uses of n-grams?

  [4]   From:    David Williams <david.williams at uwaterloo.ca>              (42)
        Subject: Re:  29.94 uses of n-grams?

  [5]   From:    "Center for Comparative Studies"                          (25)
                <centrostudicomparati at libero.it>
        Subject: Re:  29.94 uses of n-grams?

  [6]   From:    "Liddle, Dallas" <liddle at augsburg.edu>                    (48)
        Subject: Re:  29.94 uses of n-grams?


--[1]------------------------------------------------------------------------
        Date: Fri, 12 Jun 2015 12:00:53 +0000
        From: Andrew Prescott <Andrew.Prescott at glasgow.ac.uk>
        Subject: Re:  29.94 uses of n-grams?
        In-Reply-To: <20150612110523.6F7FF9B9 at digitalhumanities.org>


Andreas Jucker, Irma Taavitsainen and Gerold Schneider, “Semantic corpus trawling: Expressions of ‘courtesy’ and ‘politeness’ in the Helsinki Corpus”, Varieng: Studies in Variation, Contacts and Change in English 11 (2012), available at http://www.helsinki.fi/varieng/series/volumes/11/jucker_taavitsainen_schneider/ uses the Google N-Gram viewer to study chronological shifts in cultural constructions of politeness. However, the addendum to the article reveals some of the hazards of using the viewer. Some of the shifts in word use that it indicated turned out to be due to the decline of the long ‘s’ (easily misread as ‘f’) and were thus typographical artefacts rather than cultural changes. When the authors attempted to recalculate the results, they found that Google had changed its algorithm, so that the original results could not be repeated.

See also Frederick W. Gibbs and Daniel J. Cohen, “A Conversation with Data: Prospecting Victorian Words and Ideas”, Victorian Studies 54.1 (Autumn 2011): 69-77, 185.

And there’s also Daniel Rosenberg’s reflective study ‘Data before the Fact’ available at: http://pages.uoregon.edu/koopman/courses_readings/colt607/rosenberg_data-before-fact_proofs.pdf


Andrew Prescott FSA FRHistS
Professor of Digital Humanities
AHRC Theme Leader Fellow for Digital Transformations
University of Glasgow

andrew.prescott at glasgow.ac.uk
@ajprescott
07743895209





On 12 Jun 2015, at 12:05, Humanist Discussion Group <willard.mccarty at mccarty.org.uk> wrote:

                 Humanist Discussion Group, Vol. 29, No. 94.
           Department of Digital Humanities, King's College London
                      www.digitalhumanities.org/humanist
               Submit to: humanist at lists.digitalhumanities.org



       Date: Thu, 11 Jun 2015 17:06:30 -0700
       From: "James O'Sullivan" <josullivan.c at gmail.com>
       Subject: Examples of lit scholarship using n-grams


Dear all,

Can anyone point me to some good examples of literary scholarship that -
broadly speaking - avail of n-grams?

Sincerest thanks in advance,
James

--
*James O'Sullivan *
@jamescosullivan  http://twitter.com/jamescosullivan
Web: josullivan.org

New Binary Press: http://newbinarypress.com
http://newbinarypress.com/Bookstore.html





--[2]------------------------------------------------------------------------
        Date: Fri, 12 Jun 2015 14:17:21 +0000
        From: Martin Mueller <martinmueller at northwestern.edu>
        Subject: Re:  29.94 uses of n-grams?
        In-Reply-To: <20150612110523.6F7FF9B9 at digitalhumanities.org>


Broadly speaking, n-gram analysis has been a central feature of Homeric
scholarship at least since the days of Friedrich Wolf more than two
centuries ago. The scrupulous listing of repeated n-grams makes the
19th-century commentaries of Ameis-Hentze a still useful tool.  The
Chicago Homer has been a digital tool drawing attention to the distinctive
features of Homeric repetition. (http://homer.library.northwestern.edu)

I have played around with a large list of repeated n-grams extracted from
a corpus of ~500 plays from the mid-sixteenth to the
mid-seventeenth century. It's an interesting data set.  The most striking,
but hardly surprising, conclusion is that works by the same author on
average share twice as many n-grams as works by different authors. On the
other hand, n-grams hardly ever provide conclusive evidence that C is the
author of A and B, where A and B are plays of unknown or disputed
authorship. From a forensic perspective, n-grams provide intriguing but
frustrating evidence.
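
The idea of counting n-grams shared by two works can be sketched as
follows. This is only a minimal illustration and not the procedure used
for the play corpus: the tokenizer, the choice of n = 3, the function
names and the two sample lines are all assumptions made up for the
example.

```python
import re


def word_ngram_set(text, n):
    """Return the set of distinct word n-grams in text (lowercased)."""
    # Assumed tokenizer: runs of letters and apostrophes count as words.
    tokens = re.findall(r"[a-z']+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def shared_ngrams(text_a, text_b, n=3):
    """Return the word n-grams that occur in both texts."""
    return word_ngram_set(text_a, n) & word_ngram_set(text_b, n)


# Invented sample lines, not taken from any play in the corpus.
play_a = "I pray you, sir, be not angry with me"
play_b = "Good my lord, be not angry with your servant"

common = shared_ngrams(play_a, play_b, n=3)
for gram in sorted(common):
    print(" ".join(gram))
```

A raw count like this would still need to be normalized for the length
of each work before pairs of plays could be compared fairly.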

I have a single and abstract measure of repetition, by which the average
value for pairwise combinations of plays by the same author is 64.7 while
the comparable figure for plays by different authors is 28.7.  The average
value for 666 pairwise combinations of Shakespeare plays is 52.6 (he
repeats himself a lot less than James Shirley), but the values for
particular pairwise combinations range from 20.99 to
145.3.
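
For readers who want to see how such pairwise averages are assembled,
here is a small sketch. It rests on assumptions: 666 is the number of
unordered pairs among 37 items, which suggests (as an inference, not
something stated above) a set of 37 plays; the repetition measure itself
is not described, so toy_score below is a purely hypothetical stand-in.

```python
from itertools import combinations
from math import comb
from statistics import mean


def mean_pairwise(items, score):
    """Average score(a, b) over all unordered pairs of items."""
    return mean(score(a, b) for a, b in combinations(items, 2))


# 37 works yield 37 * 36 / 2 = 666 unordered pairs.
pair_count = comb(37, 2)
print(pair_count)


def toy_score(a, b):
    """Hypothetical stand-in measure: number of distinct shared words."""
    return len(set(a.split()) & set(b.split()))


# Invented miniature "plays", for illustration only.
toy_plays = ["to be or not", "not to be", "or else"]
print(mean_pairwise(toy_plays, toy_score))
```

The minimum and maximum over the same pairs can be obtained by swapping
mean for min or max, which is how a range such as the one quoted above
would be found.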

Karl Reinhardt argued many years ago that the Aphrodite Hymn was the work
of Homer. If you count shared n-grams, the Aphrodite Hymn is the only
Homeric Hymn that sits squarely within the range of shared n-grams (and
other quantitative data) for pairwise combinations of Homeric books. The
others are all outliers. But it doesn't add up to conclusive proof.





--[3]------------------------------------------------------------------------
        Date: Fri, 12 Jun 2015 17:01:34 +0200
        From: maurizio lana <maurizio.lana at gmail.com>
        Subject: Re:  29.94 uses of n-grams?
        In-Reply-To: <20150612110523.6F7FF9B9 at digitalhumanities.org>


On 12/06/15 13:05, Humanist Discussion Group wrote:
> Can anyone point me to some good examples of literary scholarship that -
> broadly speaking - avail of n-grams?

first of all the term n-grams can be taken literally, as referring to 
characters, or broadly, as referring to words.
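
the two senses can be made concrete with a short sketch. it is only an
illustration: the function names, the whitespace tokenization and the
sample phrase are assumptions chosen for the example.

```python
def char_ngrams(text, n):
    """Return overlapping character n-grams, spaces included."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(text, n):
    """Return overlapping word n-grams from whitespace-split tokens."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


sample = "to be or not"

# Literal sense: sequences of n characters.
print(char_ngrams(sample, 3)[:4])

# Broad sense: sequences of n words.
print(word_ngrams(sample, 2))
```

character n-grams need no tokenizer at all, which is one reason they are
convenient for texts with unstable spelling; word n-grams are closer to
what a reader would recognize as a repeated phrase.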

in the second sense, an interesting work of authorship attribution on 
newspaper articles possibly written by a. gramsci was done in recent 
years by me with a group of mathematical physicists (mirko degli 
esposti, b. benedetto, m. caglioti) on behalf of Fondazione Istituto 
Gramsci, in order to find new evidence for gramsci's texts to be 
published in the national edition of his writings. specific repeating 
sequences of words (n-grams) were investigated and tested, and then 
used, as a "working marker" of authorship.
see 

http://www.ledonline.it/informatica-umanistica/Allegati/IU-03-10-Lana.pdf, 
www.infotext.unisi.it/upload/gramsci.ppt, 
http://www.assiterm91.it/wp-content/uploads/2010/11/Convegno-2008.pdf 
(pages 165-183)

Dario Benedetto, Mirko Degli Esposti, Giulio Maspero, The Puzzle of 
Basil's Epistula 38: A Mathematical Approach to a Philological Problem 
In: Journal of Quantitative Linguistics, Vol. 20, Iss. 4, (2013)

A. Barron-Cedeno, C. Basile, M. Degli Esposti, P. Rosso, /Word Length 
n-Grams for Text Re-use Detection/ In: LECTURE NOTES IN COMPUTER SCIENCE 
(ISSN:0302-9743), (pp. 687-699) (2010)

C. Basile, D. Benedetto, E. Caglioti, G. Cristadoro, M. Degli Esposti, 
/A plagiarism detection procedure in three steps: selection, matches and 
'squares'./ In: Proceedings of the SEPLN'09 Workshop on Uncovering 
Plagiarism, Authorship and Social Software Misuse. sine nomine, SINE 
LOCO: (pp. 19-24). September 10, San Sebastian (Spain) (2010)

best
maurizio

-------
Maurizio Lana
Università del Piemonte Orientale, Dipartimento di Studi Umanistici
piazza Roma 36 - 13100 Vercelli
tel. +39 347 7370925



--[4]------------------------------------------------------------------------
        Date: Fri, 12 Jun 2015 11:32:26 -0400
        From: David Williams <david.williams at uwaterloo.ca>
        Subject: Re:  29.94 uses of n-grams?
        In-Reply-To: <20150612110523.6F7FF9B9 at digitalhumanities.org>

Dear James,

I'm not sure what level of publication you intend, but I've used n-grams 
in several blog posts discussing literary questions over the last few 
years. Most have to do with poetic diction, neologism, interpretation, 
and the history of literary criticism (though there are also posts on 
broader questions of language, and also on the uses and pitfalls of the 
google dataset, which are probably not what you are looking for). Tag 
archives are at:

http://poetry-contingency.uwaterloo.ca/tag/n-grams/
http://thelifeofwords.uwaterloo.ca/tag/n-grams/

Yrs
David Williams

-- 
David-Antoine Williams, DPhil MPhil
Assistant Professor
Department of English
University of Waterloo
Waterloo | ON | N2L 3G3
p: +1 519 884.8111 x28287
f: +1 519 884.5759
http://thelifeofwords.uwaterloo.ca




--[5]------------------------------------------------------------------------
        Date: Sun, 14 Jun 2015 17:39:38 +0200
        From: "Center for Comparative Studies" <centrostudicomparati at libero.it>
        Subject: Re:  29.94 uses of n-grams?
        In-Reply-To: <20150612110523.6F7FF9B9 at digitalhumanities.org>


You can find some information in: R. Clement and D. Sharp, Ngram and 
Bayesian Classification of Documents for Topic and Authorship, "LLC", 
2003, 18(4):423-447; P. Juola, Authorship Attribution, "Foundations and 
Trends in Information Retrieval", Vol. 1, No. 3 (2006) 233-334; and 
J. Grieve, Quantitative Authorship Attribution: An Evaluation of 
Techniques, "LLC" 22: 251-270.

If you can read Italian, the applications of such methods to some texts 
attributed to Antonio Gramsci, in research led by Maurizio Lana, are 
explained in:

C. Basile, D. Benedetto, E. Caglioti, M. Degli Esposti, An example of 
mathematical authorship attribution, "Journal Of Mathematical Physics", 
2008, 49, pp. 1 - 20;  C. Basile, D. Benedetto, E. Caglioti, M. Degli 
Esposti, L'attribuzione dei testi gramsciani: metodi e modelli matematici, 
"La Matematica nella Società e nella Cultura", 2010, 3, pp. 235 - 269; M. 
Lana, Come scriveva Gramsci? Metodi matematici per riconoscere scritti 
gramsciani anonimi, "Informatica Umanistica", 2010, 3, 31-56.
A recent application to Montale's "Diario postumo" has been made by Federico 
Condello in a book published some months ago in Italian: E' di 
Eugenio Montale il "Diario postumo"?, Bologna (Bononia University Press) 
2014.

Best
Francesco Stella


--[6]------------------------------------------------------------------------
        Date: Mon, 15 Jun 2015 14:45:17 -0500
        From: "Liddle, Dallas" <liddle at augsburg.edu>
        Subject: Re:  29.94 uses of n-grams?
        In-Reply-To: <20150612110523.6F7FF9B9 at digitalhumanities.org>


For the query a few days back about literary scholarship that uses n-grams,
Bettina Fischer-Starcke has an article, "Keywords and Frequent Phrases of
Jane Austen's Pride and Prejudice: A corpus-stylistic analysis," in
*International Journal of Corpus Linguistics* 14.4 (2009), 492-523, that
uses n-gram language specifically. Dr. Fischer-Starcke also has a book:
*Corpus Linguistics in Literary Analysis: Jane Austen and Her
Contemporaries*, 2010 from Bloomsbury Academic.

Best,
DL

****************
Dallas Liddle, Ph.D.
Associate Professor and Chair of English
Augsburg College
2211 Riverside Ave.
Minneapolis, MN 55454
Office: 612 330 1295
Fax: 612 330 1699
liddle at augsburg.edu







