[Humanist] 29.121 n-grams

Humanist Discussion Group willard.mccarty at mccarty.org.uk
Fri Jun 26 03:33:42 CEST 2015


                 Humanist Discussion Group, Vol. 29, No. 121.
            Department of Digital Humanities, King's College London
                       www.digitalhumanities.org/humanist
                Submit to: humanist at lists.digitalhumanities.org



        Date: Thu, 25 Jun 2015 00:02:55 +0100
        From: "James O'Sullivan" <josullivan.c at gmail.com>
        Subject: Re:  29.97 n-grams: a swarm of uses & discussions
        In-Reply-To: <20150615203256.B5C7BC7F at digitalhumanities.org>


Thanks all for your suggestions - these have been very useful.

Best regards,
James

On Mon, Jun 15, 2015 at 9:32 PM, Humanist Discussion Group <
willard.mccarty at mccarty.org.uk> wrote:

>                   Humanist Discussion Group, Vol. 29, No. 97.
>             Department of Digital Humanities, King's College London
>                        www.digitalhumanities.org/humanist
>                 Submit to: humanist at lists.digitalhumanities.org
>
>   [1]   From:    Andrew Prescott <Andrew.Prescott at glasgow.ac.uk>
>  (36)
>         Subject: Re:  29.94 uses of n-grams?
>
>   [2]   From:    Martin Mueller <martinmueller at northwestern.edu>
>  (57)
>         Subject: Re:  29.94 uses of n-grams?
>
>   [3]   From:    maurizio lana <maurizio.lana at gmail.com>
>  (36)
>         Subject: Re:  29.94 uses of n-grams?
>
>   [4]   From:    David Williams <david.williams at uwaterloo.ca>
>   (42)
>         Subject: Re:  29.94 uses of n-grams?
>
>   [5]   From:    "Center for Comparative Studies"
> (25)
>                 <centrostudicomparati at libero.it>
>         Subject: Re:  29.94 uses of n-grams?
>
>   [6]   From:    "Liddle, Dallas" <liddle at augsburg.edu>
>   (48)
>         Subject: Re:  29.94 uses of n-grams?
>
>
>
> --[1]------------------------------------------------------------------------
>         Date: Fri, 12 Jun 2015 12:00:53 +0000
>         From: Andrew Prescott <Andrew.Prescott at glasgow.ac.uk>
>         Subject: Re:  29.94 uses of n-grams?
>         In-Reply-To: <20150612110523.6F7FF9B9 at digitalhumanities.org>
>
>
> Andreas Jucker, Irma Taavitsainen and Gerold Schneider, "Semantic corpus
> trawling: Expressions of “courtesy” and “politeness” in the Helsinki
> Corpus", Varieng: Studies in Variation, Contacts and Change in English 11
> (2012), available at
> http://www.helsinki.fi/varieng/series/volumes/11/jucker_taavitsainen_schneider/
> makes use of Google N-gram in studying chronological shifts in cultural
> constructions of politeness. However, the addendum to the article reveals
> some of the hazards of using the Google N-Gram viewer. It was found that
> some of the shifts in word use indicated by Google N-Gram were due to the
> decline of the use of the long ‘s’ (ſ) and were thus typographical artefacts
> rather than cultural changes. When an attempt was made to recalculate the
> results, it was found that Google had changed its algorithm, so that the
> original results could not be repeated.
>
> A Conversation with Data: Prospecting Victorian Words and Ideas
> Gibbs, Frederick W; Cohen, Daniel J. Victorian Studies 54.1 (Autumn 2011):
> 69-77,185.
>
> And there’s also Daniel Rosenberg’s reflective study ‘Data before the
> Fact’ available at:
> http://pages.uoregon.edu/koopman/courses_readings/colt607/rosenberg_data-before-fact_proofs.pdf
>
>
> Andrew Prescott FSA FRHistS
> Professor of Digital Humanities
> AHRC Theme Leader Fellow for Digital Transformations
> University of Glasgow
>
> andrew.prescott at glasgow.ac.uk
> @ajprescott
> 07743895209

> --[2]------------------------------------------------------------------------
>         Date: Fri, 12 Jun 2015 14:17:21 +0000
>         From: Martin Mueller <martinmueller at northwestern.edu>
>         Subject: Re:  29.94 uses of n-grams?
>         In-Reply-To: <20150612110523.6F7FF9B9 at digitalhumanities.org>
>
>
> Broadly speaking, n-gram analysis has been a central feature of Homeric
> scholarship at least since the days of Friedrich Wolf more than two
> centuries ago. The scrupulous listing of repeated n-grams makes the
> 19th-century commentaries of Ameis-Hentze a still useful tool.  The
> Chicago Homer has been a digital tool drawing attention to the distinctive
> features of Homeric repetition. (http://homer.library.northwestern.edu)
>
> I have played around with a large list of repeated n-grams extracted from
> a corpus of ~500 plays from the mid-sixteenth to the
> mid-seventeenth century. It's an interesting data set.  The most striking,
> but hardly surprising, conclusion is that works by the same author on
> average share twice as many n-grams as works by different authors. On the
> other hand, n-grams hardly ever provide conclusive evidence that C is the
> author of A and B, where A and B are plays of unknown or disputed
> authorship. From a forensic perspective, n-grams provide intriguing but
> frustrating evidence.
>
> I have a single and abstract measure of repetition, by which the average
> value for pairwise combinations of plays by the same author is 64.7 while
> the comparable figure for plays by different authors is 28.7.  The average
> value for 666 pairwise combinations of Shakespeare plays is 52.6 (he
> repeats himself a lot less than James Shirley), but the values for
> particular pairwise combinations range from 20.99 to
> 145.3.
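
A minimal sketch of how shared n-grams between pairs of texts might be
counted. It is not the measure described above, which the message does not
specify: the function names, the choice of word trigrams, and the toy texts
are all assumptions made for illustration. The last line only checks that
666 pairwise combinations is what 37 texts would yield (37 * 36 / 2); the
figure of 37 is an inference, not something stated in the message.

```python
from itertools import combinations
from math import comb


def word_ngrams(text, n):
    """Return the set of distinct word n-grams in text (lowercased)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_ngrams(text_a, text_b, n=3):
    """Return the n-grams that occur in both texts."""
    return word_ngrams(text_a, n) & word_ngrams(text_b, n)


def pairwise_shared_counts(texts, n=3):
    """Map every unordered pair of text names to its shared n-gram count."""
    return {
        (a, b): len(shared_ngrams(texts[a], texts[b], n))
        for a, b in combinations(sorted(texts), 2)
    }


# Hypothetical toy "plays"; a real study would load full texts instead.
toy = {
    "A": "to be or not to be that is the question",
    "B": "the question is whether to be or not",
    "C": "now is the winter of our discontent",
}

counts = pairwise_shared_counts(toy, n=3)
for pair, count in counts.items():
    print(pair, count)

print(comb(37, 2))  # 666 unordered pairs from 37 texts
```

Raw counts like these would still need normalising for text length before
works of very different sizes could be compared fairly.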
>
>  Karl Reinhardt argued many years ago that the Aphrodite Hymn was the work
> of Homer. If you count shared n-grams, the Aphrodite Hymn is the only
> Homeric Hymn that sits squarely within the range of shared n-grams (and
> other quantitative data)  for pairwise combinations of Homeric books. The
> others are all outliers. But it doesn't add up to conclusive proof.
>
>
>
> --[3]------------------------------------------------------------------------
>         Date: Fri, 12 Jun 2015 17:01:34 +0200
>         From: maurizio lana <maurizio.lana at gmail.com>
>         Subject: Re:  29.94 uses of n-grams?
>         In-Reply-To: <20150612110523.6F7FF9B9 at digitalhumanities.org>
>
>
> Il 12/06/15 13:05, Humanist Discussion Group ha scritto:
> > Can anyone point me to some good examples of literary scholarship that -
> > broadly speaking - avail of n-grams?
>
> first of all the term n-grams can be taken literally, as referring to
> characters, or broadly, as referring to words.
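
The two senses distinguished above can be illustrated with a short sketch.
The function names and the sample phrase are assumptions chosen for
illustration only; splitting on whitespace is the simplest possible
tokenisation and would be too crude for real texts.

```python
def char_ngrams(text, n):
    """Return every run of n consecutive characters in text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(text, n):
    """Return every run of n consecutive whitespace-separated words."""
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


sample = "sing goddess the wrath"  # hypothetical sample phrase

# Character trigrams of a single word
print(char_ngrams("wrath", 3))

# Word bigrams of the whole phrase
print(word_ngrams(sample, 2))
```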
>
> in the second sense, an interesting work of authorship attribution on
> newspaper articles possibly written by a. gramsci was done in recent
> years by me with a group of mathematical physicists - mirko degli
> esposti, b. benedetto, m. caglioti - on behalf of Fondazione Istituto
> Gramsci, in order to find new evidence of gramsci's texts to be
> published in the national edition of his writings. specific repeating
> sequences of words (n-grams) were investigated and tested, and then
> used, as a "working marker" of authorship.
> see
>
> http://www.ledonline.it/informatica-umanistica/Allegati/IU-03-10-Lana.pdf,
> www.infotext.unisi.it/upload/gramsci.ppt,
> http://www.assiterm91.it/wp-content/uploads/2010/11/Convegno-2008.pdf
> (pages 165-183)
>
> Dario Benedetto, Mirko Degli Esposti, Giulio Maspero, The Puzzle of
> Basil's Epistula 38: A Mathematical Approach to a Philological Problem
> In: Journal of Quantitative Linguistics, Vol. 20, Iss. 4, (2013)
>
> A. Barron-Cedeno, C. Basile, M. Degli Esposti, P. Rosso, /Word Length
> n-Grams for Text Re-use Detection/ In: LECTURE NOTES IN COMPUTER SCIENCE
> (ISSN:0302-9743), (pp. 687- 699) (2010)
>
> C. Basile, D. Benedetto, E. Caglioti, G. Cristadoro, M. Degli Esposti,
> /A plagiarism detection procedure in three steps: selection, matches and
> 'squares'./ In: Proceedings of the SEPLN'09 Workshop on Uncovering
> Plagiarism, Authorship and Social Software Misuse. sine nomine, SINE
> LOCO: (pp. 19- 24). September 10, San Sebastian (Spain) (2010)
>
> best
> maurizio
>
> -------
> Maurizio Lana
> Università del Piemonte Orientale, Dipartimento di Studi Umanistici
> piazza Roma 36 - 13100 Vercelli
> tel. +39 347 7370925
>
>
>
>
> --[4]------------------------------------------------------------------------
>         Date: Fri, 12 Jun 2015 11:32:26 -0400
>         From: David Williams <david.williams at uwaterloo.ca>
>         Subject: Re:  29.94 uses of n-grams?
>         In-Reply-To: <20150612110523.6F7FF9B9 at digitalhumanities.org>
>
> Dear James,
>
> I'm not sure what level of publication you intend, but I've used n-grams
> in several blog posts discussing literary questions over the last few
> years. Most have to do with poetic diction, neologism, interpretation,
> and the history of literary criticism (though there are also posts on
> broader questions of language, and also on the uses and pitfalls of the
> google dataset, which are probably not what you are looking for). Tag
> archives are at:
>
> http://poetry-contingency.uwaterloo.ca/tag/n-grams/
> http://thelifeofwords.uwaterloo.ca/tag/n-grams/
>
> Yrs
> David Williams
>
> --
> David-Antoine Williams, DPhil MPhil
> Assistant Professor
> Department of English
> University of Waterloo
> Waterloo | ON | N2L 3G3
> p: +1 519 884.8111 x28287
> f: +1 519 884.5759
> http://thelifeofwords.uwaterloo.ca
>
>
>
> --[5]------------------------------------------------------------------------
>         Date: Sun, 14 Jun 2015 17:39:38 +0200
>         From: "Center for Comparative Studies" <
> centrostudicomparati at libero.it>
>         Subject: Re:  29.94 uses of n-grams?
>         In-Reply-To: <20150612110523.6F7FF9B9 at digitalhumanities.org>
>
>
> You can find some information in: R. Clement and D. Sharp, Ngram and
> Bayesian Classification of Documents for
> Topic and Authorship, "LLC", 2003, 18(4):423-447;  P. Juola, Authorship
> Attribution, "Foundations and Trends in Information Retrieval", Vol. 1, No.
> 3 (2006) 233-334 and J. Grieve, Quantitative Authorship Attribution: An
> Evaluation of Techniques, LLC 22: 251-270.
>
> If you can read Italian, the applications of such methods to some texts
> attributed to Antonio Gramsci in a research project led by Maurizio Lana are
> explained in:
>
> C. Basile, D. Benedetto, E. Caglioti, M. Degli Esposti, An example of
> mathematical authorship attribution, "Journal Of Mathematical Physics",
> 2008, 49, pp. 1 - 20;  C. Basile, D. Benedetto, E. Caglioti, M. Degli
> Esposti, L'attribuzione dei testi gramsciani: metodi e modelli matematici,
> "La Matematica nella Società e nella Cultura", 2010, 3, pp. 235 - 269; M.
> Lana, Come scriveva Gramsci? Metodi matematici per riconoscere scritti
> gramsciani anonimi, "Informatica Umanistica", 2010, 3, 31-56.
> A recent application to Montale's "Diario postumo" has been made by Federico
> Condello in a book published some months ago in Italian: E' di
> Eugenio Montale il "Diario postumo"?, Bologna (Bononia University Press)
> 2014.
>
> Best
> Francesco Stella
>
>
>
> --[6]------------------------------------------------------------------------
>         Date: Mon, 15 Jun 2015 14:45:17 -0500
>         From: "Liddle, Dallas" <liddle at augsburg.edu>
>         Subject: Re:  29.94 uses of n-grams?
>         In-Reply-To: <20150612110523.6F7FF9B9 at digitalhumanities.org>
>
>
> For the query a few days back about literary scholarship that uses n-grams,
> Bettina Fischer-Starcke has an article, "Keywords and Frequent Phrases of
> Jane Austen's Pride and Prejudice: A corpus-stylistic analysis," in
> *International Journal of Corpus Linguistics* 14.4 (2009), 492-523, that
> uses n-gram language specifically. Dr. Fischer-Starcke also has a book:
> *Corpus Linguistics in Literary Analysis: Jane Austen and Her
> Contemporaries*, 2010 from Bloomsbury Academic.
>
> Best,
> DL
>
> ****************
> Dallas Liddle, Ph.D.
> Associate Professor and Chair of English
> Augsburg College
> 2211 Riverside Ave.
> Minneapolis, MN 55454
> Office: 612 330 1295
> Fax: 612 330 1699
> liddle at augsburg.edu
>

-- 
*James O'Sullivan*
@jamescosullivan  http://twitter.com/jamescosullivan
Web: josullivan.org

New Binary Press: http://newbinarypress.com
 http://newbinarypress.com/Bookstore.html




