[Humanist] 22.731 statistics for humanists

Humanist Discussion Group willard.mccarty at mccarty.org.uk
Sun May 3 08:20:08 CEST 2009

                 Humanist Discussion Group, Vol. 22, No. 731.
         Centre for Computing in the Humanities, King's College London
                Submit to: humanist at lists.digitalhumanities.org

        Date: Sat, 2 May 2009 06:44:56 -0700
        From: Nathaniel Bobbitt <flautabaja at hotmail.com>
        Subject: RE: [Humanist] 22.728 controlled vocabularies? statistics for humanists?
        In-Reply-To: <20090502060456.62A643F6D at woodward.joyent.us>

The issue of statistics and text runs between several areas: structure (systemic), multi-dimensionality, combinatorial expressions, and patterns/variation.
Corpus linguistics looks at how language is used and at patterns of use. A corpus has millions of words and covers multiple registers: academic, conversational, dialect, newspaper, internet, radio, etc. Corpus linguists have their own statistical practices.
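
As one illustration of comparing use across registers, here is a minimal sketch that counts a word in each register and normalizes the count per million tokens, so that subcorpora of different sizes can be compared. The function names and the toy samples are my own assumptions for illustration, not taken from any of the tools or authors mentioned here.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def per_million(word, texts_by_register):
    """Return the frequency of `word` per million tokens in each register."""
    rates = {}
    for register, text in texts_by_register.items():
        tokens = tokenize(text)
        count = Counter(tokens)[word.lower()]
        # Normalize so registers of different sizes are comparable.
        rates[register] = 1_000_000 * count / len(tokens) if tokens else 0.0
    return rates

# Toy, made-up samples standing in for real register subcorpora.
samples = {
    "conversational": "well you know I think you know it was fine",
    "academic": "the results indicate that the effect was significant",
}
rates = per_million("know", samples)
for register, rate in rates.items():
    print(register, rate)
```

With a real corpus the samples would be replaced by the full text of each register; the tokenizer here is deliberately crude.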

Pioneers (linguists) in these pursuits include:
Douglas Biber, Patrick Hanks, James Pustejovsky, and Christian Matthiessen.
Currently, I am developing a new way to encode, decode, and recode the following through features based on packing, fill-in, and an optical system based on two states: 1) features 2) transition (presence, absence/evacuation, and effacement). This work thinks about the mobility of patterns in terms of the movement of checker pieces. Such an analogy between the movement of checkers and text (language-use) grows out of Halliday and Hasan's Cohesion in English.

For methodologies on statistics and text analysis, see:

Corpus Linguistics

Introductory Materials:

Biber, D.  1988.  Variation across speech and writing.  Cambridge: Cambridge University Press.

Biber, D., S. Conrad, and R. Reppen.  1998.  Corpus linguistics:  Investigating language structure and use.  Cambridge: Cambridge University Press.

Conrad, S., and D. Biber (eds.).  2001.  Variation in English: Multi-Dimensional studies.  London:  Longman.


See Multidimensional Analysis related papers at: http://jan.ucc.nau.edu/~biber/journal.htm

Biber, D.  2004.  Conversation text types:  A multi-dimensional analysis.  In Gérald Purnelle, Cédrick Fairon, and Anne Dister (eds.), Le poids des mots:  Proceedings of the 7th International Conference on the Statistical Analysis of Textual Data, 15-34.  Louvain:  Presses universitaires de Louvain.

Biber, D.  2003.  Variation among university spoken and written registers:  A new multi-dimensional analysis.  In Charles Meyer and Pepi Leistyna (eds.), Corpus analysis: Language structure and language use, 47-70.  Amsterdam: Rodopi.

Generative/Combinatorial Methods

Patrick Hanks: http://nlp.fi.muni.cz/projekty/cpa/

Hanks, Patrick, and James Pustejovsky. 2005. "A Pattern Dictionary for Natural Language Processing". Revue française de linguistique appliquée, 10:2.

Hanks, Patrick. 2008. "Mapping meaning onto use: a Pattern Dictionary of English Verbs". AACL 2008, Utah. (slides)

Pustejovsky, James. 1995. The Generative Lexicon. MIT Press.

Matthiessen, Christian M.I.M. 1995. Lexicogrammatical Cartography: English Systems. xviii + 978 pp. Tokyo, Taipei & Dallas: International Language Sciences Publishers.

One obvious application of corpus linguistics is World English in poetics: http://www.world-english.org/listening.htm




Simple software you can explore:


Note that it uses American/British English, conversational English, and radio transcripts. Type a word or a phrase and you will see samples from the corpus that show its common uses.
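
To give a sense of what such a lookup does, here is a minimal keyword-in-context (KWIC) sketch: it finds each occurrence of a word or phrase and shows a few words on either side. This is only my own illustration under simple assumptions (plain whitespace-like tokenization, a made-up sample text, a hypothetical `kwic` function); it is not how the software above is actually implemented.

```python
import re

def kwic(text, query, width=3):
    """Return (left, node, right) for each occurrence of `query`,
    with up to `width` words of context on each side."""
    tokens = re.findall(r"\w+", text.lower())
    target = query.lower().split()
    n = len(target)
    if n == 0:
        return []
    lines = []
    for i in range(len(tokens) - n + 1):
        # Compare a window of n tokens against the query phrase.
        if tokens[i:i + n] == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + n:i + n + width])
            lines.append((left, " ".join(target), right))
    return lines

# Made-up sample text standing in for a real corpus.
sample = "She made up her mind quickly. He made up a story about the dog."
hits = kwic(sample, "made up")
for left, node, right in hits:
    print(f"{left:>20} [{node}] {right}")
```

Lining up the hits this way is what lets you see at a glance which words tend to follow or precede the phrase.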

Surf and explore an actual corpus:


There is a link to a five-minute tour; look for the following at the bottom of the text in the right-hand frame: "Please feel free to take a five minute guided tour, which will show the major features of the corpus.  A simple click for each query will automatically fill in the form for you, search through the 385 million words of text, and then display the results."

The following will show you how to use corpus linguistics to develop teaching materials. Here is a general (non-technical) introductory book: http://www.amazon.com/Corpus-Classroom-Language-Teaching-Linguistics/dp/0521616867/ref=pd_sim_b_4

Nat Bobbitt, Portland, OR
