[Humanist] 22.731 statistics for humanists
Humanist Discussion Group
willard.mccarty at mccarty.org.uk
Sun May 3 08:20:08 CEST 2009
Humanist Discussion Group, Vol. 22, No. 731.
Centre for Computing in the Humanities, King's College London
Submit to: humanist at lists.digitalhumanities.org
Date: Sat, 2 May 2009 06:44:56 -0700
From: Nathaniel Bobbitt <flautabaja at hotmail.com>
Subject: RE: [Humanist] 22.728 controlled vocabularies? statistics for humanists?
In-Reply-To: <20090502060456.62A643F6D at woodward.joyent.us>
The question of statistics and text spans several areas: structure (systemic), multi-dimensionality, combinatorial expressions, and patterns/variation.
Corpus linguistics studies how language is used and the patterns of that use. A corpus typically contains millions of words drawn from multiple registers: academic, conversational, dialect, newspaper, internet, radio, etc. Corpus linguists have their own statistical practices.
Pioneers (linguists) in these pursuits include:
Douglas Biber, Patrick Hanks, James Pustejovsky, and Christian Matthiessen
Currently, I am developing a new way to encode, decode, and recode text through features based on packing, fill-in, and an optical system with two states: 1) features and 2) transition (presence, absence/evacuation, and effacement). This work models the mobility of patterns on the movement of checker pieces. The analogy between the movement of checkers and text (language use) grows out of Halliday and Hasan's Cohesion in English.
For methodologies on statistics and text analysis, see:
Introductory Materials:
Biber, D. 1988. Variation across speech and writing. Cambridge: Cambridge University Press.
Biber, D., S. Conrad, and R. Reppen. 1998. Corpus linguistics: Investigating language structure and use. Cambridge: Cambridge University Press.
Conrad, S., and D. Biber (eds.). 2001. Variation in English: Multi-Dimensional studies. London: Longman.
See papers related to Multi-Dimensional Analysis at: http://jan.ucc.nau.edu/~biber/journal.htm
Biber, D. 2004. Conversation text types: A multi-dimensional analysis. In Gérald Purnelle, Cédrick Fairon, and Anne Dister (eds.), Le poids des mots: Proceedings of the 7th International Conference on the Statistical Analysis of Textual Data, 15-34. Louvain: Presses universitaires de Louvain.
Biber, D. 2003. Variation among university spoken and written registers: A new multi-dimensional analysis. In Charles Meyer and Pepi Leistyna (eds.), Corpus analysis: Language structure and language use, 47-70. Amsterdam: Rodopi.
Generative/Combinatorial Methods:
Hanks, Patrick, and James Pustejovsky. 2005. "A Pattern Dictionary for Natural Language Processing". Revue française de linguistique appliquée, 10:2.
Hanks, Patrick. 2008. "Mapping meaning onto use: a Pattern Dictionary of English Verbs". AACL 2008, Utah. (slides)
Pustejovsky, James. 1995. The Generative Lexicon. MIT Press.
Matthiessen, Christian M.I.M. 1995. Lexicogrammatical Cartography: English Systems. xviii + 978 pp. Tokyo, Taipei & Dallas: International Language Sciences Publishers.
One obvious application of corpus linguistics is World English in poetics: http://www.world-english.org/listening.htm
Simple software you can explore:
Note that it uses American/British English, conversational English, and radio transcripts. Type a word or a phrase and you will see samples from the corpus that show common uses.
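The kind of lookup described above (type a word, see samples of its use) is often called a keyword-in-context (KWIC) concordance. Here is a minimal Python sketch of the idea, not the actual implementation of any particular tool. The function name `kwic`, the naive tokenizer, the window size, and the toy sample text are all assumptions made for illustration.

```python
import re

def kwic(text, keyword, window=3):
    """Return keyword-in-context lines: `window` words on each side of each hit."""
    # Naive tokenizer (an assumption): words are runs of letters and apostrophes.
    tokens = re.findall(r"[A-Za-z']+", text)
    target = keyword.lower()
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == target:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left} [{token}] {right}".strip())
    return lines

# Toy sample standing in for a corpus (hypothetical data, not from any real corpus).
sample = (
    "She made a strong case for reform. "
    "The case was closed last week. "
    "In any case, we should leave early."
)

for line in kwic(sample, "case"):
    print(line)
```

Note that this sketch ignores sentence boundaries, so the context window can run across two sentences. A real corpus interface would also index millions of words rather than scan a string on every query.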
Surf and explore an actual corpus:
There is a link to a five-minute tour. Look for the following at the bottom of the text in the right-hand frame: "Please feel free to take a five minute guided tour, which will show the major features of the corpus. A simple click for each query will automatically fill in the form for you, search through the 385 million words of text, and then display the results."
The following general (non-technical) introductory book shows how to use corpus linguistics to develop teaching materials: http://www.amazon.com/Corpus-Classroom-Language-Teaching-Linguistics/dp/0521616867/ref=pd_sim_b_4
Nat Bobbitt
Portland, OR