[Humanist] 22.728 controlled vocabularies? statistics for humanists?

Humanist Discussion Group willard.mccarty at mccarty.org.uk
Sat May 2 08:04:56 CEST 2009

                 Humanist Discussion Group, Vol. 22, No. 728.
         Centre for Computing in the Humanities, King's College London
                Submit to: humanist at lists.digitalhumanities.org

  [1]   From:    "James R. Kelly" <jrkelly at library.umass.edu>              (80)
        Subject: Statistics for Humanists

  [2]   From:    Vika Zafrin <vzafrin at bu.edu>                              (21)
        Subject: Controlled vocabularies for the humanities?

        Date: Fri, 01 May 2009 07:08:33 -0400
        From: "James R. Kelly" <jrkelly at library.umass.edu>
        Subject: Statistics for Humanists

I received the following message from a colleague at the University of
Massachusetts Amherst. I find it fascinating, but beyond some vague ideas
about how to approach it, I'm rather at a loss. My first thought (hence this
message) was to present it to this diverse and well-informed collective to
see what responses you might have for Bruce.

Thanks in advance,

Jim Kelly

----- Forwarded message -----


I am writing to you at the suggestion of Jim Craig, to whom I spoke recently
about several problems currently facing our Institute (the Warring States
Project) as it prepares to launch its journal.

In one sentence: Is there a compact and competent introduction, for
humanistic scholars, to the elementary statistical procedures that are
sometimes useful in the text-based sciences? And no others?

Background: Our field is the classical Chinese texts, in which my colleague
and I have made some fundamental discoveries. One of them, publicized in
1990, had as a corollary a prediction about the nature of a certain text,
should an early 03rd century copy of it ever turn up: it should lack a
certain number of its final chapters. Three sets of extracts from that text
were archaeologically recovered from an early 03c tomb in 1993, and
published in 1998. The result exactly confirmed our prediction: the 33
extracts were drawn from chapters 2-66 of the received text, but wholly
ignored chapters 67-81. The chance of a selection from the 81-chapter text
having just this configuration (a draw without replacement problem) works
out, as I figure it, to less than 1 in 7,000.
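
[A minimal sketch of how a figure of this size can be reproduced. It rests on
assumptions that are not stated in the message: that the 33 extracts come from
33 distinct chapters, that every set of 33 chapters out of 81 is equally
likely, and that the "allowed" block is the first 66 chapters. The function
name prob_all_within is hypothetical. This is one way to set up the
draw-without-replacement calculation, not necessarily the one the author
used.]

```python
from fractions import Fraction
from math import comb  # requires Python 3.8+


def prob_all_within(total: int, allowed: int, drawn: int) -> Fraction:
    """Chance that `drawn` distinct items, picked without replacement from
    `total` items, all fall inside a fixed block of `allowed` items.

    This is the hypergeometric probability of zero draws from the
    excluded block: C(allowed, drawn) / C(total, drawn).
    """
    return Fraction(comb(allowed, drawn), comb(total, drawn))


# Assumed reading of the case: 81 chapters, 33 distinct chapters excerpted,
# none of them from the final 15 (chapters 67-81).
p = prob_all_within(total=81, allowed=66, drawn=33)
odds = 1 / p

print(f"probability = {float(p):.3e}")
print(f"about 1 in {round(odds):,}")
```

[Under these assumptions the probability comes out somewhat below 1 in 7,000,
which agrees with the figure quoted above. If several extracts came from the
same chapter, or if chapters were not equally likely to be excerpted, the
number would differ.]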

This, I suppose, would normally be thought definitive, and would thus
produce wide acceptance of our theory of the text. It has not worked out
that way. Humanists have proved to be very capable of ignoring numerical
evidence, and this is what most of them have done in the present case.

We will be publishing our theory of that text, with a note on that empirical
confirmation, in v1 of our now imminent journal. As a desperation move, to
increase understanding of the result, or at least indicate the possibility
of such understanding, we are presently planning to include at the back of
that volume a tiny (4p) primer on the way to calculate and interpret
draw-without-replacement situations. In subsequent volumes, other basic
numerically solvable situations in Chinese (and Greek) texts will be pointed
out, and we are prepared to add similar primer-like sections at the back of
*those* volumes, in order to show that a technique for these things does
exist, and is not all that hard to follow, and that it constitutes a useful
addition to the toolkit of the working Sinologist.

If those sections are in fact necessary, as it currently seems to us they
are, then there is a general gap in this area, and it might make sense to
cumulate our little primer pages at some point into a pamphlet-sized guide
to humanistically solvable text problems. I don't want to do this; I have
other things to attend to. I would rather, from the outset, simply refer
readers to an already existing handbook of this sort.

Hence the question: Does anything of the kind now exist, in recommendable
form?

I haven't come across anything of the sort myself. The few attempts of which
I know, to introduce historians or other text-based humanists to statistical
procedures, have been quietly unsuccessful. I have asked others at UMass and
elsewhere, but so far nothing useful has turned up. Jim pointed me to the
proper shelves in the Science and Engineering Library, but again nothing
there or in the 5C catalog under keywords "humanities" and "statistics"
seemed suitable, and the shelflist at D16.17 also yielded nothing that would
well serve the present purpose. If you can recommend something, we will be
very grateful.

Thanks and best wishes,


E Bruce Brooks
Research Professor of Chinese
Warring States Project
University of Massachusetts at Amherst

----- End forwarded message -----

James R. Kelly
Humanities Bibliographer
W.E.B. Du Bois Library
University of Massachusetts
154 Hicks Way
Amherst, MA 01003-9275

(413) 545-3981; (413) 577-2565 (fax)
E-mail: jrkelly at library.umass.edu

American Co-Editor, Annual Bibliography of English Language and Literature;
Section Head & Senior Bibliographer, MLA International Bibliography; Adjunct
faculty: UMass German & Scandinavian Studies, Simmons College Graduate School of
Library and Information Science and URI Graduate School of Library and
Information Studies; Research Librarian, Mass. Ctr. for Renaissance Studies;
Slavic Cataloger, Amherst College; Humanities Co-Editor, Guide to Reference

        Date: Fri, 1 May 2009 11:16:02 -0400
        From: Vika Zafrin <vzafrin at bu.edu>
        Subject: Controlled vocabularies for the humanities?


I'm trying to ascertain the existence of specific controlled vocabularies in
the humanities and social sciences (and/or fields therein).  I'm
particularly interested in those used with institutional repositories, if
they exist.  If you know of controlled vocabularies that are widely accepted
and used in the relevant fields outside of IRs (in journals, for example),
I'd appreciate pointers to those too.

I suspect that if such vocabularies exist, they're based on something
pre-existing -- the Library of Congress subject headings, perhaps.  Useful
as LoC is, what I'm looking for are current computational uses of such
indices/taxonomies: the equivalent, for example, of the vocabularies used
in medicine to parse and auto-keyword published articles.

Many thanks,

Vika Zafrin
Digital Collections and Computing Support Librarian
Boston University School of Theology
745 Commonwealth Avenue
Boston, MA 02215
