[Humanist] 27.908 on early humanities computing

Humanist Discussion Group willard.mccarty at mccarty.org.uk
Fri Mar 21 07:51:02 CET 2014

                 Humanist Discussion Group, Vol. 27, No. 908.
            Department of Digital Humanities, King's College London
                Submit to: humanist at lists.digitalhumanities.org

        Date: Thu, 20 Mar 2014 21:30:22 +0000 (GMT)
        From: joeraben1 at cox.net
        Subject: "Data" on early humanities computing

The following is selected from the introductory chapter of a book I am
completing on my own efforts in humanities computing:

In September 1964 IBM organized at the same laboratory what it called a
Literary Data Processing conference, primarily, I believe now, to publicize
the project of Fr. Roberto Busa to generate a huge verbal index to the
writings of  Saint Thomas Aquinas and writers associated with him. IBM had
underwritten this  project and Fr. Busa, an Italian Jesuit professor of
linguistics, had been able to  recruit a staff of junior clergy to operate
his key punches. The paper he read at this conference  was devoted to the
problems of managing the huge database he had created. IBM had  persuaded
The New York Times to send a reporter to the conference, and in the story he
filed he chose to describe in some detail my paper on the Milton-Shelley
project. The  report of the eccentric professor who was trying to use a
computer to analyze poetry  caught the fancy of the news services, and the
story popped up in The [London] Times and a  few other major newspapers
around the world.

What impressed me most at that conference, however, was the number  of
American academics who had been invited to speak about their use of the
computer,  often to generate concordances. Such reference works had, of
course, long  antedated the computer, having originated in the Renaissance,
when the first efforts  to reconcile the disparities among the four Gospels
produced these alphabetized lists of  keywords and their immediate contexts,
from which scholars hoped to  extract the "core" of biblical truth. The
utility of such reference works  for non-biblical literature soon became
obvious, and for centuries,  dedicated students of literature, often
isolated in outposts of Empire,  whiled away their hours of enforced leisure
by copying headwords, lines  and citations onto slips which then had to be
manually alphabetized for  the printer. Such concordances already existed
for a small number of  major poets, like Milton, Shelley and Shakespeare.

Apparently unrecognized by the earlier compilers of concordances was the 
concept that by restructuring the texts they were concording into a new 
order – here, alphabetical, but potentially into many others – they were
creating a perspective radically different from the linear sequence
in which the texts had originally been arranged. A major benefit to the
scholar of this new structure is the ability to examine all the  occurrences
of individual words out of their larger contexts but in  association with
other words almost immediately adjacent. Nascent in  this effort was the
root of what we now conceive as a text database.
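
To make the restructuring concrete, here is a minimal sketch in Python of how a text can be turned into an alphabetized list of headwords, each with its position and a few adjacent words. It is an illustration only, not a reconstruction of any program used at the time; the function name build_concordance, the width parameter and the sample line (the opening of Paradise Lost) are assumptions chosen for the example.

```python
import re
from collections import defaultdict

def build_concordance(text, width=2):
    """Map each lowercased word to a list of (position, context) pairs.

    `width` is the number of neighbouring words kept on each side of
    the headword (an illustrative choice, not a historical setting).
    """
    words = re.findall(r"[A-Za-z']+", text)
    index = defaultdict(list)
    for i, word in enumerate(words):
        left = words[max(0, i - width):i]
        right = words[i + 1:i + 1 + width]
        # The headword is capitalized so it stands out in its context.
        context = " ".join(left + [word.upper()] + right)
        index[word.lower()].append((i, context))
    # Alphabetizing the headwords replaces the linear order of the text.
    return dict(sorted(index.items()))

sample = "Of Man's first disobedience, and the fruit of that forbidden tree"
concordance = build_concordance(sample)
for headword, entries in concordance.items():
    for position, context in entries:
        print(f"{headword:<14}{position:>3}  {context}")
```

Both occurrences of "of" end up under one headword, each with its own neighbours, which is the non-linear view of the text described above.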

Some of this vision was becoming visible to the members of the avant  garde
represented at the Literary Data Processing conference, who had  generally
taken up a program called KWIC (keyword in context) that IBM  had "bundled"
with its early computers, a program designed to facilitate  control over
scientific information. Because it selected keywords from  article titles,
it was recognized as a crude but acceptable mechanism  for literary
concordances, to the extent that Stephen M. Parrish had  begun publishing a
series for Victorian poets, and others at the  conference reported on their
work on Chaucer, Old English and other  areas of literary interest. In
hindsight it is evident that the greater significance of these initiatives
was twofold: first, they made clear that even in their primitive state in
the 1960s, computers could perform functions beyond arithmetic; second,
they showed that another dimension of language study was available. From the beginning
signaled by this  small event would come a growing academic discipline
covering such  topics as corpus linguistics, machine translation, text
analysis and  literary databases.

Beyond the activity reported at that early conference, it became 
increasingly evident that computer-generated concordances could not only
serve immediate scholarly  needs but could also imply future applications of
expanding value. Texts  could be read non-linearly, in a variety of
dimensions, with the entire  vocabulary alphabetized, with the most common
words listed first, with  the least common words listed first, or with all
the words spelled  backwards (so their endings could be associated), and in
almost any  other manner that a scholar's imagination could conjure.
Concordances  could be constructed for non-poetic works, such as Melville's
Moby-Dick  or Freud's translated writings. Many poets of lesser rank than 
Shakespeare, Milton, and Chaucer could now be accorded the stature of  being
concorded, and even political statements could be made, as when  the
anti-Stalinist Russian Osip Mandelstam was exalted by having his poetry
concorded. David W. Packard even constructed a concordance to  Minoan Linear
A, the undeciphered writing system of prehistoric Crete.
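
The alternative orderings mentioned above can be sketched in a few lines of Python. This is a hedged illustration under simple assumptions (whitespace-separated words, lowercase comparison); the function name orderings and the sample word list are invented for the example and do not come from any of the projects described.

```python
from collections import Counter

def orderings(words):
    """Return several non-linear views of the same vocabulary."""
    counts = Counter(w.lower() for w in words)
    vocab = sorted(counts)
    return {
        "alphabetical": vocab,
        # Ties in frequency are broken alphabetically.
        "most_common_first": sorted(vocab, key=lambda w: (-counts[w], w)),
        "least_common_first": sorted(vocab, key=lambda w: (counts[w], w)),
        # Sorting on the reversed spelling groups words by their endings.
        "by_ending": sorted(vocab, key=lambda w: w[::-1]),
    }

words = "the whale and the sea and the ship sailing singing".split()
views = orderings(words)
for name, listing in views.items():
    print(f"{name}: {' '.join(listing)}")
```

In the by_ending view, "singing" and "sailing" fall next to each other because they share an ending, the association that a backwards-spelled listing was meant to expose.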

Looking beyond that group's accomplishment in creating the concordances  and
other tools they were reporting on, I had a vision of a newer scholarship, 
based on a melding of the approaches that had served humanities scholars for
generations  with the newer ones generated by the computer scientists who
were struggling at that  time to understand their new tool, to enlarge its
capacities. Sensing that the group  of humanists gathering for this
pioneering conference could benefit from  maintaining communication with
each other beyond this meeting, I devoted  some energy and persistence to
persuading IBM to finance what I  conceived first as a newsletter. Through
the agency of Edmund A. Bowles, a musicologist who had decided he could
support his family more  successfully as an IBM executive than as a college
instructor, I  received a grant of $5000 (as well as a renewal in the same
amount), a  huge award at that time for an assistant professor of English
and enough  to impress my dean, who allowed me a course reduction so I could
teach  myself to be an editor.

The first issue of Computers and the Humanities: A Newsletter (CHum) 
appeared in September 1966, and immediately began to outgrow its original 
conception. In an illustration of the paradox of success following an
unplanned initiative, people began to submit articles, and university
libraries began to  subscribe. Within a few years, what started as a
sixteen-page pamphlet  became the standard journal in its field, with a
circulation of about  2000 in all parts of the globe, equal to that of the
scholarly journals  of major universities. Among our contributors was J.M.
Coetzee, who had  worked as a computer programmer while building his
reputation as a  novelist and who later won the Nobel Prize in Literature.
Throughout the  more than two decades that it served the scholarly
community, CHum's  policy was to present as comprehensive as possible a
depiction of the  computer's role in expanding the resources of the humanist
scholar.  Articles covered a wide spectrum of disciplines: literary and
linguistic subjects, of course, but also archaeology, musicology,
history, art  history, and machine translation.

Because of that breadth, then, editing CHum for over twenty years was, far
from being a distraction from my scholarly interests, an enriching and
broadening experience. First, it kept me in touch with the growing avant
garde of humanities  computing. By inviting papers on all humanities-related
subjects in a wide range of  disciplines, I kept myself aware of the various
methodologies being developed in every part of the  world where humanists
were braving a hostile or indifferent environment to create  the foundations
of what has become today the expanding realm of digital humanities. Not 
only did I work closely with all my contributors, laboriously copyediting
their  submissions so that their work appeared as clearly as I knew how to
make it, and so that I knew  each article intimately, but for the first few
years, when humanities computing was  still a relatively unknown activity, I
published semiannual reports on the entire global activity under the rubric
of "Directory of Scholars Active." This feature of the journal, based on
questionnaires mailed semiannually to a broad range of computer-oriented 
academics, has been described as perhaps its most useful feature. I also
organized  a series of national surveys, beginning with Canada (by Paul
Bratley) and  culminating in a double issue covering activities in France
(by Colette  Charpentier). 

The journal may also have helped develop a sense of community in a  thinly
populated academic specialty. Its services seem to have functioned to
establish  humanities computing as a more respectable and legitimate avenue
of scholarly  advancement than it might otherwise have been considered. In
addition, an index of the  journal's contents over the first five years
allowed newcomers to the field to rapidly  acquaint themselves with prior
contributions to their special interests. Three books were  published under
the series title Data Bases in the Humanities and Social Sciences, helpful
not only in expanding the limits of this type of research but also in
forming a historical record of that period's accomplishments. And to keep this
constituency and the general  public further informed, for several years I
published a quarterly newsletter called  SCOPE that reported the latest
developments in humanities-related software. The  publishing operation
served as a foundation for various important extensions. The  "Directory of
Scholars Active" was cumulated into a book, printed in what was then 
innovative digital composition. 
