[Humanist] 25.935 technical interests

Humanist Discussion Group willard.mccarty at mccarty.org.uk
Fri May 4 06:56:30 CEST 2012


                 Humanist Discussion Group, Vol. 25, No. 935.
            Department of Digital Humanities, King's College London
                       www.digitalhumanities.org/humanist
                Submit to: humanist at lists.digitalhumanities.org



        Date: Thu, 3 May 2012 10:11:01 -0500
        From: Ben Brumfield <benwbrum at gmail.com>
        Subject: technical interests?

In 25.926, Willard McCarty asked those DHists in primarily technical
roles, "What in particular makes the digital humanities interesting?"

As a software engineer who has recently moved from industry into (big
tent) digital humanities work, I feel like I can provide a bit of an
answer.

For me the great joy of DH work comes from the fact that it is an
intersection -- a meeting point between two rich traditions which seem
to have had little prior contact.  There's so much more potential to
make an impact by making connections and drawing analogies from one
world to the other than you find in well-served fields like e-commerce
or consumer-facing social networks.

Let me give an example from my second week on a project building a
tool for transcribing old English parish registers and entering the
results into a searchable database.  The advisory committee--made up of
expert volunteers who'd been doing this work offline for years--had
spilled a lot of pixels on the contentious issue of whether or not to
standardize person- and place-names when the original documents
recorded abbreviations, Latinizations, and ear-spellings.  The
arguments raged between advocates for a strict "type-what-you-see"
approach and others who emphasized the predicament of a researcher
looking for "George", but unable to find the correct person named
"Georgius" in the database.  Thanks to _A Guide to Documentary
Editing_, I learned that editors separate the transcription process
(verbatim et literatim) from the emendation process that is used to
create an edition.  Realizing this allowed me to reframe the search
database as a sort of edition -- one whose emended records may point
to transcripts and facsimiles, but whose searchable fields contain
emendations.  As a result, the transcripts will be as close to the
manuscripts as possible, while the researchers looking for "William"
will find records emended from "Wm".
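
As a rough illustration of that reframing, here is a minimal Python
sketch.  It is not the project's actual schema: the class name, the
field names, and the facsimile references are all hypothetical.  Each
record keeps the verbatim transcript next to an emended value, and
searches consult only the emended value.

```python
from dataclasses import dataclass


@dataclass
class RegisterEntry:
    # Hypothetical record layout, assumed for illustration only.
    transcript_forename: str  # verbatim et literatim, as written in the manuscript
    emended_forename: str     # standardized form, used only for searching
    facsimile_ref: str        # made-up pointer to the page image


def search_by_forename(entries, query):
    """Compare the query with the emended field and return whole records."""
    wanted = query.casefold()
    return [e for e in entries if e.emended_forename.casefold() == wanted]


entries = [
    RegisterEntry("Wm", "William", "page-012"),
    RegisterEntry("Georgius", "George", "page-013"),
    RegisterEntry("William", "William", "page-020"),
]

hits = search_by_forename(entries, "william")
transcripts = [e.transcript_forename for e in hits]
```

Because the search returns whole records, a researcher who asks for
"William" still sees that the manuscript reads "Wm", and the transcript
itself is never altered.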

That was on a Wednesday.  On Thursday I wrote a tweet about the
project's approach to recording unclear text, a notation inspired by
Regular Expressions.  This was picked up by at least a few humanities
professionals, with one person responding "I wish I'd know about this
for my dissertation!".  I can't speak to the impact that may have on
the humanities world, but the enthusiasm was heartening.

That's the thrilling part about working at this intersection -- on one
day I use an approach from scholarly editing to transform a
contentious problem in database design, while the next I expose
humanities scholars to a CS-inspired notation for recording unclear
handwriting.  That week I felt like I was bringing horses to the
Americas and returning with a shipload of potatoes for Europe.

Some of the problems are also pretty exciting from a computational
perspective.  Yesterday I wrote a blog post about the challenge of
efficiently searching a database of unclear handwriting, as it's a
really thorny problem.  In fact, I think it's the most computationally
interesting problem I've worked on in at least three years:
http://tinyurl.com/SearchingRegexRecords
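
To give a sense of why this is hard, here is a naive Python sketch.
The notation is an assumption made for illustration, since this post
does not spell out the project's actual syntax: square brackets list
the candidate letters for an unclear character, and "." stands for one
illegible character.  The stored reading is treated as the pattern and
the researcher's query as the text, the reverse of an ordinary regex
search.

```python
import re

# Hypothetical stored readings in the assumed notation.
records = ["Sm[iy]th", "Sm.th", "Smith", "Jones"]


def matching_records(query, stored):
    """Return the stored readings that are consistent with the query name.

    Every stored reading is compiled as a pattern and tested against the
    whole query, so the cost grows linearly with the number of records.
    """
    return [r for r in stored if re.fullmatch(r, query, flags=re.IGNORECASE)]


hits = matching_records("Smyth", records)
```

This linear scan is only workable for small collections.  An ordinary
index on the column does not help, because it is the stored values,
not the query, that contain the wildcards.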

I hope that's useful.  I'd be curious to hear from other
programmers who have moved in the same direction, as I gather that it
may be rare.

Ben Brumfield
http://manuscripttranscription.blogspot.com/
@benwbrum




