[Humanist] 26.934 volunteers for open-source indexing?

Humanist Discussion Group willard.mccarty at mccarty.org.uk
Wed Apr 3 07:20:21 CEST 2013


                 Humanist Discussion Group, Vol. 26, No. 934.
            Department of Digital Humanities, King's College London
                       www.digitalhumanities.org/humanist
                Submit to: humanist at lists.digitalhumanities.org



        Date: Tue, 2 Apr 2013 11:24:11 -0500
        From: Ben Brumfield <benwbrum at gmail.com>
        Subject: Call for Participation: Open Source Indexing


The Challenge

Historic documents often contain handwriting, old fonts, or other text
formats that OCR software can't handle. We need humans--from
volunteers to paid staff--to read the document images and transcribe
what they see into databases which can be searched, analyzed, crawled,
and used by researchers. Until now those efforts have required
organizations either to outsource indexing to external partners or to
cobble together their own off-line or on-site systems.

Our goal is to build a tool that can be used by libraries, archives,
museums, historical sites, genealogy and heritage societies to run
their own indexing projects, under their own control.

The Invitation

We'd like to invite scholars, libraries, archives, and museums, as well
as historical, genealogy, and heritage societies, to participate in the
project. Right now we need advice and examples of indexing projects
that real organizations would like to run. This would allow us to work
with an eye on real data outside the UK parish registers and English
census records which have been driving our development up to the
present.

What we need from you

Project definitions including:

 *   Sample image files (around 5 per project in the format you'd use
for access copies),
 *   A maximal spec for the data you'd like to collect,
 *   A minimal set of required fields you need, and
 *   A description of the material and goals of the project.

In addition to example indexing project definitions, we need:

 *   Funding to continue development. Our top priority is building a
tool for our funders' indexing projects at FreeREG and FreeCEN.
Building features outside of the needs common to those projects will
require more funds.
 *   Code contributions and help with design and programming.
 *   Publicity and endorsement to spread the word about Open Source Indexing.

The Tool

We're basing our online indexing tool on Scribe, a tool developed by
the Citizen Science Alliance from their Old Weather project and
deployed by the Bodleian Library for "What's the score at the Bodleian".
More recently, Scribe has been customized by New York Public Library
Labs for their Ensemble database of the performing arts.

We're augmenting the Scribe transcription system by adding a database
that allows users to search and view records created by the indexing
tool. We're also adding support for offline/legacy transcripts
imported via CSV files. Improvements to the data-entry UI and a system
for reporting on indexing activity and managing volunteers will round
out the effort. (See the data flow diagram.)
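
To make the CSV import and search ideas concrete, here is a minimal
sketch in Python. It is not Scribe's code or ours; the field names and
the functions import_transcripts and search are hypothetical, chosen
only for illustration. It assumes a project has defined a minimal set
of required fields and a maximal spec, as in the project definitions
above, and that legacy transcripts arrive as CSV text with a header row.

```python
import csv
import io

# Hypothetical field names; a real project would supply its own.
REQUIRED_FIELDS = {"surname", "year"}  # minimal set of required fields
ALLOWED_FIELDS = REQUIRED_FIELDS | {"forename", "parish", "notes"}  # maximal spec


def import_transcripts(csv_text):
    """Split CSV rows into accepted records and (line_number, problem) rejections."""
    reader = csv.DictReader(io.StringIO(csv_text))
    accepted, rejected = [], []
    for row in reader:
        # Keep only fields named in the maximal spec, trimming whitespace.
        record = {k: (v or "").strip() for k, v in row.items() if k in ALLOWED_FIELDS}
        # A required field counts as missing if it is absent or blank.
        missing = sorted(f for f in REQUIRED_FIELDS if not record.get(f))
        if missing:
            rejected.append((reader.line_num, "missing: " + ", ".join(missing)))
        else:
            accepted.append(record)
    return accepted, rejected


def search(records, field, term):
    """Case-insensitive substring search over one field of the accepted records."""
    term = term.lower()
    return [r for r in records if term in r.get(field, "").lower()]
```

Rows lacking a required field are reported with their line number
rather than silently dropped, so a volunteer coordinator could send them
back for correction. Columns outside the maximal spec are ignored.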

The entire system will be released under an Apache license. (In fact,
the source code under development already is.)

Read more details at http://opensourceindexing.org/

To get involved or find out more, contact:

Ben Brumfield
benwbrum at gmail.com
http://manuscripttranscription.blogspot.com/




