[Humanist] 28.376 creating an online dictionary & an announcement

Humanist Discussion Group willard.mccarty at mccarty.org.uk
Tue Oct 7 07:09:48 CEST 2014


                 Humanist Discussion Group, Vol. 28, No. 376.
            Department of Digital Humanities, King's College London
                       www.digitalhumanities.org/humanist
                Submit to: humanist at lists.digitalhumanities.org

  [1]   From:    Gregory Crane <gregory.crane at tufts.edu>                   (38)
        Subject: Liddell Scott Jones and Lewis and Short Lexica on-line + new
                lexica

  [2]   From:    Desmond Schmidt <desmond.allan.schmidt at gmail.com>         (61)
        Subject: Re:  28.375 creating an online dictionary?

  [3]   From:    Ben Brumfield <benwbrum at gmail.com>                        (83)
        Subject: Re:  28.375 creating an online dictionary?


--[1]------------------------------------------------------------------------
        Date: Mon, 6 Oct 2014 07:27:58 +0200
        From: Gregory Crane <gregory.crane at tufts.edu>
        Subject: Liddell Scott Jones and Lewis and Short Lexica on-line + new lexica
        In-Reply-To: <20141006051447.2364A65A6 at digitalhumanities.org>

Dear Humanist world,

After sitting on the data for far too long, we finally (thanks to my 
colleague Anna Krohn) pushed the TEI XML for the Liddell Scott Jones and 
Lewis and Short lexica out onto GitHub: 
https://github.com/PerseusDL/lexica.

Dear Maurizio,

The big issue for print lexica is getting the citations tagged. Big 
dictionaries often do a lot of abbreviating and it can be messy trying 
to expand them.
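
A minimal sketch of why expansion gets messy, assuming a small hand-built
table: the same work abbreviation can mean different things depending on
the author, so the lookup has to be keyed on both. The table entries and
the function name expand_citation are illustrative assumptions, not taken
from the Perseus data.

```python
# Hypothetical abbreviation tables; a real lexicon needs hundreds of entries.
AUTHORS = {"Cic.": "Cicero", "Verg.": "Vergil", "Tac.": "Tacitus"}

# Work abbreviations are keyed by (author, work) because "A." is ambiguous.
WORKS = {
    ("Verg.", "A."): "Aeneid",
    ("Tac.", "A."): "Annales",
    ("Cic.", "Att."): "Epistulae ad Atticum",
}


def expand_citation(citation):
    """Expand a leading author and work abbreviation.

    Citations whose author is not in the table are returned unchanged,
    so they can be collected for manual review.
    """
    tokens = citation.split()
    if not tokens or tokens[0] not in AUTHORS:
        return citation
    author = tokens[0]
    expanded = [AUTHORS[author]]
    rest = tokens[1:]
    if rest and (author, rest[0]) in WORKS:
        expanded.append(WORKS[(author, rest[0])])
        rest = rest[1:]
    return " ".join(expanded + rest)


if __name__ == "__main__":
    for raw in ("Verg. A. 1, 462", "Tac. A. 1, 1", "Plin. 2, 3"):
        print(raw, "->", expand_citation(raw))
```

Anything the tables do not cover passes through untouched, which makes
the unexpanded leftovers easy to list and fix by hand.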

We recovered the hierarchical structure of these and other lexica, so 
that part should be doable.

We have the Italian Calonghi version of the Georges Latin lexicon in the 
data entry queue for Open Greek and Latin. That will look like LSJ and 
LS when it comes out (though we may not immediately do the citation 
fishing).

Greg

On 10/6/14, 7:14 AM, Humanist Discussion Group wrote:
>                   Humanist Discussion Group, Vol. 28, No. 375.
>              Department of Digital Humanities, King's College London
>                         www.digitalhumanities.org/humanist
>                  Submit to: humanist at lists.digitalhumanities.org
>
>
>
>          Date: Sun, 05 Oct 2014 12:18:20 +0200
>          From: maurizio lana <maurizio.lana at unipmn.it>
>          Subject: creating an online dictionary
>
> dear humanists,
> i own the rights of a latin-italian /italian-latin dictionary. the
> dictionary is on paper.
> if i OCR it, what would be the ways to put it online for free
> consultation and use?
> are there any 'software structures' (nearly) ready to use or should i
> think of building an ad-hoc solution?
> thanks for the help
> maurizio
>



--[2]------------------------------------------------------------------------
        Date: Mon, 6 Oct 2014 20:36:11 +1000
        From: Desmond Schmidt <desmond.allan.schmidt at gmail.com>
        Subject: Re:  28.375 creating an online dictionary?
        In-Reply-To: <20141006051447.2364A65A6 at digitalhumanities.org>


Maurizio,

There seem to be 3 steps:

1. If we assume that the paper copy is in good condition and easy to OCR,
you will need an OCR program with both Latin and Italian dictionaries.
Otherwise your error rate will be too high. ABBYY FineReader springs to mind,
but there are probably others. That will get you a word-processed document,
hopefully with not too many errors. You could just turn it into HTML and
search it online, but it wouldn't be very useful like that.

2. Getting it into a dictionary format will be the hardest step. You could
write a script in Python, Perl, etc. to convert the files into something like
XDXF format, utilising, say, the change from italics to roman script and the
sequence of keywords to build up the structure. Since this step depends on
the way the dictionary is formatted, this solution would probably have to be
a custom one.

3. From there you should be able to use some open-source tools to display
the translations online.
Sounds like quite a bit of work, though.
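
A minimal sketch of step 2, under stated assumptions: the OCR output has
been saved as simple HTML with one entry per line, the headword in <b> and
the grammatical note in <i>. Real OCR output will be messier. The element
names (xdxf, ar, k) and the language attributes follow my understanding of
XDXF and should be checked against the format specification; the function
name to_xdxf is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Assumed line layout: <b>headword</b> <i>grammar note</i> translation
ENTRY = re.compile(
    r"<b>(?P<head>[^<]+)</b>\s*"
    r"(?:<i>(?P<gram>[^<]+)</i>\s*)?"
    r"(?P<body>.*)"
)


def to_xdxf(lines):
    """Convert marked-up entry lines to an XDXF-style XML string.

    Lines that do not start with a bold headword are skipped; in real
    work they should be logged for manual review.
    """
    root = ET.Element("xdxf", lang_from="LAT", lang_to="ITA", format="visual")
    for line in lines:
        match = ENTRY.match(line.strip())
        if match is None:
            continue
        article = ET.SubElement(root, "ar")
        key = ET.SubElement(article, "k")
        key.text = match.group("head").strip()
        # The change from bold/italic to roman marks where the gloss starts.
        parts = [p for p in (match.group("gram"), match.group("body")) if p]
        key.tail = " " + " ".join(p.strip() for p in parts)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = ["<b>aqua</b> <i>-ae, f.</i> acqua", "stray OCR noise"]
    print(to_xdxf(sample))
```

The regular expression is where the custom work lives: it has to be
rewritten for whatever typographic conventions the printed dictionary
actually uses.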

Desmond Schmidt
Institute for Future Environments
QUT


--[3]------------------------------------------------------------------------
        Date: Mon, 6 Oct 2014 09:07:16 -0500
        From: Ben Brumfield <benwbrum at gmail.com>
        Subject: Re:  28.375 creating an online dictionary?
        In-Reply-To: <20141006051447.2364A65A6 at digitalhumanities.org>


Dear Maurizio,

I would be very surprised if you needed to build an ad-hoc solution to
present your dictionary after it is OCRed.

I'd like to recommend very strongly that you investigate options from/with
Wikimedia.  In addition to Wikipedia (which is best known), the Wikimedia
Foundation runs two sister projects which should be of interest.

Wiktionary is a multi-lingual dictionary based on the MediaWiki software
platform.  There are separate Wiktionary sites for different reader languages,
each of which provides multi-lingual dictionaries: compare the
French-language Wiktionary entry on "data" at
http://fr.wiktionary.org/wiki/data with the English-language Wiktionary
entry on "data" at http://en.wiktionary.org/wiki/data .  Each
language-specific site is run by its own community, so the Italian-language
Wikimedia/Wiktionary community would be the appropriate place to turn.

Wikisource is a different sister project, which provides a wiki-based
software platform for online editions of published works.  If it's
important that your text be arranged as the published text was (rather than
as an online dictionary like Wiktionary organizes information), it would be
an appropriate publishing platform.

In both cases, you have the option of either installing and configuring the
software yourself for a stand-alone digital edition, or partnering with the
Wikimedia community to host your dictionary on one of their sites.  There
are benefits to both approaches.  Partnering would save you from the need
to install and configure the tool, as well as providing you with technical
experts and volunteers. On the other hand, you would have more control over
the brand of the site with a stand-alone installation, and you would not be
required to comply with Wikimedia policy on licensing and copyright.

As it is often hard for outsiders to get started in the Wikimedia
community, the community has launched GLAM-Wiki for doing outreach to
cultural institutions interested in exploring participation.  They can help
answer questions, suggest courses of action, and sometimes provide
assistance in digitization (scanning and OCR ingestion).  My impression is
that the Italian-language GLAM-Wiki is interested in DH, as I sometimes see
proposals from Italians to add TEI support to Wikisource.  I suggest
that you contact them at http://outreach.wikimedia.org/wiki/GLAM/Contact_us
to explore options.

Best of luck!

Ben

Ben W. Brumfield
Independent Developer
http://manuscripttranscription.blogspot.com/
http://fromthepage.com/


