[Humanist] 27.777 characters not in UniCode

Humanist Discussion Group willard.mccarty at mccarty.org.uk
Sun Feb 9 08:07:30 CET 2014


                 Humanist Discussion Group, Vol. 27, No. 777.
            Department of Digital Humanities, King's College London
                       www.digitalhumanities.org/humanist
                Submit to: humanist at lists.digitalhumanities.org



        Date: Sat, 08 Feb 2014 10:00:42 +0100
        From: maurizio lana <maurizio.lana at gmail.com>
        Subject: Re:  27.760 characters not in UniCode?


at digilibLT - digital library of late latin texts we also deal with
scientific texts. many characters in those texts, mainly but not only
units of measure, are missing from Unicode.
after a rather quick survey carried out with the help of david paniagua
(universidad de salamanca), simona musso and valentina rinaldi (both of
università del piemonte orientale), who work for the digilibLT library,
we can list these groups of characters:

  * roman numerals with multiplier mark
  * greek numerals with multiplier mark: see a list with images of
    missing characters at
    https://drive.google.com/file/d/0B1SZjoqdPETSaTlnd0ROOHJiUXM/edit?usp=sharing
  * units of measure: for a list, see the PDF doc at
    https://drive.google.com/file/d/0B1SZjoqdPETSX2h6MEVtN1ZVMWM/edit?usp=sharing
    where many characters are listed which have no Unicode code point
    to represent them (see all the characters which are described
    with "null" or with more than 2 Unicode code points); other
    characters in the document show a Unicode code point and a glyph,
    but that pair really refers to another 'entity' which happens to
    have the same glyph: this is the case, for example, of sescuncia,
    which happens to have the same glyph as the british currency
    "pound", so when your text has sescuncia you put the glyph of the
    british pound in its digital 'rendering'. this should be avoided,
    but to avoid it you need a specific character in Unicode for
    sescuncia, even if its glyph is identical to an already existing one
  * ligatures: see the list of ligatures for units of measure at
    https://drive.google.com/file/d/0B1SZjoqdPETSSzhjbXM5QUFrekk/edit?usp=sharing
    they can obviously be replaced by their separate elements, but
    if we want to produce a diplomatic edition it is not the same thing
    to reproduce, and offer to the reader, the ligature, whose
    peculiar stroke could have led to a certain error, as to offer
    the single elements, whose strokes cannot be misread or
    misinterpreted. so we probably also need specific characters for
    ligatures.
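
the two problems above (a borrowed code point that really names another
entity, and a ligature that disappears when it is split into its
elements) can be illustrated with a small sketch in Python. it uses only
the standard unicodedata module. the Private Use Area assignment of
U+E000 to sescuncia and the names PUA_ENTITIES and describe are
assumptions made up for this illustration, not an existing digilibLT
convention; the latin "fi" ligature stands in for the unit-of-measure
ligatures, which have no code points to test with.

```python
import unicodedata

# Hypothetical project table: Private Use Area code points assigned
# locally to entities that have no Unicode character of their own.
PUA_ENTITIES = {
    "\ue000": "sescuncia",  # assumed assignment, for illustration only
}

def describe(ch: str) -> str:
    """Say what a code point stands for: the local entity if it is a
    project-assigned PUA character, otherwise its official Unicode name."""
    if ch in PUA_ENTITIES:
        return f"U+{ord(ch):04X} {PUA_ENTITIES[ch]} (private use)"
    name = unicodedata.name(ch, "<no Unicode name>")
    return f"U+{ord(ch):04X} {name}"

# The borrowed glyph: to any software this code point is the currency sign.
print(describe("\u00a3"))   # U+00A3 POUND SIGN
# A private-use code point keeps the entity distinct, but only locally.
print(describe("\ue000"))   # U+E000 sescuncia (private use)

# A ligature replaced by its elements: compatibility normalization turns
# the single ligature character into two ordinary letters.
ligature = "\ufb01"         # LATIN SMALL LIGATURE FI
print(unicodedata.normalize("NFKC", ligature))  # fi
```

a private-use code point is only a stopgap: it has meaning solely inside
the project that assigns it, which is why a character in the standard
itself would still be needed for interchange.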

best
maurizio

PS: desmond, why are you compiling this catalogue of missing Unicode
characters? can we hope for an initiative addressed to the Unicode
Consortium to enrich the definitions of the encoding?
:-))

-- 
The knowledge gap between rich and poor is widening.
I. H. Witten, D. Bainbridge, D. M. Nichols,
How to build a digital library, p. 26
-------
the humanities computing course: http://www.youtube.com/watch?v=85JsyJw2zuw
the digital library of late latin: http://www.digiliblt.unipmn.it/
a day in the life of DH2013: http://dayofdh2013.matrix.msu.edu/digiliblt/
what the digital humanities are: http://www.youtube.com/watch?v=4JqLst_VKCA
-------
Maurizio Lana - researcher
Università del Piemonte Orientale, Dipartimento di Studi Umanistici
via Manzoni 8, 13100 Vercelli - tel. +39 347 7370925




