[Humanist] 28.404 PostGreSQL and Solr for digital archives

Humanist Discussion Group willard.mccarty at mccarty.org.uk
Fri Oct 17 08:45:31 CEST 2014


                 Humanist Discussion Group, Vol. 28, No. 404.
            Department of Digital Humanities, King's College London
                       www.digitalhumanities.org/humanist
                Submit to: humanist at lists.digitalhumanities.org



        Date: Thu, 16 Oct 2014 21:08:43 +1000
        From: Desmond Schmidt <desmond.allan.schmidt at gmail.com>
        Subject: Re:  28.400 PostGreSQL and Solr for digital archives
        In-Reply-To: <20141016064745.23806622A at digitalhumanities.org>


Hi Martin,

I'd like to expand the discussion a bit, taking as my point of departure
your remark that "little if anything of the TEI encoding is actually
available to the user". The technical reason for this is, of course,
that these applications do not intrinsically support XML, although they
can import it. But the underlying reason is that we encoded the XML
through the exercise of human judgement and interpretation. It should
then come as no surprise that some of that information gets lost when it
is read by a machine.

What I would like to suggest as a remedy to this situation is that we
stop trying to share our data on the *basis* of human-determined tags.
Instead we could use HTML and encode the interpretative part as class
attributes or as RDFa or microformats. TEI could become an *abstract*
set of names defining textual properties, without reference to any
specific technology. One way of recording and expressing those
properties could be via HTML. If we did that then everyone's files would
be interoperable because they would already be in the language of the
Web.
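
To make the proposal concrete, here is a minimal Python sketch built on my own assumptions. The `tei-` class prefix, the mapping of TEI attributes onto HTML `data-*` attributes, and the helper names `local_name` and `tei_to_html` are all hypothetical, not part of TEI or any HTML standard. Each TEI element becomes an HTML `span` whose class records the TEI name, so the interpretative markup survives in a form that browsers and ordinary Web tools already understand.

```python
import xml.etree.ElementTree as ET

def local_name(tag):
    # Strip the "{namespace}" prefix that ElementTree puts on qualified names.
    return tag.split("}", 1)[1] if tag.startswith("{") else tag

def tei_to_html(tei_elem):
    """Map a TEI element and its descendants to nested HTML <span>s.

    Hypothetical convention: the TEI element name becomes a class
    (e.g. "tei-persName") and each TEI attribute becomes a data-* attribute.
    """
    span = ET.Element("span", {"class": "tei-" + local_name(tei_elem.tag)})
    for attr, value in tei_elem.attrib.items():
        span.set("data-" + local_name(attr).lower(), value)
    span.text = tei_elem.text
    for child in tei_elem:
        html_child = tei_to_html(child)
        html_child.tail = child.tail  # keep the text that follows the child
        span.append(html_child)
    return span

# Usage on a small, made-up TEI fragment.
tei = ET.fromstring(
    '<p xmlns="http://www.tei-c.org/ns/1.0">Letter from '
    '<persName ref="#shelley">Mary Shelley</persName>, '
    '<date when="1818-01-01">1818</date>.</p>'
)
html = ET.tostring(tei_to_html(tei), encoding="unicode")
print(html)
```

A real mapping would also have to choose block-level HTML elements (div, p) where the TEI structure calls for them, and could use RDFa instead of bare classes where richer semantics are wanted.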

Of course, we can convert the XML to HTML whenever we want, but we don't
seek to share it in that form; we seek instead to share the XML, and we
can't, because TEI-XML is not interoperable. And yet there is nothing
in TEI-XML that can't be expressed in some alternative way in HTML.
This is all the more relevant since, according to the recent survey by
Burghart (http://jtei.revues.org/372), 97% of TEI-encoded texts of
manuscripts (and probably a similar proportion of printed texts) just get
converted into HTML anyway. So please explain to me why we need to use
XML, because I really don't see it.

Desmond Schmidt
Queensland University of Technology

On Thu, Oct 16, 2014 at 4:47 PM, Humanist Discussion Group <willard.mccarty at mccarty.org.uk> wrote:

>                  Humanist Discussion Group, Vol. 28, No. 400.
>             Department of Digital Humanities, King's College London
>                        www.digitalhumanities.org/humanist
>                 Submit to: humanist at lists.digitalhumanities.org
>
>   [1]   From:    Martin Mueller <martinmueller at northwestern.edu>  (57)
>         Subject: Re:  28.394 PostGreSQL and Solr for digital archives
>
>   [2]   From:    Ed Summers <ehs at pobox.com>  (16)
>         Subject: Re:  28.394 PostGreSQL and Solr for digital archives
>
>
>
> --[1]------------------------------------------------------------------------
>         Date: Wed, 15 Oct 2014 11:44:16 +0000
>         From: Martin Mueller <martinmueller at northwestern.edu>
>         Subject: Re:  28.394 PostGreSQL and Solr for digital archives
>         In-Reply-To: <20141015053649.8FA7A6083 at digitalhumanities.org>
>
>
> Desmond asks a pointed question that has also been on my mind. It is one
> thing to store data in XML. It is another to mediate the query potential
> of the XML in such a manner that users can get at it. I call this
> "decoding the encoded." In the TEI world I'm familiar with quite a few
> projects with a very "lossy" interface: little if anything of the TEI
> encoding is actually available to the user. As I understand it, Solr can
> get you some of the XML encoding with indexing that associates words with
> some of the information kept in XPaths. But all of them?? So I'd be
> interested in the trade-offs involved in transforming XML into SQL in the
> particular projects Ashley and Ed write about. What gets lost? And who
> gets to decide whether it matters?
>
> Martin Mueller
> Professor emeritus of English and Classics
> Northwestern University
