[Humanist] 28.449 HTML vs XML for TEI

Humanist Discussion Group willard.mccarty at mccarty.org.uk
Wed Oct 29 09:06:37 CET 2014

                 Humanist Discussion Group, Vol. 28, No. 449.
            Department of Digital Humanities, King's College London
                Submit to: humanist at lists.digitalhumanities.org

        Date: Tue, 28 Oct 2014 14:12:30 -0400
        From: Hugh Cayless <philomousos at gmail.com>
        Subject: Re:  28.442 HTML vs XML for TEI
        In-Reply-To: <20141027074945.C133C68D4 at digitalhumanities.org>

> > interoperability might be a goal of a specific customization of TEI, but
> > it’s not something I’d be interested in imposing on TEI as a whole. People
> > want to do different things with different kinds of text.

> I'm sorry I don't follow the logic here.  A specific customisation of TEI
> is by definition not interoperable. People definitely want to collaborate,
> not so much to share texts but to build tools that work across texts.
> That's the real advantage of interoperability.

Not true. EpiDoc, for example, is a customization of TEI designed for the
encoding of ancient texts. It's widely used in epigraphy and papyrology. I
could see people wanting those kinds of texts to interoperate. What I don't
see easily combining is my contract for the lease of oxen in 10 BCE with a
genetic edition of Faust. The editors of those documents just aren't going
to be concerned with the same kinds of encoding. And that's ok.

> > I’ve just replaced my supplied tag with something like
> > <span typeof="http://www.tei-c.org/ns/1.0#supplied" data-reason="lost">this</span>
> > (and incidentally, it could not be so simple if we’re really using RDFa)
> What's wrong with <span property="tei:supplied-lost">this</span>?

Here's where RDF is a problem. What's the subject of that triple? I assume
it's the document in your formulation, but that seems semantically not
quite right to me. I suppose you can make anything work by convention, but
encoding things in RDFa implies you're going to be able to extract triples
from it, and I think that's a problem. If I were going to do this for real,
I think I'd use data attributes.
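
As a rough illustration of the data-attribute approach, here is a minimal
Python sketch that turns a TEI <supplied> element into an HTML <span>. The
function name supplied_to_span and the attribute name data-tei are
hypothetical, chosen only for this example; they are not part of TEI or of
any agreed convention.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"


def supplied_to_span(tei_xml: str) -> str:
    """Convert one TEI <supplied> element into an HTML <span> with data-* attributes.

    Assumption: the element holds plain text only; child elements are ignored.
    """
    elem = ET.fromstring(tei_xml)
    if elem.tag != f"{{{TEI_NS}}}supplied":
        raise ValueError("expected a TEI <supplied> element")

    span = ET.Element("span")
    # Hypothetical attribute recording which TEI element this span stands for.
    span.set("data-tei", "supplied")
    for name, value in elem.attrib.items():
        # Skip namespaced attributes such as xml:id; they need their own mapping.
        if name.startswith("{"):
            continue
        span.set(f"data-{name}", value)
    span.text = elem.text
    return ET.tostring(span, encoding="unicode", method="html")


if __name__ == "__main__":
    source = f'<supplied xmlns="{TEI_NS}" reason="lost">this</supplied>'
    print(supplied_to_span(source))
```

Nothing here asserts a triple, so the question of what the subject would be
does not arise: the span simply carries the TEI element name and its
attributes as data.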

> Consider this: TEI as an abstract specification, not bound to any
> technology, with all the cruft cleaned out, a specification agreed to by
> the full *community* of digital humanists, not just the Americans, British
> and a few Europeans, or paid-up members. It would be a specification that
> could be realised in three forms: a) XML, b) HTML+microformats/RDFa or
> whatever c) plain text+external markup and any other forms or technologies
> that come into being in future. That would be useful to a much wider range
> of people than at present. And no, I don't think it is a pipe dream. It is
> realisable if you try.

It may surprise you to know I agree entirely that TEI needs to move towards
being an abstract specification. We've discussed this very thing at TEI
Council meetings in fact. What you refer to as "cruft", though, are the
parts of the TEI infrastructure that actually work, and that do the work of
publishing the Guidelines, generating schemas, etc. So we can't just throw
them out. There will have to be a process of transition. I'd quite like to
see JSON and markdown-ish text on that list too.

It sounds to me like we agree on a lot of things, even if not on the
feasibility of doing those things quickly. I hope your trip to London is a
safe and productive one!

All the best,
