[Humanist] 26.594 XML & scholarship

Humanist Discussion Group willard.mccarty at mccarty.org.uk
Sun Dec 16 11:32:55 CET 2012

                 Humanist Discussion Group, Vol. 26, No. 594.
            Department of Digital Humanities, King's College London
                Submit to: humanist at lists.digitalhumanities.org

        Date: Sat, 15 Dec 2012 06:42:04 -0500
        From: Patrick Durusau <patrick at durusau.net>
        Subject: Re:  26.586 XML & scholarship
        In-Reply-To: <20121215105208.2DF9C3A28 at digitalhumanities.org>


>> Date: Thu, 13 Dec 2012 22:26:00 -0500
>> From: Doug Reside <dougreside at gmail.com>
>> Subject: Re:  26.577 Folger Digital Texts --> XML & scholarship
>>> But then I think about all of the attempts I and others have made to
>>> create "easy to use" XML editors that end up being less functional and
>>> harder to use than a simple text editor.  Anyone with a modicum of web
>>> design experience who has tried to edit HTML in WordPress or Drupal
>>> usually starts hunting for the "edit source" button immediately.  It
>>> feels like there SHOULD be a better kind of data entry tool for
>>> text-encoding than an angle bracket editor, but I'm not yet sure what
>>> it is.
> Doug,
> I'm glad that someone else recognises the difficulty of this problem.
> It seems like it ought to be possible to build a graphical editor for
> TEI-XML, but with 544 or more tags it's impossible to translate all the
> structures that humanists want to record and represent them all
> graphically. Simple textual highlighting works, sure, paragraph
> structures work, but variants, virtual joins, footnotes, links, etc etc?
> Since you have to represent many tags as raw XML what happens if the
> user makes a mistake? You'd have to handle that error right there in
> your online editor, not when the text is sent to the server. You'd have
> to provide context-sensitive editing, hundreds of pages of explanations
> as to what each tag signifies, and explain to the user how to fix each
> mistake. Not a simple task to program, and certainly not a simple task
> to use it.
> The user need to have a simple editor cannot be met by XML.

On the contrary, the error is starting from XML rather than the 
interface for the user. An XML instance is an artifact that records 
choices made by the user.
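
One way to read this point, sketched in Python: the user makes a choice in 
some interface (here, selecting a span and saying what it is), and the XML 
is written out from that choice afterwards. The function name 
record_choice and the sample sentence are hypothetical; persName is the TEI 
element for a personal name. This is an illustration under those 
assumptions, not a description of any existing editor.

```python
import xml.etree.ElementTree as ET


def record_choice(text, start, end, tag):
    """Serialize one user choice (a selected span plus a label) as XML.

    Hypothetical helper: the user never sees angle brackets, only the
    selection [start, end) and the label they picked for it.
    """
    p = ET.Element("p")
    p.text = text[:start]            # text before the selection
    child = ET.SubElement(p, tag)    # the element named by the user's choice
    child.text = text[start:end]     # the selected span
    child.tail = text[end:]          # text after the selection
    return ET.tostring(p, encoding="unicode")


# The "interface" here is just a function call with character offsets.
xml = record_choice("Patrick wrote to Humanist.", 0, 7, "persName")
print(xml)  # <p><persName>Patrick</persName> wrote to Humanist.</p>
```

The XML instance is the record of the choice, not the place where the 
choice has to be made.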

As far as the complexity of TEI, consider that some of the attributes in 
OOXML have 200+ different contextual meanings, but bear the same 
attribute name. MS Word seems to handle that.

Another error is assuming that the use of overlapping ranges is somehow 
less complex than XML in terms of representation.

That is to say, whatever structure needs explanation in XML, if you are 
going to represent it with overlapping ranges, doesn't the user need the 
same explanation?

Ah, but no, they most likely don't, because with ranges the semantics 
that are *explicit* in XML can be left *implied*. (Not that they must 
be, as I am sure Wendell will be quick to point out. Making semantics 
explicit is part of the "hardness" of XML but it is also part of what 
makes it useful. PDF has implied semantics but I would be loath to 
publish a critical edition using it.)

Implied semantics are *lossy* recording of semantics because there can 
be no accumulation of analysis on top of implied semantics nor any 
reliable interchange of the underlying artifact.
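
A minimal sketch in Python of the distinction, assuming a toy standoff 
model: each range is a name plus character offsets, and a separate 
vocabulary says what each name means. The names Range, overlaps, 
undeclared and the sample vocabulary are all hypothetical and are not 
taken from eComma, CATMA, LMNL or any other project mentioned here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    name: str   # label applied to the span
    start: int  # inclusive character offset
    end: int    # exclusive character offset


def overlaps(a, b):
    """True when two ranges cross, i.e. neither contains the other."""
    return a.start < b.start < a.end < b.end or b.start < a.start < b.end < a.end


def undeclared(ranges, vocabulary):
    """Names used in the ranges that the vocabulary never defines."""
    return sorted({r.name for r in ranges if r.name not in vocabulary})


text = "one two three four"
line = Range("line", 0, 7)       # covers "one two"
phrase = Range("phrase", 4, 13)  # covers "two three"

# Hypothetical vocabulary: the only place the semantics are made explicit.
vocabulary = {"line": "a metrical line of verse"}

print(overlaps(line, phrase))                  # True
print(undeclared([line, phrase], vocabulary))  # ['phrase']
```

The overlap itself is easy to record. What the sketch cannot tell a later 
reader is what "phrase" was meant to signify, because nothing declares it.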

> You
> have to think beyond it, and I believe a consensus is now emerging in
> the digital humanities that at least the properties of text (NOT its
> versions) can be practically represented as overlapping ranges. There
> are quite a few projects now exploring this line of research: eComma,
> CATMA, LMNL, our own standoff properties. It's not rocket science. It's
> very simple, and it works. Check out our website austese.net/tests/.
> Everything you see here is done without XML, from the server to the
> visualisations, comparisons, everything. The only thing that handles
> XML are the import tools, of course. So I don't believe that XML is
> actually needed any more to get our work done.

A very impressive demonstration, which Humanist readers should enjoy.

But the question remains, how are the semantics of the structures 

I agree that "It's very simple, and it works," but that isn't my issue.*

My issue is how 10, 20 or 200 years from now I will be able to make 
sense of the encoding and leverage further analysis on top of it. If the 
semantics are implied, ranges or no, I cannot reliably reuse a text.

Hope you are having a great weekend!


* We should be mindful that "simple and works" is a poor basis for 
format/program design. The original presumption of well-formed XML was 
made in deference to programmers who could write an XML parser in a 

It is "simple and works" but fails to account for structures that we can 
attribute to any text.

While I recognize the shortcomings of XML, the loss of explicit 
semantics, by whatever means, is a cure worse than the disease.

Patrick Durusau
patrick at durusau.net
Technical Advisory Board, OASIS (TAB)
Former Chair, V1 - US TAG to JTC 1/SC 34
Convener, JTC 1/SC 34/WG 3 (Topic Maps)
Editor, OpenDocument Format TC (OASIS), Project Editor ISO/IEC 26300
Co-Editor, ISO/IEC 13250-1, 13250-5 (Topic Maps)

Another Word For It (blog): http://tm.durusau.net
Homepage: http://www.durusau.net
Twitter: patrickDurusau
