[Humanist] 26.651 XML &c

Humanist Discussion Group willard.mccarty at mccarty.org.uk
Sat Jan 5 08:21:52 CET 2013

                 Humanist Discussion Group, Vol. 26, No. 651.
            Department of Digital Humanities, King's College London
                Submit to: humanist at lists.digitalhumanities.org

  [1]   From:    James Rovira <jamesrovira at gmail.com>                      (32)
        Subject: Re:  26.648 XML &c

  [2]   From:    Desmond Schmidt <desmond.allan.schmidt at gmail.com>        (114)
        Subject: Re:  26.648 XML &c

  [3]   From:    Willard McCarty <willard.mccarty at mccarty.org.uk>          (19)
        Subject: contingencies of markup

        Date: Fri, 4 Jan 2013 08:42:35 -0500
        From: James Rovira <jamesrovira at gmail.com>
        Subject: Re:  26.648 XML &c
        In-Reply-To: <20130104085537.F4162DFB at digitalhumanities.org>


Thanks very much for your reply.  We should perhaps take for granted that
the very obvious is obvious to everyone.  Yes, interpretation is in the
individual head of the reader and in the collective heads of all readers
and their accepted and known social (print) conventions, which in this case
means that italicized text can fill any one of a number of rhetorical
functions, which one(s) in any given instance being determined by context.

It probably would have helped our discussion, as Wendell said, if we had
distinguished between our attempts to render something a certain way on a
screen or in a print document (as I've been talking about) and our attempts
to create a digitally searchable archive that distinguishes between
different uses of italicized text.  I'd agree that the latter is always
interpretive.  I don't think the former necessarily is.
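The distinction can be made concrete in a few lines: one and the same semantic encoding serves both the rendering use (map the tag to italics for display) and the archive use (search for spans explicitly marked as emphatic). A minimal sketch using Python's standard library; the <emph> tag and the document string are only illustrative:

```python
# One semantic encoding, two downstream uses.
# <emph> is an illustrative tag, not any particular schema.
import xml.etree.ElementTree as ET

doc = "<p>I <emph>really</emph> mean it.</p>"

# Rendering use: map the semantic tag to a presentational one.
rendered = doc.replace("<emph>", "<i>").replace("</emph>", "</i>")

# Archive use: find every span explicitly marked as emphatic.
root = ET.fromstring(doc)
emphatic = [e.text for e in root.iter("emph")]

print(rendered)   # <p>I <i>really</i> mean it.</p>
print(emphatic)   # ['really']
```

The point is that only the second use depends on the tag carrying an interpretation; the first would work just as well had the encoder written <i> directly.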

Much appreciation for Wendell's post.

Jim R

On Fri, Jan 4, 2013 at 3:55 AM, Humanist Discussion Group <
willard.mccarty at mccarty.org.uk> wrote:
> James,
> > What you refer to are differences of interpretation of the italicized
> > text itself, a fact which is common to both print and digital media
> The interpretation is not a "fact" of the printed text. That just
> has black marks on a page, whereas the digital text
> <emph>really</emph> has an explicit interpretation encoded in digital
> bytes, which states: "the word 'really' is emphatic". Where is that
> information in the printed text, and not just in your head as you read
> it?

        Date: Sat, 5 Jan 2013 07:52:03 +1000
        From: Desmond Schmidt <desmond.allan.schmidt at gmail.com>
        Subject: Re:  26.648 XML &c
        In-Reply-To: <20130104085537.F4162DFB at digitalhumanities.org>

Dear Humanist readers,

Since we have found a mountain of interpretation in mere italics, it
follows that the interpretation in the other markup we add to a text
must be huge. This raises what is to me the crucial question: why should
we embed this information into the plain text and so confuse it with
our own views? Textual interoperability is now an imperative given the
development of national and international repositories of digital
texts. In purely practical terms what you store in such repositories
MUST be reusable by others.

I'm not saying that there is no interpretation in plain text. But
there is not much if the original text was clear. What people don't
dispute is not worth classifying as "interpretation".

I'm not saying that text should remain plain, either: we need to
encode lots of information about it too.

I'm not talking about any specific technology, just the PRINCIPLE that
we should keep our interpretations separate from the texts they
comment on.

Objection: Since text contains some interpretation we can't separate
markup from text. Even punctuation is a kind of markup etc.

This seems to me to be an argument post factum. We have already
encoded our texts in XML and now we want to justify it. We are trying
to persuade our non-XML humanist colleagues that XML is just like
something they already use, like punctuation. We cling limpet-like to
the assertion that there is no line to be drawn in the sand where
markup ends.

But my argument is purely practical. There IS a useful dividing line
that maximises text reusability. It is to remove markup from the text
and store it separately. That gives us the ability to: recombine
interpretations with the text at any point, for whatever purpose, and
to separate our interpretations into various layers: layout structure,
morphological analysis, annotations etc. Crucially, it gives us the
flexibility to merge different layers of interpretation with the text
as needed. As it stands, with embedded markup codes we have to put
everything into one encoding at a time. Then other people can't reuse
the text without first removing our markup (which is a lot harder than
it sounds). What we have now are texts designed for one purpose. But
we need multi-purpose texts. If the technology for keeping markup
separate from the text simply worked in the way I have described
without your being aware of it, wouldn't that be much better?
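The principle of keeping the text plain and storing the interpretations in separate, recombinable layers can be sketched roughly as follows. This is a toy format keyed by character offsets, invented for illustration, not any particular stand-off standard:

```python
# Toy stand-off markup: plain text plus separate annotation layers
# keyed by character offsets. The layer format is purely illustrative.

text = "I really mean it."

layers = {
    "emphasis":   [(2, 8, "emph")],    # "really" is emphatic
    "morphology": [(2, 8, "adverb")],  # same span, a different layer
}

def apply_layer(text, annotations):
    """Merge one annotation layer into the text as inline tags."""
    out, pos = [], 0
    for start, end, tag in sorted(annotations):
        out.append(text[pos:start])
        out.append(f"<{tag}>{text[start:end]}</{tag}>")
        pos = end
    out.append(text[pos:])
    return "".join(out)

print(apply_layer(text, layers["emphasis"]))
# I <emph>really</emph> mean it.
```

Because the layers live outside the text, any combination of them can be merged in on demand, and the plain text itself stays reusable by anyone, with or without the annotator's interpretations.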


>In the case of Shakespeare, you are quite comfortable reviewing and
>cataloging the physical features as they exist and leaving it at that. Why,
>confronted with an electronic text, make recourse to questions of authorial

The difference is quite simply because I can see the interpretations
as markup encoded in the text. Those same interpretations (such as
"this is emphatic") are not present in the printed text. But I concede
your point that any printed text already contains interpretations made
by the people who typeset it.

>Do your
>readers really understand "<emph>" means "emphasis" and not just "the
>shortcut I use because I know that my XML will eventually be converted to
>HTML and most web browsers render '<emph>' as italic"?

I agree that Shakespeare or his printer may have intended to encode
"emphasis" as italics. But I don't know that for sure. All I know is
that there is italics on the page. When I write <emph> in a digital
text that code is for emphasis. If I make a mistake or I am sloppy
that doesn't change the fact that I wrote "this is emphatic".
Shakespeare's printer didn't write that. He just wrote "this is
italic".

>An author producing a
>manuscript in Microsoft Word never sees the lines of markup insertion
>triggered by the "i" button on the toolbar.

There's actually a predefined Word character style called "Emphasis",
just as in XML. The intended interpretation might be blurred by
carelessness on the part of the encoder, but once the encoding
exists, I think it is reasonable to assume that someone who
deliberately chose a format called "Emphasis" meant emphasis,
and not just any old italics.

>Finally, how is "analog" truly a useful category of text?

It's a different medium, and even if it is produced from an electronic
text the constraints of print still apply. But I'd agree that we are
talking about several different media here: typesetting codes for
imagesetters, XML, Word, Web, etc. and not just "digital".

> Distinguishing between the subject and
> object of interpretation with electronic texts is a battle that is lost
> before it is even begun. The markup itself is already encoded in UTF-8 or
> another scheme.

You almost seem as if you want to say here that other people's
interpretative markup added to Shakespeare might as well be treated as
if it was Shakespeare. I don't think pointing out cases where it is
hard to separate text from markup means that we should just give up
the text/markup distinction. It's clear enough in XML: technically,
markup is the stuff in angle-brackets plus the formatting white space
between markup codes. The rest is "content". Yes, content may contain
markup in some esoteric form, but not XML.
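In XML that division is mechanical enough to demonstrate: a parser can discard everything in angle brackets and return only the content. A minimal sketch using Python's standard library, with an illustrative tag name:

```python
# The text/markup line in XML: strip the angle-bracket material
# and only the content remains. The tag name is illustrative.
import xml.etree.ElementTree as ET

doc = "<p>Some <emph>marked-up</emph> content.</p>"
root = ET.fromstring(doc)

# itertext() yields only character data, never markup.
content = "".join(root.itertext())
print(content)   # Some marked-up content.
```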

> Likewise, physical texts have been physically marked up for millennia.

>Physical texts were never "facts." They are complex, messy containers for
>bibliographic codes that are subject to interpretation.

As I already stated in my response to Herbert, the word "fact" can
mean something that exists, at least in English. It doesn't have to be
true. It can also be a frozen repository of old interpretations. A
physical text you can touch is quite definitely a fact. Even a digital
text in your computer is a fact because you can verify its existence
in various ways.


>(1) how appropriate are those (embedded or stand-off) descriptions of illocutionary acts?
>or (2) how reasonable are they, taken in their own right, as perlocutionary acts in a scholar-to-scholar communication?
>or (3) - for me the crucial question - how to judge the costs of blind tagging (without knowledge of the processes later on handling all the pretty XML files)?

On 2) I think that this is exactly the point: these markup codes
are for scholar-to-scholar communication. They are commentaries on
the text, just like the scholia in ancient manuscripts, annotated
print editions, the apparatus of critical editions, etc. They are
not the thing being commented on.
On 3) If I understand this point rightly you mean the tendency in
humanities computing to encode now and worry about how it will be
processed later. It should not be forgotten that we encode texts
digitally in order to process them in a computer. Taking account of
exactly what the computer can do with our codes is an essential task
that is often ignored.

Desmond Schmidt
eResearch Lab
University of Queensland

        Date: Fri, 04 Jan 2013 22:16:20 +0000
        From: Willard McCarty <willard.mccarty at mccarty.org.uk>
        Subject: contingencies of markup
        In-Reply-To: <20130104085537.F4162DFB at digitalhumanities.org>

Allow me once again to suggest that in discussions of markup (or any 
other tool/method) we identify the disciplinary context(s). Are we not 
*always* assuming such a context even when we talk as if the tool 
or method applied indifferently to all? Textual editor of Shakespeare, 
literary critic of Shakespeare, historian of Shakespeare's time or 
language or whatever -- very different sets of interests, approaches, 
criteria etc, I would think.

What is being encoded & to what end? I cannot see that there can be any 
neutral or universal standpoint. Assuming one can say what a "formal" 
method is, how would one think purely in such terms?

Willard McCarty, FRAI / Professor of Humanities Computing & Director of
the Doctoral Programme, Department of Digital Humanities, King's College
London; Professor, School of Humanities and Communication Arts,
University of Western Sydney; Editor, Interdisciplinary Science Reviews
(www.isr-journal.org); Editor, Humanist
(www.digitalhumanities.org/humanist/); www.mccarty.org.uk/
