[Humanist] 23.789 inadequacies of markup

Humanist Discussion Group willard.mccarty at mccarty.org.uk
Sat May 1 10:21:21 CEST 2010


                 Humanist Discussion Group, Vol. 23, No. 789.
         Centre for Computing in the Humanities, King's College London
                       www.digitalhumanities.org/humanist
                Submit to: humanist at lists.digitalhumanities.org

  [1]   From:    Desmond Schmidt <desmond.schmidt at qut.edu.au>              (31)
        Subject: RE: [Humanist] 23.785 inadequacies of markup

  [2]   From:    Willard McCarty <willard.mccarty at mccarty.org.uk>          (40)
        Subject: problem with markup

  [3]   From:    Wendell Piez <wapiez at mulberrytech.com>                    (36)
        Subject: Re: [Humanist] 23.785 inadequacies of markup


--[1]------------------------------------------------------------------------
        Date: Fri, 30 Apr 2010 23:20:48 +1000
        From: Desmond Schmidt <desmond.schmidt at qut.edu.au>
        Subject: RE: [Humanist] 23.785 inadequacies of markup
        In-Reply-To: <20100430074522.1D4C3535CA at woodward.joyent.us>

Hi all,

I am glad that my paper has already been fairly widely read. I know there are a couple of others who have read it but haven't responded yet.

As for Toma's comments, I found them more amusing than critical. On your point that you see no advantage in "plain text versions" for things like historical dictionaries: I never said that I favoured plain text versions. I suggested that, for the present, versions would be represented using light XML markup, and that I would seek to eventually replace the remaining markup with standoff annotation (which would leave the text bare - that may be where you thought I meant plain text). That would serve the same purpose as embedded markup, but would not be prone to overlap problems. The variant versions would also be more accurately represented: compare the two attempts to encode the Sibylline Gospel text, one using markup and one using the MVD graph, which detects much more detail than can be represented via manual markup. Add the elimination of the overlap problem, and I think you do in fact have a more accurate representation of texts using this approach.
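
To make the idea of standoff annotation concrete, here is a minimal sketch in Python. It is an illustration only, not the MVD format or any tool described in the paper: the `Annotation` class, the `covered_text` and `spans_overlap` helpers, and the sample text and offsets are all hypothetical. The text is kept as a bare string and each annotation points into it by character offsets, so two annotations can cross each other without either having to nest inside the other.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical standoff annotation: a label over text[start:end]."""
    label: str
    start: int  # inclusive character offset into the bare text
    end: int    # exclusive character offset into the bare text


def covered_text(text: str, ann: Annotation) -> str:
    """Return the stretch of bare text that an annotation refers to."""
    return text[ann.start:ann.end]


def spans_overlap(a: Annotation, b: Annotation) -> bool:
    """True when two ranges cross without one containing the other."""
    intersect = a.start < b.end and b.start < a.end
    a_in_b = b.start <= a.start and a.end <= b.end
    b_in_a = a.start <= b.start and b.end <= a.end
    return intersect and not (a_in_b or b_in_a)


# The text itself carries no tags at all.
text = "The quick brown fox jumps over the lazy dog"

# Two annotations that cross: embedded tags could not nest these properly.
deletion = Annotation("deleted", 4, 15)    # covers "quick brown"
emphasis = Annotation("emphasis", 10, 19)  # covers "brown fox"
```

Here `spans_overlap(deletion, emphasis)` is true, yet both annotations sit side by side in a plain list and the text string is never modified. With embedded tags, one of the two elements would have to be split or replaced by empty milestone markers.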

Martin Mueller has also asked a couple of questions that I would like to answer. 

On your first point, I don't know of any original analog texts that are perfect OHCOs, but I do know a lot that aren't OHCO at all. I got started on this model of text some years ago when, like you, I believed in the OHCO model. Then a friend sent me the draft of a modern Italian poem that was structurally spaghetti (forgive the pun). I wish I could show it to you, but for copyright reasons I can't. Part of it will be edited and appear in the next issue of DHQ. It couldn't be encoded in XML at all; I tried very hard to do that before I admitted defeat. And yet we recorded all eleven layers of correction successfully using MVD.

I think one's view on whether embedded markup is adequate or not depends rather greatly on the difficulty of the texts one has experience with. If one is used to encoding transcriptions of, say, printed novels, then it might seem perfectly adequate, with not many problems. If you work on modern manuscripts, on the other hand, you will come up against its limitations every day. So I don't think it's quite right to say that for you or for me embedded markup is fine, so it's fine for everyone. I'm seeking a general solution that works equally well for everyone and for every text. That is what I mean by an adequate representation, and in that context embedded markup falls well short of what is needed.

On your second point I am aware that my brief summary of MVD is unlikely to explain it adequately. But I couldn't risk a longer description. I did cite the other two papers that explain it fully, even though they are quite technical. But I thought I had made it clear enough at least that the graph of the three versions produced by the program was generated automatically. You don't have to know anything at all about the structure of an MVD if you don't want to, but you have to know quite a bit about how markup works to use it. On that score I feel that my solution, which seeks to automate the edition as far as possible and to free the editor from complex technical work, is in fact the simpler approach.


--[2]------------------------------------------------------------------------
        Date: Fri, 30 Apr 2010 14:33:02 +0100
        From: Willard McCarty <willard.mccarty at mccarty.org.uk>
        Subject: problem with markup
        In-Reply-To: <20100430074522.1D4C3535CA at woodward.joyent.us>


This is a good time, I think, to test my knowledge against those awakened by
the latest conversation on the subject of markup's inadequacies.

Whether OHCO is useful depends, I'd think, on what you have in mind, what
you're interested in doing. For publishing texts, preparing textual
resources to be used in straightforward, more or less conventional ways, I
suppose it's ok as a rough approximation. Like much of computing as it is
usually practiced, markup gets you a distance. We know that good-enough
approaches, e.g. to microphysics, actually get good results, provoke
theoretical work etc. What I object to is the philosophical proposition that
text is OHCO *really*. It isn't, and not only because it isn't singular. I
have no objection to the thesis as long as it's an as-if for purposes of
exploration. But lazy minds allow as-ifs to slip into is. Take behaviourism,
for another example. "Let's pretend that X and see how far we get" is just
fine, but people being how they are, such statements change into others that
aren't.

My problem with markup is that (as far as I know -- always a serious
qualification) it is not systematically alterable for purposes of
re-interpretation once a marked-up text gets over a certain size and
complexity. Making the practical distinction between markup that is in
effect and for most purposes non-interpretative
(<paragraph>...</paragraph>) and markup that is clearly so
(<personification>...</personification>), the abstract noun
"interpretation" as far as I am concerned is misleading; we should really be
using the gerund/participle "interpreting". Ok, I can produce an edition of
the Metamorphoses (as I did) enriched or muddled by my interpretation of
that poem, but I don't see that this is all that much better than a printed
edition. What I would like is an edition which provides for interpreting in
such a way that my go would inspire or provoke others -- without other
scholars having to do the whole job over again but differently. I would like
my judgements to be dynamically, systematically manipulable. Markup as far
as I know doesn't allow for that, given the software that we know how to
write.

Comments?

Yours, WM
--
Willard McCarty, Professor of Humanities Computing,
King's College London, staff.cch.kcl.ac.uk/~wmccarty/;
Editor, Humanist, www.digitalhumanities.org/humanist;
Interdisciplinary Science Reviews, www.isr-journal.org.




--[3]------------------------------------------------------------------------
        Date: Fri, 30 Apr 2010 16:03:58 -0400
        From: Wendell Piez <wapiez at mulberrytech.com>
        Subject: Re: [Humanist] 23.785 inadequacies of markup
        In-Reply-To: <20100430074522.1D4C3535CA at woodward.joyent.us>

Willard and HUMANIST,

At 03:45 AM 4/30/2010, Desmond wrote:
>Taking out the markup is not as simple a matter as it might seem at 
>first glance.

It isn't? I think that depends on your tools and capabilities.

Certainly, not everyone can take this and run with it:

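<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

   <!-- Emit plain text: no XML declaration, no character escaping -->
   <xsl:output method="text"/>

   <!-- The string value of the root is every text node, concatenated -->
   <xsl:template match="/">
     <xsl:value-of select="/"/>
   </xsl:template>

</xsl:stylesheet>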

But the software needed to apply this transformation has been on all 
our computers for about ten years now, and invoking it is a matter 
not of developing a text-processing technology, but of learning what 
particular XSLT processor(s) you have available, how to invoke it on 
your file, and how to save the results. And while you might not 
yourself know how to write the stylesheet given above, there are 
thousands of people who do. (Plus I've just given it to you.)
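
For readers without an XSLT processor to hand, roughly the same result can be had from a scripting language's standard library. The sketch below uses Python's `xml.etree.ElementTree` rather than XSLT, so it is a swapped-in technique and not the stylesheet above; the function name `strip_markup` and the sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET


def strip_markup(xml_source: str) -> str:
    """Return the character data of an XML document with all tags removed.

    Hypothetical helper: it concatenates every text node in document
    order, much as the string value of the root node does in XSLT.
    """
    root = ET.fromstring(xml_source)
    return "".join(root.itertext())


# A made-up sample: one element nested inside another.
sample = "<p>Arma <hi rend='italic'>virumque</hi> cano</p>"
plain = strip_markup(sample)
```

After this runs, `plain` holds "Arma virumque cano". Attribute values such as `rend` are discarded along with the tags, which is one reason stripping is easy while putting the markup back is not.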

In any case, as Desmond well knows, stripping markup is a piece of 
cake compared to adding additional markup that doesn't respect the 
design of markup already there. But we are working on that too -- 
without thinking that we have to do away with tagging as a 
methodology for data description, however cumbersome it may sometimes be.

Or am I missing something about the requirement here?

Cheers,
Wendell

=========================================================
Wendell Piez                            mailto:wapiez at mulberrytech.com
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
----------------------------------------------------------------------
   Mulberry Technologies: A Consultancy Specializing in SGML and XML


