[Humanist] 26.630 XML & what kind of scholarship

Humanist Discussion Group willard.mccarty at mccarty.org.uk
Sat Dec 29 11:08:28 CET 2012


                 Humanist Discussion Group, Vol. 26, No. 630.
            Department of Digital Humanities, King's College London
                              www.dhhumanist.org/
                Submit to: humanist at lists.digitalhumanities.org

  [1]   From:    James Rovira <jamesrovira at gmail.com>                      (33)
        Subject: Re:  26.627 XML & what kind of scholarship

  [2]   From:    Wendell Piez <wapiez at wendellpiez.com>                     (55)
        Subject: Re: [Humanist] 26.615 XML & what kind of scholarship

  [3]   From:    "Paolo Monella" <paolo.monella at gmx.net>                   (38)
        Subject: Re: [Humanist] 26.581 XML & scholarship


--[1]------------------------------------------------------------------------
        Date: Fri, 28 Dec 2012 10:49:52 -0500
        From: James Rovira <jamesrovira at gmail.com>
        Subject: Re:  26.627 XML & what kind of scholarship
        In-Reply-To: <20121228085606.52148F99 at digitalhumanities.org>


Desmond:

May I suggest one or two ideas?

Suppose we define the encoding example that you provide as an instance of
"translation" rather than "interpretation"?  In this case, the translation
of a print convention (italic type) into an electronic convention (XML). But
in the case of a translation, we have to ask, for whom is the translating
being done?   Who is the recipient or "hearer" of the translated language?

When you say that "the result is exactly the same," I take it that you mean
the result to be the display of italic script on either a computer screen or
in a printed document.  In that case, our translation (XML) is a language
that we're speaking to a machine in order to get it to reproduce something
that already exists in print (italic type).

In this case, XML is irrelevant except as a way to talk to a machine in
order to get it to talk back to us in the way that we want, and the way that
we want it to talk back to us is based upon print conventions, not machine
conventions.  What we're really writing, then, is print -- we're talking
back and forth to each other in print, not in coding.  We only need coding
to talk to a machine so that we can talk to each other through the machine
in a way that mimics print.

Jim R

> Take the example of a 17th century edition of Shakespeare that
> contains an italic word "really". A 21st century digital humanist
> transcribes those black marks on a piece of paper as
> "<emph>really</emph>" in XML. That's an interpretation. The printer
> didn't write that code, didn't use the digital medium, didn't choose
> to mark it with <emph> instead of <hi rend="italic">, etc. On the
> other hand, a 21st century writer who composes a text in which the
> word "really" is encoded natively in his XML as "<emph>really</emph>"
> did in fact write those markup codes, those digital characters. The
> result is exactly the same, but the status of the two digital texts is
> entirely different.
>




--[2]------------------------------------------------------------------------
        Date: Fri, 28 Dec 2012 12:17:27 -0500
        From: Wendell Piez <wapiez at wendellpiez.com>
        Subject: Re: [Humanist] 26.615 XML & what kind of scholarship
        In-Reply-To: <20121222094031.9999E3A46 at digitalhumanities.org>

Dear Desmond,

While there is much more to be said on this topic, I'll limit myself
here to the focus of our dispute regarding the utility of markup
syntax, with my LMNL experiments as the case (with apologies, again,
to readers who aren't following, and an invitation to use the delete
key).

You make some excellent points about the difference between a formal
language and an application built on that language, which generally
entails some level of semantics that cannot be represented directly in
the formalism (and that therefore rely on constraints enforced by
means other than the language's grammar). In LMNL's case, much work
will have to be done to bridge this gap.

To my mind, however, the principle of being able to serialize into a
plain-text format with some level of human intelligibility (in
principle if not always in practice) is important enough to make such
an effort worth undertaking, at least for the moment, and as much for
its intrinsic interest as for any supposedly practical purpose. All I
want to do is try it! While you tell me I might as well not. There's
the essence of our disagreement right there.

In particular, you write:
> I don't think this notation would be usable by digital humanists.

And that may indeed be the case ... depending on who you take "digital
humanists" to be. I think my digital humanist is more willing to get
their hands dirty than yours, is not shy about reinventing methods,
tools and methodologies, and considers a plain text editor to be a
useful instrument, not an imposition. And even if no digital humanist
is like this, I see benefits to developers who may not themselves be
researchers in the humanities, but who build tools, resources, and
interfaces for those who are.

Then too, I'll freely admit that this entire project is more on the
'R' side of R&D than the 'D' side. I find the basic idea of markup
irresistible. I can't help but note that markup is as old as writing:
the two emerge together. Use tags written in a plain-text notation, or
display little colored icons and widgetry on a screen, or inscribe a
printed page with a secret language written in pencil. Of these, which
is the most accessible, intelligible, expressive? It depends who you
are, I guess, and what you're trying to express to whom.

I also see a promising middle ground between a plain-text embedded
markup syntax, and a binary format, with LMNL (the model, not the
syntax) being equally at home from one end to the other. If I only had
the Javascript skills, I'd be building an implementation of LMNL in a
browser, modeling ranges using an HTML/RDFa DOM. Would this be a
binary? (Is HTML, much as its users may loathe tagging?) Presumably
such an application would read and export LMNL in sawtooth notation,
or in an XML-based standoff representation (or both, separately or in
combination) -- if not in some sort of awful melange of XHTML/JSON --
for purposes of interchange. But to the user, it could work just like
the tool you are telling us we need.
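
To make the range idea concrete, here is a minimal Python sketch. It is an
illustration only, not part of LMNL or of any existing implementation: the
Range class, the overlaps and to_tagged helpers, and the sample text are all
hypothetical, and the bracket shapes only approximate sawtooth notation. It
shows two overlapping ranges held as standoff offsets over a plain string and
then serialized as embedded tags.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    """A named span over the text (hypothetical, for illustration)."""
    name: str
    start: int  # inclusive character offset
    end: int    # exclusive character offset


def overlaps(a, b):
    """True when two ranges cross each other without one nesting in the other."""
    return a.start < b.start < a.end < b.end or b.start < a.start < b.end < a.end


def to_tagged(text, ranges):
    """Serialize standoff ranges as embedded tags (non-empty ranges only)."""
    events = []
    for r in ranges:
        # kind 1 = open tag, kind 0 = close tag; closes sort first at an offset
        events.append((r.start, 1, -r.end, "[" + r.name + "}"))
        events.append((r.end, 0, -r.start, "{" + r.name + "]"))
    events.sort()
    out, pos = [], 0
    for offset, _kind, _tie, tag in events:
        out.append(text[pos:offset])
        out.append(tag)
        pos = offset
    out.append(text[pos:])
    return "".join(out)


text = "one two three"
ranges = [Range("line", 0, 7), Range("phrase", 4, 13)]
print(overlaps(ranges[0], ranges[1]))  # True
print(to_tagged(text, ranges))  # [line}one [phrase}two{line] three{phrase]
```

The same list of ranges could just as well be written out as a standoff
structure or rendered in a browser; the tagged string is only one view of it.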

Cheers,
Wendell

--
Wendell Piez | http://www.wendellpiez.com
XML | XSLT | electronic publishing
Eat Your Vegetables
_____oo_________o_o___ooooo____ooooooo_^



--[3]------------------------------------------------------------------------
        Date: Fri, 28 Dec 2012 19:30:27 +0100
        From: "Paolo Monella" <paolo.monella at gmx.net>
        Subject: Re: [Humanist] 26.581 XML & scholarship
        In-Reply-To: <20121214060225.AE38E2D96 at digitalhumanities.org>


Dear All,

I subscribe to Doug's statement (in Humanist 26.581) that "a hierarchical,
syntax-heavy data format like XML" does not seem to be the most obvious
option "for modeling most texts which are, to my mind, more like a stream
than a tree". Of course, text models that are not tree-shaped can also be
represented with XML somehow.

I can offer my own project as a case study: for it, the tree structure of
XML remains an option, though not the most obvious one.

I am working on a digital scholarly edition based on manuscripts.
Implementing a model conceived by Tito Orlandi, I am modeling the text of
each manuscript as a musical score, i.e. by means of three parallel and
aligned sequences of digital entities. At one layer I have a sequence of
graphemes, at another layer a sequence of alphabetical letters, and at a
third layer a sequence of word-elements (note that the set of graphemes is
different from, though mappable to, the set of alphabetical letters). Collation
between different manuscripts will eventually take place at each layer
(word-layer with word-layer etc.).

This is, I believe, an example of digital text modeling that does not look
like a tree. Rather, it looks like a musical score with three streams that
need to be aligned at grapheme-level granularity.
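
The three-stream idea can be sketched in a few lines of Python. This is an
illustration under stated assumptions, not code from the project: the sample
data (an "&" grapheme standing for "et", a "ũ" grapheme standing for "um"),
the variable names, and the helper functions are all hypothetical. Each layer
is a plain sequence, and alignment is kept as index spans between layers.

```python
# Hypothetical sample data: "&" stands for "et", "ũ" stands for "um".
graphemes = ["&", "v", "i", "r", "ũ"]
letters = list("etvirum")

# Grapheme i corresponds to letters[start:end].
grapheme_to_letters = [(0, 2), (2, 3), (3, 4), (4, 5), (5, 7)]

# Each word corresponds to graphemes[start:end].
words = [("et", (0, 1)), ("virum", (1, 5))]


def letters_for_grapheme(i):
    start, end = grapheme_to_letters[i]
    return "".join(letters[start:end])


def graphemes_for_word(k):
    _, (start, end) = words[k]
    return graphemes[start:end]


def letters_for_word(k):
    _, (start, end) = words[k]
    return "".join(letters_for_grapheme(i) for i in range(start, end))


def is_aligned():
    """Check that the three layers cover each other without gaps."""
    contiguous = all(
        grapheme_to_letters[i][1] == grapheme_to_letters[i + 1][0]
        for i in range(len(graphemes) - 1)
    )
    covers = (
        len(grapheme_to_letters) == len(graphemes)
        and grapheme_to_letters[0][0] == 0
        and grapheme_to_letters[-1][1] == len(letters)
    )
    words_ok = all(letters_for_word(k) == w for k, (w, _) in enumerate(words))
    return contiguous and covers and words_ok


print(letters_for_word(1))  # virum
print(is_aligned())         # True
```

Because every layer is its own sequence, two manuscripts could be compared
layer by layer (graphemes with graphemes, words with words) without either
layer having to contain the other.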

Encoding it with XML/TEI is feasible. Python helps me to mark up the text at
grapheme-level granularity, and then (through Python) to align the three
streams with XML/TEI's linking features: you can see some code in
http://www.unipa.it/paolo.monella/lincei/edition.html
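
For readers who want a rough idea of what such generated markup might look
like, here is a hedged Python sketch. It is not taken from the page linked
above: the choice of <ab>, <seg>, <c>, <linkGrp> and <link> elements, the id
scheme, and the build function are assumptions made for illustration, and the
output has not been validated against a TEI schema. It gives every grapheme
and letter an xml:id and records each alignment as a link between ids.

```python
import xml.etree.ElementTree as ET

# Clark-notation name that ElementTree serializes as xml:id.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def build(graphemes, letter_groups):
    """Build two character layers plus one link per grapheme (hypothetical layout)."""
    root = ET.Element("ab")
    g_layer = ET.SubElement(root, "seg", {"type": "graphemes"})
    l_layer = ET.SubElement(root, "seg", {"type": "letters"})
    links = ET.SubElement(root, "linkGrp", {"type": "alignment"})
    n = 0  # running counter for letter ids
    for i, (g, group) in enumerate(zip(graphemes, letter_groups), start=1):
        g_el = ET.SubElement(g_layer, "c", {XML_ID: f"g{i}"})
        g_el.text = g
        targets = [f"#g{i}"]
        for letter in group:
            n += 1
            l_el = ET.SubElement(l_layer, "c", {XML_ID: f"l{n}"})
            l_el.text = letter
            targets.append(f"#l{n}")
        # One link ties a grapheme to the letter(s) it stands for.
        ET.SubElement(links, "link", {"target": " ".join(targets)})
    return root


root = build(["&", "v"], ["et", "v"])
print(ET.tostring(root, encoding="unicode"))
```

Even for two graphemes the output is several times longer than the text it
describes, which gives a sense of how quickly such files grow.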

A range-based model would be a much better tailored suit for that text
model, yet XML/TEI can be used too, at least as an output/interchange
format, though it generates very cumbersome files. With XML/TEI, one could
argue, comes interoperability. But I guess that the question now is: with
neither easy human readability nor the ability to edit the XML source
directly, is it still worth trying to fit a human's suit (XML/TEI) to a
Vogon's body (my 'musical score' text model)?

Best,
Paolo

--
Dr Paolo Monella
Centro Linceo Interdisciplinare
http://www.unipa.it/paolo.monella/lincei





