[Humanist] 26.644 XML &c

Humanist Discussion Group willard.mccarty at mccarty.org.uk
Thu Jan 3 09:21:31 CET 2013

                 Humanist Discussion Group, Vol. 26, No. 644.
            Department of Digital Humanities, King's College London
                Submit to: humanist at lists.digitalhumanities.org

  [1]   From:    "Holly C. Shulman" <hcs8n at virginia.edu>                   (83)
        Subject: Re:  26.640 XML &c

  [2]   From:    James Rovira <jamesrovira at gmail.com>                      (32)
        Subject: Re:  26.640 XML &c

  [3]   From:    drwender at aol.com                                          (55)
        Subject: Re:  26.640 XML &c

        Date: Wed, 2 Jan 2013 09:19:21 -0500
        From: "Holly C. Shulman" <hcs8n at virginia.edu>
        Subject: Re:  26.640 XML &c
        In-Reply-To: <20130102075801.7948830A9 at digitalhumanities.org>

Dear Desmond,

Agreed with all you have said. That is also why there will never be any kind of
transcription that does not involve interpretation. But I was interested
in your decision to interpret "..." as emphasis, as I tend toward what I
consider a more literal transcription with less editorial intervention.
That said, I am in agreement with you.


On Wed, Jan 2, 2013 at 2:58 AM, Humanist Discussion Group <
willard.mccarty at mccarty.org.uk> wrote:

>                  Humanist Discussion Group, Vol. 26, No. 640.
>             Department of Digital Humanities, King's College London
>                               www.dhhumanist.org/
>                 Submit to: humanist at lists.digitalhumanities.org
>         Date: Tue, 1 Jan 2013 04:56:59 +1000
>         From: Desmond Schmidt <desmond.allan.schmidt at gmail.com>
>         Subject: Re:  26.637 XML & what kind of scholarship
>         In-Reply-To: <20121231073726.9DE03ED6 at digitalhumanities.org>
> James,
> > Why the word "interpretation" rather than "translation," at least in the
> > simple cases? I can see more complex cases being more interpretive.
> Because conversion from one medium to another always involves some
> form of interpretation. Even if you write a program to "translate"
> Microsoft Word to XML you have to decide when you write the program
> which source codes get mapped to which target codes. There must be a
> dozen reasons why something could be in italics. What about foreign
> words or phrases? You'd want to use <foreign>, not <emph> or <hi
> rend="italics">. What about stage directions in a play? You'd want to
> use <stage rend="italics">...  Note that below Holly and I disagree on
> how to encode italics. Differences of interpretation create real and
> non-trivial disputes even in "simple cases".
> Holly,
> > I am wondering why the decision was
> > made to render "really" as emphasis rather than as italics or retain them
> > in quotation marks.
> Why not? That's what the <emph> code is for. (Btw it wasn't meant to
> be in quotation marks). But there are always several ways to encode
> the same textual phenomenon. Imagine that you and I are transcribing
> some printed correspondence. You code italics as <hi rend="italics">
> and I code it as <emph>. Then we push the transcriptions through some
> software, and get inconsistent output. That's why I don't think we can
> ever have a "standard" way to mark up texts that goes beyond a mere
> vocabulary of pre-defined codes.
> > So I'm simply curious about this one as it is not the decision
> > that I, as an editor, would have made
> Of course everyone makes different decisions about every textual
> phenomenon that they see. Each transcription thus bears the
> fingerprint of the person who made it. As long as we encode things
> this way we can't ever develop general software solutions to editorial
> problems.
> Desmond Schmidt
> eResearch Lab
> University of Queensland

Holly C. Shulman
Editor, Dolley Madison Digital Edition
Founding Director, Documents Compass
Research Professor, Department of History
University of Virginia
hcs8n at virginia.edu

        Date: Wed, 2 Jan 2013 10:44:30 -0500
        From: James Rovira <jamesrovira at gmail.com>
        Subject: Re:  26.640 XML &c
        In-Reply-To: <20130102075801.7948830A9 at digitalhumanities.org>

Desmond --

Thanks much for the response, but if the output is italics in all cases,
how is the encoding an interpretation in this case? What you refer to are
differences of interpretation of the italicized text itself, a fact which
is common to both print and digital media. Italics, obviously, can be used
for foreign words, emphasis, stage directions, titles, etc., but this
variety of uses pre-existed digital media and is merely reflected in it
now. Once the decision to italicize has been made, what difference does it
make what code you write to produce italic text?

The context I had in mind was digital reproduction of originally printed
text, though, with the person doing the encoding having to decide how to
encode italic text.
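To make the disagreement concrete, here is a small Python sketch using only the standard library's xml.etree.ElementTree. It encodes drwender's closing line twice, once with <hi rend="italics"> and once with <emph>; both snippets are hypothetical and not validated TEI, and the rule mapping both elements to italics is an assumed stylesheet, not any project's actual one. The rendered output is identical, which is James's point, while a query that relies on the tags gets different answers, which is Desmond's.

```python
import xml.etree.ElementTree as ET

# Hypothetical encodings of the same line (element names borrowed from TEI).
HOLLY = '<l>A <hi rend="italics">very</hi> pleasing night to honest men</l>'
DESMOND = '<l>A <emph>very</emph> pleasing night to honest men</l>'


def is_italic(el: ET.Element) -> bool:
    # Assumed stylesheet rule: both encodings are displayed as italics.
    return el.tag == "emph" or (el.tag == "hi" and el.get("rend") == "italics")


def render(xml: str) -> str:
    """Render to a plain-text surrogate, marking italics with underscores."""
    root = ET.fromstring(xml)
    parts = [root.text or ""]
    for child in root:
        inner = child.text or ""
        parts.append(f"_{inner}_" if is_italic(child) else inner)
        parts.append(child.tail or "")
    return "".join(parts)


def emphasized_words(xml: str) -> list:
    """A query that trusts the markup: which words are encoded as emphasis?"""
    return [el.text for el in ET.fromstring(xml).iter("emph")]


print(render(HOLLY) == render(DESMOND))  # True
print(emphasized_words(HOLLY), emphasized_words(DESMOND))  # [] ['very']
```

The display layer cannot tell the two transcriptions apart; any software that searches for emphasis can.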

Jim R

        Date: Wed, 2 Jan 2013 21:14:07 -0500 (EST)
        From: drwender at aol.com
        Subject: Re:  26.640 XML &c
        In-Reply-To: <20130102075801.7948830A9 at digitalhumanities.org>

“The digital humanities are growing rapidly in response to a rise in Internet use. What humanists mostly work on, and which forms much of the contents of our growing repositories, are digital surrogates of originally analog artefacts. But is the datamodel upon which many of those surrogates are based - embedded markup - adequate for the task?”
Desmond Schmidt (2012). The role of markup in the digital humanities. Historical Social Research, 37(3), pp. 125-146. (Citing the abstract.)

Dear Desmond,

Which word(s) should or could be stressed (via italics?) in the parenthetical phrase above: _embedded_, _markup_, or both: _embedded markup_?
Trying to understand your arguments, I take an example line from the Folger Shakespeare's JC.xml, one of the pretty XML files with which this thread started. I imagine a simple-minded relational database, starting with only two tables: the first stores everything in the verse lines that remains when I strip away what the markup statements say about it, preserving only the information given in the xml:id; the second stores the information about line arrangement given in the JOIN entities. Using '@' as field separator and line breaks as record separators, the two flat files for my tables show the following two streams:

Table-I:

w@0074730@A
c@0074740@ 
w@0074750@very
c@0074760@ 
w@0074770@pleasing
c@0074780@ 
w@0074790@night
c@0074800@ 
w@0074810@to
c@0074820@ 
w@0074830@honest
c@0074840@ 
w@0074850@men
p@0074860@.

Table-II:

ftln@0460@verse@0074730@0074860
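A rough Python sketch of this two-table idea: it parses both flat files (fields split on '@', records on line breaks) and rebuilds the verse line by concatenating, in id order, every Table-I record whose id falls within the Table-II range. It assumes, as the Table-II range end suggests, that the punctuation record's id is 0074860 and that each c record holds a single space; the field names (kind, xml_id, ftln and so on) are hypothetical labels, not Folger's.

```python
from typing import NamedTuple


class Token(NamedTuple):
    kind: str     # hypothetical label: "w" word, "c" space, "p" punctuation
    xml_id: int   # zero-padded id compared as an integer (assumed document order)
    text: str


class LineJoin(NamedTuple):
    ftln: str     # through-line number, kept as a string ("0460")
    kind: str     # e.g. "verse"
    first_id: int
    last_id: int


def parse_table_i(flat: str) -> list:
    tokens = []
    for record in flat.split("\n"):
        if not record:
            continue
        # maxsplit=2 keeps the text field intact, including a lone space
        kind, xml_id, text = record.split("@", 2)
        tokens.append(Token(kind, int(xml_id), text))
    return tokens


def parse_table_ii(flat: str) -> list:
    joins = []
    for record in flat.split("\n"):
        if not record:
            continue
        # first field is the literal label "ftln", second the line number
        _label, ftln, kind, first, last = record.split("@")
        joins.append(LineJoin(ftln, kind, int(first), int(last)))
    return joins


def rebuild_line(tokens: list, join: LineJoin) -> str:
    """Concatenate the text of every token whose id lies in the join range."""
    in_range = [t for t in tokens if join.first_id <= t.xml_id <= join.last_id]
    return "".join(t.text for t in sorted(in_range, key=lambda t: t.xml_id))


RECORDS = [
    "w@0074730@A", "c@0074740@ ", "w@0074750@very", "c@0074760@ ",
    "w@0074770@pleasing", "c@0074780@ ", "w@0074790@night", "c@0074800@ ",
    "w@0074810@to", "c@0074820@ ", "w@0074830@honest", "c@0074840@ ",
    "w@0074850@men", "p@0074860@.",
]
tokens = parse_table_i("\n".join(RECORDS))
joins = parse_table_ii("ftln@0460@verse@0074730@0074860")
print(joins[0].ftln, rebuild_line(tokens, joins[0]))
# → 0460 A very pleasing night to honest men.
```

Nothing about the line is lost in this decomposition; further tables could hold other markup information in the same way.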

I haven't studied the numbering conventions explained in Folger's _refsDecl_, but I suppose that all the other markup information in their files could, if needed for computational tasks in the digital humanities, also be extracted and suitably stored in further tables. Therefore I would agree with everyone who abhors the redundancy in heavily tagged literary texts such as Folger's digital Shakespeare. But with what you, Desmond, say about fact and interpretation (in German terms, following Hans Zeller: “Befund und Deutung”, i.e. findings and interpretation?) and about the 'native' analog/digital rupture, I _really_ disagree. I hope to support the points already made by James and Holly by adding an example from the textual tradition I know best.

In Georg Büchner's manuscript of his first play, “Dantons Tod” (Danton's Death), some words are underlined. For example, replying to the statement “Wir und die ehrlichen Leute” (We and the honest men), Danton says: “Das _und_ dazwischen ist ein langes Wort” (The _and_ in between is a long word). The first printed editions (first a partial printing in a journal, then a book version, both in 1835 and both censored) show a letter-spaced word in quotation marks (Das “u n d” dazwischen...). IMHO the interpretation does not begin with my analysis, which reads (1) the enclosure in quotation marks as expressing a citation relationship to a word in the speech uttered just before, expressible in XML markup via a pointer to the appropriate xml:id, and (2) the spacing as signalling a metalinguistic reference to an item in the dictionary. The interpretation begins when the typesetter decides to use both quotation marks and spacing, and perhaps that is a misinterpretation, because he did not observe the systematic use of underlining in similar cases.

If we search the literary text for further occurrences of the word “Wort”, we find the following underlined words in the manuscript:

[a. see above]
b. "das Wort _Strafe_" (the word _punishment_)
c. "das Wort _Blut_" (the word _blood_)
d. "das Wort _Erbarmen_" (the word _mercy_)

The prints are not consistent in this respect:
a. quotation marks + spaced
b. no typographical signalling
c. (only in the book version) "das Wort: [sic] Blut" (not spaced)
d. spaced

If the manuscript were lost, as in so many other cases, the so-called facts would be the confusing interpretations of contemporary typesetters.

They are really honest men who wish to see 'authentic' versions in digital surrogates, but they are best served by a Google-like two-sided transmission: JPEG page images plus plain-text OCR. Whatever comes after that in the field of scholarly editions should be debated first in terms of good or bad editorial practice, depending on the goals an edition sets out to reach. Do we need markup to ship editorial information out to its addressees (mostly other scholars)? I continue to doubt it.


A _very_  pleasing night to honest men

