[Humanist] 26.577 Folger Digital Texts --> XML & scholarship

Humanist Discussion Group willard.mccarty at mccarty.org.uk
Thu Dec 13 09:36:51 CET 2012


                 Humanist Discussion Group, Vol. 26, No. 577.
            Department of Digital Humanities, King's College London
                              www.dhhumanist.org/
                Submit to: humanist at lists.digitalhumanities.org

  [1]   From:    "Pierazzo, Elena" <elena.pierazzo at kcl.ac.uk>              (21)
        Subject: XML and scholarship (was: Folger Digital Texts)

  [2]   From:    Wendell Piez <wapiez at wendellpiez.com>                    (180)
        Subject: Re: [Humanist] 26.571 Folger Digital Texts


--[1]------------------------------------------------------------------------
        Date: Wed, 12 Dec 2012 11:17:48 +0000
        From: "Pierazzo, Elena" <elena.pierazzo at kcl.ac.uk>
        Subject: XML and scholarship (was: Folger Digital Texts)
        In-Reply-To: <20121212071710.9557A311F at digitalhumanities.org>


Dear All,

I have been reading this thread with increasing irritation as I think it leaves out some crucial points and it shows quite a few misconceptions. 

It seems that we are increasingly debating whether or not we like XML and whether we prefer plain text. I think this is not really the point. Not many people actually like XML, and I am no exception: I confess I do not feel any pang of love when I see an angle bracket. However, I think XML is a very useful tool, as it allows me and others to achieve our scholarly goals better than any other tool. The role of XML in scholarship, and in particular textual scholarship, is the part I think is being left out of this discussion.

I was trained as a textual scholar in a very traditional setting, where not even the shade of an angle bracket was in sight. During that time I was growing more and more uncomfortable with the normal practice of silently intervening in the text, "normalising" all sorts of features of our heritage texts. XML allowed me and many others like me to embed in the text the documentation of our editorial practice at a level of granularity that no other system was -- and is -- able to provide. Furthermore, the use of XML according to the TEI Guidelines allowed me and many others to debate our scholarly practice and share our successes and difficulties with a large and growing international community. I have become a much better scholar thanks to the use of XML and the TEI. So, when the Folger Library made their XML texts available, they were following scholarly best practice: exposing their editorial work in a way that lets other scholars appreciate and evaluate it. Plain text has the big disadvantage of hiding all sorts of editorial intervention under a smooth surface, so it is actually false that plain text does not contain an interpretative level: it does, in a way that is neither recoverable nor scholarly. Unless we are talking about very recent texts, changes to spelling, punctuation, orthographic habits, hyphenation, and capitalisation are all silently introduced by editors. For a Renaissance play we are talking about around 3,000 silent editorial interventions, as I discovered myself when editing a work of an Italian playwright a few years ago [1]. I think that for Shakespeare we are talking about the same order of magnitude. And I'm not even starting on emendations.
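
To illustrate what recording an intervention instead of hiding it can look like, here is a minimal sketch in Python. It assumes the TEI elements <choice>, <orig> and <reg> are used to pair an original spelling with its regularised form; the sample line is invented for the example and is not taken from the Folger texts, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# TEI namespace in ElementTree's "{uri}" notation.
TEI = "{http://www.tei-c.org/ns/1.0}"

# Invented sample line (not from any real edition): each <choice> pairs
# the original spelling (<orig>) with the editor's regularisation (<reg>).
SAMPLE = (
    '<l xmlns="http://www.tei-c.org/ns/1.0">The '
    "<choice><orig>kinge</orig><reg>king</reg></choice> is "
    "<choice><orig>deade</orig><reg>dead</reg></choice>.</l>"
)


def render(elem, prefer):
    """Return the text of elem, keeping only the `prefer` reading
    ("orig" or "reg") inside every <choice>."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == f"{TEI}choice":
            chosen = child.find(f"{TEI}{prefer}")
            if chosen is not None:
                parts.append(render(chosen, prefer))
        else:
            parts.append(render(child, prefer))
        # Text that follows the child element belongs to the parent.
        parts.append(child.tail or "")
    return "".join(parts)


def count_interventions(elem):
    """Count the editorial interventions recorded as <choice> elements."""
    return len(list(elem.iter(f"{TEI}choice")))


line = ET.fromstring(SAMPLE)
print(render(line, "orig"))       # The kinge is deade.
print(render(line, "reg"))        # The king is dead.
print(count_interventions(line))  # 2
```

The same encoded line yields both the original and the regularised reading, and the interventions can be counted. A plain-text file containing only "The king is dead." preserves neither possibility.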

It is not true that we have adopted XML because there are a lot of tools and it is an easy solution: we were using SGML when no tools were available apart from the ones we were developing ourselves. We were using it because it met the needs of our scholarly practice.

Again, if someone does not wish to take advantage of the rich XML markup, it is actually quite easy to write a script to strip the markup out; in the case of the Folger texts they have used TEI, a widely known standard, which should make it easier to know how to delete the markup. I think reading the TEI Guidelines, and so making sense of the markup, is a small price to pay for having scholarly edited texts that follow good scholarly practice.
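
A script of the kind described above might look like the following minimal sketch in Python. It assumes a TEI file whose text sits in a <body> element in the TEI namespace; the sample document is invented and much simpler than the Folger encoding, and `strip_markup` is a hypothetical name. Real files may need extra handling, for example to choose one reading inside <choice> or to skip editorial notes.

```python
import xml.etree.ElementTree as ET

# TEI namespace in ElementTree's "{uri}" notation.
TEI = "{http://www.tei-c.org/ns/1.0}"

# Invented sample document, for illustration only.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><titleStmt><title>Invented sample</title></titleStmt></fileDesc></teiHeader>
  <text><body>
    <sp><speaker>FIRST</speaker>
      <l>A <w>made</w> <w>up</w> line,</l>
      <l>and another.</l>
    </sp>
  </body></text>
</TEI>"""


def strip_markup(xml_string):
    """Return the text content of the TEI <body>, without tags.

    Falls back to the whole document if no <body> is found.
    Runs of whitespace (including line breaks) collapse to one space.
    """
    root = ET.fromstring(xml_string)
    body = root.find(f".//{TEI}body")
    target = body if body is not None else root
    text = "".join(target.itertext())
    return " ".join(text.split())


print(strip_markup(SAMPLE))  # FIRST A made up line, and another.
```

The header is dropped because only the <body> is read; everything the editor recorded in the markup is dropped with it, which is exactly the trade-off under discussion.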

I think I can speak for a large part of the community in saying that we will be ready to change technology the moment we are given the opportunity to do our editorial work in a better and more scholarly way. We know very well the severe limits of XML, but we should not forget its strengths.

Yours
Elena

[1] I presented these figures at DH2006 in Paris: 'Just different layers? Stylesheets and digital edition methodology'.

--
Dr Elena Pierazzo
Lecturer in Digital Humanities
Department of Digital Humanities
King's College London
26-29 Drury Lane
London WC2B 5RL

Phone: 0207-848-1949
Fax: 0207-848-2980
elena.pierazzo at kcl.ac.uk
www.kcl.ac.uk/ddh



--[2]------------------------------------------------------------------------
        Date: Wed, 12 Dec 2012 10:26:12 -0500
        From: Wendell Piez <wapiez at wendellpiez.com>
        Subject: Re: [Humanist] 26.571 Folger Digital Texts
        In-Reply-To: <20121212071710.9557A311F at digitalhumanities.org>

Dear Patrick, Desmond and HUMANIST,

You both make excellent points; forgive me for doing my best to
encapsulate a response to both in one.

I think the difference between our perspectives here amounts to one
of emphasis.

I for one believe there will never be a black box format, even one
"highly interoperable" such as Desmond stipulates, that will make
tinkering unnecessary, uninteresting or "redundant". To me, tinkering
or the capability of tinkering -- which amounts to subjecting the
underlying critical and interpretive work to critical and interpretive
judgement -- is the essence of it.

It's true this work is technical, and as Patrick suggests will be the
work of tool builders not "end users". But I (with perhaps too much
idealism) maintain that this distinction is not absolute. (Here we
cross threads with Willard's question about "co-evolution".) We are
all tool-builders in some ways and end users in others. More to the
point, a well-built technology enables us to be tinkerers when we
choose to be -- our freedom is not *unnecessarily* constrained by the
particular affordances of the interface; rather those constraints are
enabling. In other words, I suspect that as soon as Desmond's
hypothetical highly interoperable binary sees the light of day, it
will be reverse engineered into something more easily hackable -- most
likely a plain text serialization. One might take PDF and HTML as
examples of the principle here. Despite being (sort of) "standard" and
externally specified, PDF is much harder to build tools for; to my
mind it is no accident that its application remains a functional silo
-- emulating print, with scarce use even of its capacity for hypertext
-- while HTML is getting into everything, despite its poor native
semantics (or in some ways because of them, insofar as its tags are
reduced to hangers for an explicit or implicit ontology represented in
'class' attribute assignments).

Moreover, I consider this work of tinkering to be essentially
humanistic in nature, or ought to be -- at least in the same way as
the work of a doctor, lawyer, engineer or architect should be
"humanistic", and additionally (when it comes, for example, to working
with the encoding of text of literary or historical interest)
humanistic in a narrower sense. The specialist in text encoding must
work shoulder to shoulder with the scholar, or the work of both will
be limited and impoverished for no good reason other than (I suppose)
economics.

I acknowledge that none of this directly contradicts your arguments.
In particular, Desmond says something really important when he remarks
how "at the moment, XML files in the humanities are proportionally
less useful to others the more markup is embedded in them, because
they become a specific representation of the work of one researcher,
which interferes with the work of another". This is indeed a problem
-- engendered by XML's success at doing what it does -- ameliorated
only by skills in processing XML. (But hey, just let me know if you
want help cracking that nut.) The problem is not, however, that the
encoding represents specifically one perspective, but rather that its
representation almost inevitably hinders others. For now -- Desmond
also knows there are alternatives on the horizon.

Best regards,
Wendell

On Wed, Dec 12, 2012 at 2:17 AM, Humanist Discussion Group
<willard.mccarty at mccarty.org.uk> wrote:
>
>                  Humanist Discussion Group, Vol. 26, No. 571.
>             Department of Digital Humanities, King's College London
>                               www.dhhumanist.org/
>                 Submit to: humanist at lists.digitalhumanities.org
>
>   [1]   From:    Patrick Durusau <patrick at durusau.net>                     (55)
>         Subject: Re:  26.565 Folger Digital Texts
>
>   [2]   From:    Desmond Schmidt <desmond.allan.schmidt at gmail.com>        (134)
>         Subject: Re:  26.565 Folger Digital Texts
>
>
> --[1]------------------------------------------------------------------------
>         Date: Tue, 11 Dec 2012 07:49:41 -0500
>         From: Patrick Durusau <patrick at durusau.net>
>         Subject: Re:  26.565 Folger Digital Texts
>         In-Reply-To: <20121211064601.A3B6E30A9 at digitalhumanities.org>
>
> Wendell,
>
> On 12/11/2012 01:46 AM, Humanist Discussion Group wrote:
> <snip>
>> Dear Desmond, and HUMANIST,
>>
>> On Sat, Dec 8, 2012 at 11:01 AM, you wrote:
>>> I did download some of the texts. They appear to be marked up for
>>> linguistic analysis. I don't wish to criticise the Folger texts per
>>> se, but they do lead me to reflect in general on what the digital
>>> humanities have become. Is our Shakespeare (and everything else)
>>> really preserved for future generations in forms like this, or is it
>>> not now mostly a collection of angle-brackets? One of the advantages
>>> of XML has always been its supposed human readability, but the gradual
>>> increase in complexity over the years has now reached a point where
>>> the plain text format is self-defeating. When even a single line of a
>>> play has to be stitched together by virtually joining individually
>>> marked-up words how can we any longer pretend that XML is readable by
>>> humans? We might as well use a standard binary format.
>> It's a bit startling, but refreshing, to see this question asked. Yet
>> I think the answer is not hard to find if we look around us.
>
> Great answer to binary vs. plain-text but I thought another question was
> implied.
>
> As you know, all text displayed by a computer, stored in either binary
> or plain-text formats is a presentation of an underlying machine
> representation.
>
> As an XSLT maven, I expected you to point out that the XML displayed to
> the reader could be as simple or as complex as desired.
>
> At one extreme, modern office word processing software conceals fairly
> complex XML behind a traditional text interface.
>
> At the other extreme are "plain" text editors that give the impression
> the user is seeing "the format" of the text. Not really. The reader is
> always interacting with a representation of the text, based on a machine
> level format.
>
> It is certainly possible to have an XML encoded text displayed with
> traditional critical apparatus and edited as such with changes to the
> underlying XML.
>
> Why humanists continue to struggle with "raw" XML as though it is
> meaningful for the scholarly enterprise as "XML," I cannot say. What is
> important is capturing their analysis of a text.
>
> Their analysis being preserved in XML is important for interchange and
> legacy preservation, neither of which need to be addressed by working
> humanists. Those are issues for tool makers.
> Hope you are having a great day!
>
> Patrick
>
> --
> Patrick Durusau
> patrick at durusau.net
> Technical Advisory Board, OASIS (TAB)
> Former Chair, V1 - US TAG to JTC 1/SC 34
> Convener, JTC 1/SC 34/WG 3 (Topic Maps)
> Editor, OpenDocument Format TC (OASIS), Project Editor ISO/IEC 26300
> Co-Editor, ISO/IEC 13250-1, 13250-5 (Topic Maps)
>
> Another Word For It (blog): http://tm.durusau.net
> Homepage: http://www.durusau.net
> Twitter: patrickDurusau
>
>
>
> --[2]------------------------------------------------------------------------
>         Date: Wed, 12 Dec 2012 10:06:57 +1000
>         From: Desmond Schmidt <desmond.allan.schmidt at gmail.com>
>         Subject: Re:  26.565 Folger Digital Texts
>         In-Reply-To: <20121211064601.A3B6E30A9 at digitalhumanities.org>
>
> Wendell,
>
> I wasn't necessarily thinking of EXI or FastInfoset, but more generally
> the principle of using a standard binary format, perhaps entirely
> different to XML, that would encourage interoperability by discouraging
> tinkering. As it is, with the text exposed in this way, the first thing
> the recipient of such a file does is modify it for his or her purpose. In
> principle, having a black box format that was highly interoperable would
> make the tinkering redundant, and encourage the development of truly
> interoperable tools that worked upon it. At the moment, XML files in the
> humanities are proportionally less useful to others the more markup is
> embedded in them, because they become a specific representation of
> the work of one researcher, which interferes with the work of another.
>
> Desmond Schmidt
> eResearch Lab,
> University of Queensland
> Australia
>

-- 
Wendell Piez | http://www.wendellpiez.com
XML | XSLT | electronic publishing
Eat Your Vegetables
_____oo_________o_o___ooooo____ooooooo_^




