[Humanist] 26.571 Folger Digital Texts

Humanist Discussion Group willard.mccarty at mccarty.org.uk
Wed Dec 12 08:17:10 CET 2012


                 Humanist Discussion Group, Vol. 26, No. 571.
            Department of Digital Humanities, King's College London
                              www.dhhumanist.org/
                Submit to: humanist at lists.digitalhumanities.org

  [1]   From:    Patrick Durusau <patrick at durusau.net>                     (55)
        Subject: Re:  26.565 Folger Digital Texts

  [2]   From:    Desmond Schmidt <desmond.allan.schmidt at gmail.com>        (134)
        Subject: Re:  26.565 Folger Digital Texts


--[1]------------------------------------------------------------------------
        Date: Tue, 11 Dec 2012 07:49:41 -0500
        From: Patrick Durusau <patrick at durusau.net>
        Subject: Re:  26.565 Folger Digital Texts
        In-Reply-To: <20121211064601.A3B6E30A9 at digitalhumanities.org>

Wendell,

On 12/11/2012 01:46 AM, Humanist Discussion Group wrote:
<snip>
> Dear Desmond, and HUMANIST,
>
> On Sat, Dec 8, 2012 at 11:01 AM, you wrote:
>> I did download some of the texts. They appear to be marked up for
>> linguistic analysis. I don't wish to criticise the Folger texts per
>> se, but they do lead me to reflect in general on what the digital
>> humanities have become. Is our Shakespeare (and everything else)
>> really preserved for future generations in forms like this, or is it
>> not now mostly a collection of angle-brackets? One of the advantages
>> of XML has always been its supposed human readability, but the gradual
>> increase in complexity over the years has now reached a point where
>> the plain text format is self-defeating. When even a single line of a
>> play has to be stitched together by virtually joining individually
>> marked-up words how can we any longer pretend that XML is readable by
>> humans? We might as well use a standard binary format.
> It's a bit startling, but refreshing, to see this question asked. Yet
> I think the answer is not hard to find if we look around us.

Great answer to binary vs. plain-text but I thought another question was 
implied.

As you know, all text displayed by a computer, whether stored in binary 
or plain-text formats, is a presentation of an underlying machine 
representation.

As an XSLT maven, I expected you to point out that the XML displayed to 
the reader could be as simple or as complex as desired.

At one extreme, modern office word processing software conceals fairly 
complex XML behind a traditional text interface.

At the other extreme are "plain" text editors that give the impression 
the user is seeing "the format" of the text. Not really. The reader is 
always interacting with a representation of the text, based on a 
machine-level format.

It is certainly possible to have an XML encoded text displayed with 
traditional critical apparatus and edited as such with changes to the 
underlying XML.
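
As a rough illustration of that point, here is a minimal Python sketch 
that stitches word-by-word markup back into an ordinary reading line. 
The element names (l, w, c, pc) and the sample line are assumptions 
chosen to look TEI-like; they are not taken from the Folger files, and 
the function name reading_text is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical word-level encoding of one verse line.
# Element names are TEI-like assumptions, not the actual Folger schema.
SAMPLE = """<l n="1">
  <w>To</w><c> </c><w>be</w><pc>,</pc><c> </c><w>or</w><c> </c><w>not</w><c> </c><w>to</w><c> </c><w>be</w>
</l>"""


def reading_text(line_xml: str) -> str:
    """Join the text of each token element into a plain reading line."""
    line = ET.fromstring(line_xml)
    # Use only each child's own text; the indentation and newlines between
    # elements live in .tail and are deliberately ignored.
    return "".join(child.text or "" for child in line)


print(reading_text(SAMPLE))
```

The same encoded line could feed a reading view, a concordance, or an 
editing form; what the reader sees is a choice made by the tool, not a 
property of the stored XML.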

Why humanists continue to struggle with "raw" XML as though it is 
meaningful for the scholarly enterprise as "XML," I cannot say. What is 
important is capturing their analysis of a text.

Their analysis being preserved in XML is important for interchange and 
legacy preservation, neither of which needs to be addressed by working 
humanists. Those are issues for tool makers.

Hope you are having a great day!

Patrick

-- 
Patrick Durusau
patrick at durusau.net
Technical Advisory Board, OASIS (TAB)
Former Chair, V1 - US TAG to JTC 1/SC 34
Convener, JTC 1/SC 34/WG 3 (Topic Maps)
Editor, OpenDocument Format TC (OASIS), Project Editor ISO/IEC 26300
Co-Editor, ISO/IEC 13250-1, 13250-5 (Topic Maps)

Another Word For It (blog): http://tm.durusau.net
Homepage: http://www.durusau.net
Twitter: patrickDurusau



--[2]------------------------------------------------------------------------
        Date: Wed, 12 Dec 2012 10:06:57 +1000
        From: Desmond Schmidt <desmond.allan.schmidt at gmail.com>
        Subject: Re:  26.565 Folger Digital Texts
        In-Reply-To: <20121211064601.A3B6E30A9 at digitalhumanities.org>

Wendell,

I wasn't necessarily thinking of EXI or FastInfoset, but more generally
the principle of using a standard binary format, perhaps entirely
different to XML, that would encourage interoperability by discouraging
tinkering. As it is, with the text exposed in this way, the first thing
the recipient of such a file does is modify it for his or her purpose. In
principle, having a black box format that was highly interoperable would
make the tinkering redundant, and encourage the development of truly
interoperable tools that worked upon it. At the moment, XML files in the
humanities are proportionally less useful to others the more markup is
embedded in them, because they become a specific representation of
the work of one researcher, which interferes with the work of another.

Desmond Schmidt
eResearch Lab,
University of Queensland
Australia


