[Humanist] 26.612 XML & what kind of scholarship

Humanist Discussion Group willard.mccarty at mccarty.org.uk
Fri Dec 21 10:08:29 CET 2012


                 Humanist Discussion Group, Vol. 26, No. 612.
            Department of Digital Humanities, King's College London
                              www.dhhumanist.org/
                Submit to: humanist at lists.digitalhumanities.org

  [1]   From:    Patrick Durusau <patrick at durusau.net>                    (100)
        Subject: Re:  26.605 XML, TEI and what kind of scholarship?

  [2]   From:    Patrick Durusau <patrick at durusau.net>                    (115)
        Subject: Re: [Humanist] 26.609 XML & what kind of scholarship

  [3]   From:    Wendell Piez <wapiez at wendellpiez.com>                     (86)
        Subject: Re: [Humanist] 26.609 XML & what kind of scholarship

  [4]   From:    Desmond Schmidt <desmond.allan.schmidt at gmail.com>         (43)
        Subject: Re:  26.609 XML & what kind of scholarship

  [5]   From:    Patrick Durusau <patrick at durusau.net>                     (29)
        Subject: The Power of Notation

  [6]   From:    drwender at aol.com                                          (30)
        Subject: Re:  26.605 XML, TEI and what kind of scholarship?


--[1]------------------------------------------------------------------------
        Date: Thu, 20 Dec 2012 05:10:14 -0500
        From: Patrick Durusau <patrick at durusau.net>
        Subject: Re:  26.605 XML, TEI and what kind of scholarship?
        In-Reply-To: <20121219074123.26E34DC1 at digitalhumanities.org>


Willard,

On 12/19/2012 02:41 AM, Humanist Discussion Group wrote:
<snip>
> For the literary scholar, however, interpretation is a different matter,
> requiring a very different disciplinary style and making very different
> demands on the technologies we devise to assist it. My 10 or so years
> devoted to markup (pre-TEI) taught me that it is not in principle
> well-suited to the literary critic's interpretative practices. Jerome
> McGann has made this point forcibly numerous times.

Taking "Marking Texts of Many Dimensions" 
(http://www2.iath.virginia.edu/jjm2f/blackwell.htm) and "Visible and 
Invisible Books: Hermetic Images in N-Dimensional Space" 
(http://www2.iath.virginia.edu/jjm2f/old/nlh2000web.html) as 
representative of McGann's position generally, I don't see support for 
the proposition that markup:

"...is not in principle well-suited to the literary critic's 
interpretative practices."

Granting that the use of markup can be pedestrian and unimaginative, not to 
mention the foolish well-formedness constraint of XML, I see nothing 
inconsistent between the use of markup and McGann's conclusion in "Marking 
Texts of Many Dimensions":

> This model of text-processing is open-ended, discontinuous, and 
> non-hierarchical. It takes place in a fieldspace that is exposed when 
> it is mapped by a process of "reading". A digital processing program is 
> to be imagined and built that allows one to mark and store these maps 
> of the textual fields and then to study the ways they develop and 
> unfold and how they compare with other textual mappings and 
> transactions. Constructing textualities as field spaces of these kinds 
> short-circuits a number of critical predilections that inhibit our 
> received, common sense wisdom about our textual condition. First of 
> all, it escapes crippling interpretive dichotomies like text and 
> reader, or textual "subjectivity" and "objectivity". Reader-response 
> criticism, so-called, intervened in that space of problems but only 
> succeeded in reifying even further the primary distinctions. In this 
> view of the matter, however, one sees that the distinctions are purely 
> heuristic. The "text" we "read" is, in this view, an autopoietic event 
> with which we interact and to which we make our own 
> contributions. Every textual event is an emergence imbedded in and 
> comprising a set of complex histories, some of which we each partially 
> realize when we participate in those textual histories. Interestingly, 
> these histories, in this view, have to be grasped as fields of action 
> rather than as linear unfoldings. The fields are topological, with 
> various emergent and dynamic basins of order, some of them linear and 
> hierarchical, others not.
>

It would require imaginative use of HyTime, topic maps or similar 
methods to approach these requirements.

Or perhaps more sophisticated forms of markup that take advantage of 
topological methods.

But conceding that markup could be improved isn't the same as sounding a 
retreat into silent visual representation of literary analysis.

Silent visual representation of literary analysis confines scholars to the 
range of works they can read in a working lifetime.

There is no indexing of those judgements because there are no recorded 
judgements to index.

For all of their shortcomings, I would prefer to live with indexes and 
references by others than without them.

> To a publisher text as an "ordered hierarchy of content objects" makes
> perfect sense. To a literary critic it is laughable nonsense. To a
> philosopher it is an interesting hypothesis, I would suppose, whose
> implications need working out. To an historian it is evidence of people
> thinking in a particular way at a particular time, raising the question
> of how they came to think thus.
>
> In the digital humanities we are sometimes overly impressed by the
> portability of our methods and tools. We fail to see that when a method
> successful in one discipline is ported into another the game it is intended
> to play is different. The criteria which it must meet and the meaning of the
> terms in which scholars think are different. Just as platform-independent
> informational text cannot be known except by means of some platform or other
> (the term itself is wrong), computing is meaningless to the scholar unless
> manifested within the basic disciplinary context within which he or she is
> operating. Crossing the boundary of an epistemic culture successfully
> involves a complex blend of learning and teaching in what Peter Galison has
> usefully called a "trading zone" -- for which see Michael E. Gorman, ed.,
> Trading Zones and Interactional Expertise: Creating New Kinds of
> Collaboration (MIT Press, 2010).

Perhaps, but do you have an example of an observation, judgement, 
comment, comparison, etc., by a literary critic that, if articulated, 
cannot be represented in markup?

I concede that if a literary critic is silent, there is nothing for 
markup to represent.

But I read Desmond's posts as saying that he has statements about texts 
to articulate, only that he prefers to make them without the use of markup.

I take McGann as arguing that markup needs to become more expressive, so 
as to capture more of what literary critics want to articulate, not that 
it should be abandoned in favour of silence.

Hope you are having a great week!

Patrick

-- 
Patrick Durusau
patrick at durusau.net
Technical Advisory Board, OASIS (TAB)
Former Chair, V1 - US TAG to JTC 1/SC 34
Convener, JTC 1/SC 34/WG 3 (Topic Maps)
Editor, OpenDocument Format TC (OASIS), Project Editor ISO/IEC 26300
Co-Editor, ISO/IEC 13250-1, 13250-5 (Topic Maps)

Another Word For It (blog): http://tm.durusau.net
Homepage: http://www.durusau.net
Twitter: patrickDurusau



--[2]------------------------------------------------------------------------
        Date: Thu, 20 Dec 2012 10:38:05 -0500
        From: Patrick Durusau <patrick at durusau.net>
        Subject: Re: [Humanist] 26.609 XML & what kind of scholarship
        In-Reply-To: <20121220093155.44D3839DB at digitalhumanities.org>

Jerome,

On 12/20/2012 04:31 AM, Humanist Discussion Group wrote:
<snip>
> Patrick,
>
> I'll comment on a selection of your points, and try to be brief:
>
>> how does using markup differ between the
>> "born digital" and analog documents?
> In "born-digital" markup is part of what I write. It is a fact. In
> "born-analog-and-transcribed-to-digital", markup is an interpretation.
> It is different every time the "transcription" is redone by someone
> new. In born-digital markup is always the same. Although, as you point
> out, I may use the same tools in processing both born-digital and
> born-analog texts, the kinds of interaction between user and text in
> the two cases will differ significantly. For example, in the
> born-analog case we often request a facsimile side by side with its
> transcription so we can verify its accuracy. In the born digital case
> such a prop would be superfluous.
Certainly the elements required for "born-digital" versus 
"born-analog-and-transcribed-to-digital" differ, but then the elements 
for poetry and prose differ as well.

But the digital/analog distinction is a matter of two different starting 
points for interpretation. In the analog case there is a visible focus of 
attention for the interpretative process; in the born-digital case that 
focus isn't visible to others. But both are interpretative processes.

>> Whether markup is standoff or embedded doesn't impact the attribution of
>> explicit semantics to a text. Any number of linguistic annotation
>> projects use forms of stand off markup.
>>
> It is true that "Standoff markup" has been used in linguistics since
> the early 1990s. And simply removing XML tags from a text and later
> putting them back doesn't change the status of the markup one iota.
> It's still a tree and you can still only have one markup set at a
> time. Being able to change one set for another is an advantage, but
> having the two stored separately is equally inconvenient, so there is
> no overall gain in usability.
>
> But "standoff properties" are different, because they have no real
> syntax they can be combined to enrich a text.
Rather say "some" standoff markups don't have "real syntax." I dimly 
remember the standoff efforts from Henry Thompson's group as having 
fairly definite notions of "real syntax," even though you could use them 
together.
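
To make the distinction concrete, here is a minimal sketch in Python (with 
invented element names and offsets, not tied to any particular standoff 
proposal) of the same annotation expressed once as embedded markup and once 
as standoff ranges over character offsets:

    # Minimal sketch: the same annotation expressed as embedded markup and
    # as standoff ranges over character offsets (all names are illustrative).

    text = "Call me Ishmael."

    # Embedded: the tags live inside the character stream, in one hierarchy.
    embedded = "Call me <persName>Ishmael</persName>."

    # Standoff: annotations point at offsets in the unmarked text, so
    # independent sets can be kept separately and combined at will.
    names = [(8, 15, "persName")]     # annotation set A
    emphasis = [(0, 7, "emph")]       # annotation set B

    def apply_standoff(text, ranges):
        """Render non-overlapping standoff ranges as inline tags."""
        out, pos = [], 0
        for start, end, label in sorted(ranges):
            out.append(text[pos:start])
            out.append("<%s>%s</%s>" % (label, text[start:end], label))
            pos = end
        out.append(text[pos:])
        return "".join(out)

    print(apply_standoff(text, names + emphasis))
    # <emph>Call me</emph> <persName>Ishmael</persName>.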

<snip>
>> At least #2 on my list of requirements for a system for analysis of text
>> would be the explicit preservation of semantics of the text as I
>> interpreted it and analysis as I assigned it. In a form that can be
>> reliably interchanged with others.
>>
>> To lack explicit semantics for textual analysis means scholarship
>> returns to being an episodic enterprise that starts over with every
>> generation guessing what may have been meant by the prior generation and
>> laying the groundwork for their heirs to guess at theirs.
>>
> If by this you mean a standard and interchangeable format for
> describing text hermeneutically I am well aware of the ideals long
> voiced on the subject. But unfortunately "between the idea and the
> reality ... falls the shadow."
> What I see in TEI marked-up texts in practice is this: redefinition of
> tags that already exist under a different name, new attributes added
> willy-nilly when other ones already exist for the purpose,
> output-related information embedded into supposedly reusable and
> interchangeable texts, misuse of tags for the wrong purposes, and
> general ignorance of what it says in the Guidelines because people
> simply don't read them.

Sorry, I was presuming that markup makes explicit clues already extant 
in the text. Or rather, to use your term, it "redefines" them in a 
standard vocabulary, one that frees me from having to re-inspect the 
text to discover the clues.

That TEI, or even markup in general, has any number of examples of poor 
usage I don't dispute. But I don't see how misuse of markup supports an 
argument against markup.

> At the XML level yes we can interchange texts with other XML programs
> for parsing and searching but can we interchange or interoperate texts
> at the level of subjective markup? I don't think so. And if you don't
> believe me read Syd Bauman's excellent piece in Balisage 2011, or
> Martin Mueller's open letter to the TEI. They know better than I do
> what they are talking about.
>
> http://ariadne.northwestern.edu/mmueller/teiletter.pdf
> http://www.balisage.net/Proceedings/vol7/html/Bauman01/BalisageVol7-Bauman01.html

I read Syd as arguing for blind interchange, a higher standard than 
interoperability, and Martin as arguing for the TEI to do a better job of 
promoting non-curio use of the TEI for encoding texts.

Let me close with a brief example that may (may not) be helpful.

"New Testament Greek Manuscripts: Variant Readings Arranged in 
Horizontal Lines Against Codex Vaticanus, Mattew" by Reuben J. Swanson. 
Sheffield Academic Press, 1995.

Swanson concedes the importance of the Church Fathers, versions (read: 
early translations), etc., but includes only select Greek sources.

A perfectly understandable decision, but one whose purely visual 
presentation means:

1) Subsequent projects will have to re-enter, re-proof and re-align the 
texts Swanson has already produced.

2) Subsequent projects will have to enter, proof and align their 
additional texts.

3) Descriptive markup could lessen the costs of both #1 and #2 as well 
as provide other advantages for searching, analysis, etc.

It is important not to confuse the methodology (inline versus standoff 
markup) with the need for standard vocabularies for "blind interchange", 
to use Syd's terminology. Swanson has no vocabulary for alignment; it is 
entirely a visual artifact of his presentation.
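
As a rough illustration of what an explicit alignment vocabulary buys, here 
is a minimal Python sketch (invented sigla and readings, not Swanson's data) 
in which variant readings are keyed to positions in a base text, so a later 
project could reuse the alignment rather than re-key it from the page layout:

    # Sketch with invented sigla and readings: variant readings aligned to a
    # base text by explicit position, rather than by page layout alone.

    base = ["In", "the", "beginning", "was", "the", "word"]  # stand-in base text

    # Each entry: (position in the base text, {witness siglum: reading}).
    apparatus = [
        (2, {"B": "beginning", "D": "begynnyng"}),
        (5, {"B": "word", "W": "Word"}),
    ]

    def readings_at(position):
        """Return every recorded witness reading at a base-text position."""
        for pos, witnesses in apparatus:
            if pos == position:
                return witnesses
        return {"base": base[position]}

    print(readings_at(2))   # {'B': 'beginning', 'D': 'begynnyng'}
    print(readings_at(3))   # {'base': 'was'} -- no variants recorded here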

Hope you are having a great week!

Patrick

-- 
Patrick Durusau
patrick at durusau.net
Technical Advisory Board, OASIS (TAB)
Former Chair, V1 - US TAG to JTC 1/SC 34
Convener, JTC 1/SC 34/WG 3 (Topic Maps)
Editor, OpenDocument Format TC (OASIS), Project Editor ISO/IEC 26300
Co-Editor, ISO/IEC 13250-1, 13250-5 (Topic Maps)

Another Word For It (blog): http://tm.durusau.net
Homepage: http://www.durusau.net
Twitter: patrickDurusau



--[3]------------------------------------------------------------------------
        Date: Thu, 20 Dec 2012 10:57:19 -0500
        From: Wendell Piez <wapiez at wendellpiez.com>
        Subject: Re: [Humanist] 26.609 XML & what kind of scholarship
        In-Reply-To: <20121220093155.44D3839DB at digitalhumanities.org>

Dear Willard,

Please forgive a response that is probably of interest only to
practitioners and theorists of markup. I pursue this here mainly
because I know that some readers are interested, while there are no
other public venues (to my knowledge) as well suited for this
important discussion as this list.

Desmond writes:
> But "standoff properties" are different, because they have no real
> syntax they can be combined to enrich a text. The advantage is now
> decisive: I can add markup sets A, C, and E to a text but not B and D,
> and then format it, or I can choose B, C and D etc. and format that.
> This is definitely an improvement because it increases flexibility
> while providing a way to handle the ever-increasing complexity.

This is an important point -- and why I have focused on a model that
can be expressed both using a standoff convention, and (what Desmond
says is impossible) serialized in a markup syntax. I think the latter
is important because:

* It makes a good openly-specified interchange format.
* It should be useful for many purposes including lightweight
applications and pedagogy.
* Embedded markup is robust in the face of editorial changes
(documents being created, edited or curated), while standoff
conventions are not (at least not without a significant infrastructure
that must also be specified and built).
* Similarly, it is composable, in the (loose) sense that fragments of
instances are also instances.
* There is no reason the two approaches could not be combined to get
benefits of both. Indeed, if standoff properties are assigned to
ranges marked inline instead of to offsets in the text itself, the
problem of the brittleness of a standoff notation can be reduced.
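
A minimal sketch of that last point, assuming an invented <seg> anchor and 
invented property names: the standoff layer points at an ID marked inline 
rather than at character offsets, so edits elsewhere in the text do not 
shift the annotation.

    # Sketch: standoff properties keyed to an inline anchor's ID rather than
    # to character offsets (element, attribute and property names invented).

    from xml.etree import ElementTree as ET

    XML_NS = "{http://www.w3.org/XML/1998/namespace}"

    doc = ET.fromstring(
        '<p>Call me <seg xml:id="r1">Ishmael</seg>. Some years ago ...</p>'
    )

    # The standoff layer points at "r1", so inserting or deleting text
    # elsewhere in the paragraph does not invalidate the annotation.
    standoff = {"r1": {"type": "persName", "note": "narrator names himself"}}

    for seg in doc.iter("seg"):
        props = standoff.get(seg.get(XML_NS + "id"), {})
        if props:
            print(seg.text, "->", props)
    # Ishmael -> {'type': 'persName', 'note': 'narrator names himself'}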

In other words, I regard the "standoff vs embedded" discussion as a
red herring, and mainly unproductive. Until we have an embedded markup
technology that can work this way, it is only theoretical; and when we
do, it becomes a practical problem amenable to experiment.

At the same time, I do agree that an embedded markup syntax should not
be absolutely indispensable, certainly not for every purpose. Its main
interest to me is as a demonstration of principles, a site of
development, and a bridge.

Desmond writes further (specifically about the LMNL project):
> You already know I think the "sawtooth syntax" is not a computer
> recognisable language because it apparently has no grammar that
> governs its entire syntax. The equivalence of "has a grammar" and "is
> computer recognisable" was acknowledged to be already "well known" by
> Chomsky in 1959. So I don't think the sawtooth syntax can do what you
> claim. However, I have no significant objection to the LMNL model
> itself; in fact it is rather clever.

The syntax has a grammar, here:

http://lmnl-markup.org/specs/archive/Detailed_LMNL_syntax.xhtml

What LMNL does not have is a grammar to describe document structures
(as opposed to a markup syntax).

In XML terms, this is as if we had a grammar for well-formed markup
capable of being parsed -- "computer recognizable" -- without a
grammar governing document structures (such as a DTD). As you know,
developers work with XML like this every day. In particular, XSLT and
XQuery do not require a DTD or any grammar describing documents; and
the data model on which they work (the XDM) can be derived from
well-formed XML without such a grammar. In practice, it turns out that
grammars *to describe document structures* are useful for optimizing
certain processes, but are not a sine qua non for processing in
general.
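
The same point can be made outside XSLT and XQuery. In the Python sketch 
below (an analogy only, not part of the LMNL work) a well-formed document is 
parsed and queried with no DTD or schema in sight:

    # Well-formedness alone is enough to parse and query: no DTD or schema
    # is supplied here, only markup the parser can recognise syntactically.

    from xml.etree import ElementTree as ET

    doc = ET.fromstring(
        "<sonnet>"
        "<line n='1'>Shall I compare thee</line>"
        "<line n='2'>to a summer's day?</line>"
        "</sonnet>"
    )

    # A query over the derived tree, with no grammar saying what a
    # <sonnet> may contain.
    for line in doc.findall("line"):
        print(line.get("n"), line.text)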

I freely concede Desmond's (and Chomsky's) point that a
computationally tractable syntax (patterned sequence of tokens) will
have at least an implicit grammar. But that's not at issue here. At
last year's workshop on Data Modeling at Brown and again at Balisage,
I demonstrated both lossless conversion back and forth between LMNL
syntax and a standoff representation of the model, and (rudimentary,
but interesting) applications based on the LMNL data model. So I don't
have to argue the theory: I'm parsing the stuff.

(And I'll be happy to do so again anywhere I am able to. :-)

In other words, this is a complete circuit: edit your tagged text and
see the changes reflected in processing.

http://balisage.net/Proceedings/vol8/html/Piez01/BalisageVol8-Piez01.html

How to validate LMNL to formal constraint sets specified outside its
applications remains an area for research. Both grammar-based
approaches (such as rabbit/duck grammars or Jeni Tennison's proposed
CREOLE language, an extension of RNG) and rules-based approaches
(analogous to Schematron for XML) are conceivable.

And this is true for any range model, including those represented
using standoff conventions.

Best regards,
Wendell

--
Wendell Piez | http://www.wendellpiez.com
XML | XSLT | electronic publishing
Eat Your Vegetables
_____oo_________o_o___ooooo____ooooooo_^



--[4]------------------------------------------------------------------------
        Date: Fri, 21 Dec 2012 05:49:10 +1000
        From: Desmond Schmidt <desmond.allan.schmidt at gmail.com>
        Subject: Re:  26.609 XML & what kind of scholarship
        In-Reply-To: <20121220093155.44D3839DB at digitalhumanities.org>

Wendell,

Willard is not mistaken. There are no practical markup languages
embedded in the text that are not OHCOs, for otherwise they would not
be computer recognisable languages. What you are referring to are data
structures that can be expressed using markup formalisms such as
linking (i.e. using IDs to connect elements). I can represent a
complete graph in XML that way but it doesn't mean that the XML
language in question has such a structure. It's still a tree. The
links themselves aren't part of the language. You can't write a
grammatical rule that controls which elements an ID can connect to, or
that the target must exist or that the links don't form a directed
cycle etc. Since you can't syntax-check any of that, such files should
be locked to prevent accidental damage. One way to achieve that is to
use a binary format.
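
To illustrate (with invented element names): the following XML is a perfectly 
well-formed tree, but the graph it encodes through its ID links, including 
the directed cycle, is invisible to any grammar over the tree; the check has 
to happen in code.

    # A directed graph serialised as an XML tree through ID references.
    # The tree grammar says nothing about whether the references resolve
    # or whether they form a cycle; that has to be checked in code.

    from xml.etree import ElementTree as ET

    doc = ET.fromstring("""
    <graph>
      <node id="a"><link target="b"/></node>
      <node id="b"><link target="c"/></node>
      <node id="c"><link target="a"/></node>
    </graph>
    """)

    edges = {n.get("id"): [lk.get("target") for lk in n.findall("link")]
             for n in doc.findall("node")}

    def has_cycle(edges):
        """Depth-first search for a directed cycle among the ID links."""
        visiting, done = set(), set()

        def visit(node):
            if node in visiting:
                return True
            if node in done or node not in edges:
                return False
            visiting.add(node)
            found = any(visit(target) for target in edges[node])
            visiting.discard(node)
            done.add(node)
            return found

        return any(visit(node) for node in list(edges))

    print(has_cycle(edges))   # True: a -> b -> c -> a, invisible to a DTD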

Like you I find it incredibly limiting to say that humanistic data
must be represented by a tree. There are certainly hundreds, if not a
limitless number, of data structures in information science, and our
texts are subtly complex things that deserve better analysis than
always being hammered into a tree. Why can't we use some of those
other data structures to describe text better?

Desmond Schmidt
eResearch Lab
University of Queensland

On Thu, Dec 20, 2012 at 7:31 PM, Humanist Discussion Group
<willard.mccarty at mccarty.org.uk> wrote:
> It should go without saying that such markup would not be XML, which
> does indeed (due to its grammar) impose an OHCO -- meaning a text must
> either be reducible to such a hierarchy, or be (entirely) represented
> by means of such a hierarchy. (And while it's true that a free-form
> hierarchical database can be used to describe just about anything,
> there's a big difference between using XML in this way and using it
> for "markup".)
>
> Desmond is on record as opposing the use of embedded markup
> altogether, for reasons you hint at as well as others. But with
> respect to how we might better *model* the text and the information a
> text encodes (defining "text" broadly here), he and I agree on a great
> deal. The OHCO model is a convenience for some things and no help for
> others. By no means is it suited to support everything scholars wish
> to do.
>
> But please don't identify markup as such with the OHCO thesis. It
> wasn't ever thus, and it doesn't always have to be.



--[5]------------------------------------------------------------------------
        Date: Thu, 20 Dec 2012 16:36:54 -0500
        From: Patrick Durusau <patrick at durusau.net>
        Subject: The Power of Notation
        In-Reply-To: <20121220093155.44D3839DB at digitalhumanities.org>


Willard,

On the question of markup/notation, "Juggling by numbers: How notation 
revealed new tricks," http://www.bbc.co.uk/news/magazine-20728493 may be 
of interest.

The article describes Siteswap, a notation invented in the 1980s to 
describe juggling moves.

It is a numerical notation which lends itself to being searched for 
patterns.

A site with links to more documentation, software to simulate juggling, 
etc.: http://www.siteswap.org/.
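
As a small illustration of why such a notation rewards mechanical search 
(these are standard properties of vanilla siteswap, not anything taken from 
the article): validity and ball count can be computed directly from the 
digits, so whole families of patterns can be enumerated rather than 
discovered by trial.

    # Standard properties of vanilla siteswap: a pattern is jugglable iff
    # no two throws land on the same beat, and the number of balls is the
    # average throw height.

    def is_valid_siteswap(throws):
        """True iff the throws' landing beats are all distinct."""
        n = len(throws)
        landings = {(i + t) % n for i, t in enumerate(throws)}
        return len(landings) == n

    def ball_count(throws):
        """Average throw height; an integer for every valid pattern."""
        return sum(throws) // len(throws)

    print(is_valid_siteswap([5, 3, 1]), ball_count([5, 3, 1]))   # True 3
    print(is_valid_siteswap([5, 4, 3]))                          # False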

I mention this in part to follow up on Wendell's suggestion of a program 
of research on markup broadly defined.

What patterns would a notation broader than XML uncover? Any number of 
us have argued for such cases, using a variety of notations, but that 
isn't the same as a large body of material subject to common examination 
and debate over patterns and their usefulness.

Hope you are having a great week!

Patrick

-- 
Patrick Durusau
patrick at durusau.net
Technical Advisory Board, OASIS (TAB)
Former Chair, V1 - US TAG to JTC 1/SC 34
Convener, JTC 1/SC 34/WG 3 (Topic Maps)
Editor, OpenDocument Format TC (OASIS), Project Editor ISO/IEC 26300
Co-Editor, ISO/IEC 13250-1, 13250-5 (Topic Maps)

Another Word For It (blog): http://tm.durusau.net
Homepage: http://www.durusau.net
Twitter: patrickDurusau



--[6]------------------------------------------------------------------------
        Date: Thu, 20 Dec 2012 20:17:02 -0500 (EST)
        From: drwender at aol.com
        Subject: Re:  26.605 XML, TEI and what kind of scholarship?
        In-Reply-To: <8CFAD2E86C52C7E-AE8-3C650 at webmail-m006.sysops.aol.com>


 Dear Willard,
  you wrote in 26.605: 
  
 > My King's colleague Elena Pierazzo's message several days ago drew much 
>  needed attention to the disciplinary perspective from which the question 
>  of markup is considered. She made the valuable point that systematic 
>  markup offers the textual editor the ability to record minute decisions 
>  at the location in the text where they are made. In the job-defining 
>  role as *editor* an editor must decide about this or that variant, mark of 
>  punctuation etc, but without markup and the computing which goes with it 
>  there is no way of recording decisions at the minute level of detail at 
>  which they are made. With it these decisions can be recorded. (Textual 
>  editors who know better please contradict.) 
 
Yes, I would contradict that opinion, and not only because there is a kind of 
technological fallacy behind it. Editorial ethics do not depend on the tools 
the editor uses to communicate the results of analyzing the textual tradition, 
the decisions based upon it, and the literary text he proposes to be read. 
One example, a 'genetic edition' avant la lettre: on my book shelf I have a 
1924 test edition of a rough manuscript - the first draft of Fontane's "Effi 
Briest", ed. by Eduard Behrend - that shows the different layers of correction 
by typographical means on the left-hand side and, in parallel, the final text 
from the first print edition on the right. Surely Behrend's manuscript for 
this test edition (the print job was performed by the typesetters of the 
"Reichsdruckerei") was 'marked up' in his hand; but would tree processing 
bring any more clarity to the representation?

Best regards,
Herbert
 
  
  
  



