[Humanist] 29.26 billions of pages' worth; hammer and nail

Humanist Discussion Group willard.mccarty at mccarty.org.uk
Thu May 14 07:13:21 CEST 2015

                  Humanist Discussion Group, Vol. 29, No. 26.
            Department of Digital Humanities, King's College London
                Submit to: humanist at lists.digitalhumanities.org

  [1]   From:    Norman Gray <norman at astro.gla.ac.uk>                      (24)
        Subject: Re:  29.24 billions of pages' worth

  [2]   From:    Charles Faulhaber <cbf at berkeley.edu>                       (8)
        Subject: RE: [Humanist] 29.25 happy birthday Humanist

        Date: Wed, 13 May 2015 09:50:37 +0100
        From: Norman Gray <norman at astro.gla.ac.uk>
        Subject: Re:  29.24 billions of pages' worth
        In-Reply-To: <20150513051508.A367A668A at digitalhumanities.org>


Desmond Schmidt wrote

> I think it is misleading to describe technical transitions such as
> SGML->XML or XML->JSON as a "war of religion". That term might be an
> appropriate analogy if it were a mere matter of taste to choose between
> two concurrent and competing technologies, but not to describe technical
> succession.

We should not fall into the trap that some have unwittingly laid, of thinking of this as 'succession'.  As Desmond notes, "JSON is not a replacement for all uses of XML (like TEI), but it is a suitable format for metadata."

For various largely historical reasons, XML had come to occupy a larger area of the markup/metadata/serialisation domain than was entirely comfortable.  For some parts of that domain -- specifically those with highly-structured data, relatively little text, and little need for validation -- XML had become distinctly uncomfortable.

JSON comes with a lot less baggage: carry-on only, nothing checked in; for some voyages, more would be too much. It's wonderfully liberating, and I've used it on a number of occasions.
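
A minimal sketch of that difference in baggage, using only the Python standard library. The record and its field names are hypothetical, invented purely for illustration; the XML version also has to make the attribute-versus-element choice that JSON never asks for.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical metadata record (invented for illustration), first as JSON.
record_json = '{"title": "Example", "year": 2015, "tags": ["markup", "metadata"]}'

# The same record as XML. Putting "year" in an attribute and "title" in an
# element is an arbitrary choice made for this sketch.
record_xml = """<record year="2015">
  <title>Example</title>
  <tags><tag>markup</tag><tag>metadata</tag></tags>
</record>"""

# JSON maps directly onto native dicts, lists, strings and numbers.
from_json = json.loads(record_json)

# XML has to be walked, and the types restored by hand.
root = ET.fromstring(record_xml)
from_xml = {
    "title": root.findtext("title"),
    "year": int(root.get("year")),
    "tags": [tag.text for tag in root.iter("tag")],
}

print(from_json == from_xml)
```

Both routes end at the same data structure; the JSON one simply needs no decisions and no type conversion along the way.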

For those occasions when a little more is useful, there are some efforts to add structure back in.  Look at  http://json-schema.org  (there may be others).  I don't have a good feeling about those (for one thing, json-schema seems to insist on writing the schema in JSON notation -- a neat trick which went horribly wrong last time with XML Schema).  Schemas are complicated, and I suspect that the effort to add them to JSON will end up adding a degree of notational complexity (and warmth of standards-group invective -- JSON is not free of its Enthusiasts) which will make us nostalgic for XML with all its warts.


The following is a bit of a historical tangent.

> As to whether JSON is better than XML, I have never understood what
> purpose is served by the arcane distinction between attributes and
> elements, or why tag-names must be repeated at element-end.

I think I can explain those, or at least explain why an apparently bizarre decision was reasonable; both are to some extent atavisms. (I hope Desmond will forgive me if I am answering an implied rhetorical question.)

Without going into a certainly arcane tangent about data versus metadata, I think one can recall that when working with texts -- that is, doing SGML markup -- the distinction was never in practice terribly confusing.  There were decisions to be made, and I'm sure many on this list can recall or generate relevant rules of thumb, but it's only when *ML expanded, with the web, to cover areas which were not _really_ 'markup' (*handwaving*), that the distinction became something of a fossil one.

The element end-tags of SGML were important because they let the markup author indicate unambiguously when an element had ended, either for the sake of error-checking, or to avoid an otherwise ambiguous parse.  Because SGML was designed to be typed out without editor support, however, there was lots of minimisation: end tags might in various circumstances collapse to '</>', or '/', or in many/most cases be omitted entirely, and one only rarely had to actually include them in the text; whole layers of markup could vanish from sight.  When XML was derived from it, the desire to simplify the job for parsers, combined with the realisation that most people (for some unspecified value of 'most') would be using smart editors which would handle the end-tags, meant that all the minimisation functionality was dropped, making the result appear as it now does.  Since over the last decade the ratio of tag to text has probably gone up significantly, the result does sometimes look a bit of a mess.
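
A rough sketch of what dropping minimisation means in practice, using Python's standard XML parser. The element names are hypothetical, and the two minimised strings only imitate what SGML could permit (end-tag omission and the empty end tag '</>', each subject to the DTD and SGML declaration in force); no SGML parser is involved here.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if `text` parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Every end tag written out in full, as XML requires.
explicit = "<list><item>one</item><item>two</item></list>"

# SGML-style end-tag omission: the inner end tags are simply left out.
omitted = "<list><item>one<item>two</list>"

# SGML-style empty end tag, closing whichever element is currently open.
short = "<list><item>one</><item>two</></list>"

for name, text in [("explicit", explicit), ("omitted", omitted), ("short", short)]:
    print(name, is_well_formed(text))
```

Only the fully tagged form is accepted; an XML parser has no DTD-driven rules for inferring where an element ends, which is the simplification for parsers described above.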

Best wishes,


Norman Gray  :  http://nxg.me.uk
SUPA School of Physics and Astronomy, University of Glasgow, UK

        Date: Wed, 13 May 2015 09:55:50 -0700
        From: Charles Faulhaber <cbf at berkeley.edu>
        Subject: RE: [Humanist] 29.25 happy birthday Humanist
        In-Reply-To: <20150513054343.B70D9102D at digitalhumanities.org>

In re technology dominating inquiry.

More succinctly: If you only have a hammer, every problem is a nail....

Charles Faulhaber
