[Humanist] 29.582 text-analysis for annotation

Humanist Discussion Group willard.mccarty at mccarty.org.uk
Sun Dec 27 10:46:32 CET 2015

                 Humanist Discussion Group, Vol. 29, No. 582.
            Department of Digital Humanities, King's College London
                Submit to: humanist at lists.digitalhumanities.org

        Date: Thu, 24 Dec 2015 16:59:17 +0000
        From: Jonathan Reeve <jon.reeve at gmail.com>
        Subject: Re:  29.575 text-analysis for annotation?
        In-Reply-To: <20151223062351.ED95C79A3 at digitalhumanities.org>

Dear Alexandre,

The idea of programmatically generating annotations and contributing to the
Semantic Web is utterly fascinating, and totally doable. You should do it!

My advice would be to learn a scripting language like Python and start
coding it up. What you're suggesting doesn't require a computer science
background, and you can teach yourself almost everything you need on
Codecademy in a week or two. Voyant Tools is an end-user
product, as are most of the tools listed on TAPoR, but a project like this
needs text-analysis programming libraries rather than end-user applications,
since the various software packages you'll use need to talk to each other
more than they need a web interface.

But you don't even really need to program if you can get the right people
interested in the project. Why not start a repository for the project on
GitHub, and invite a few NLP programmers to contribute?

Hypothesis has an API that you can use to enter your annotations
programmatically. Their text-location syntax is not especially well documented
(as they admit themselves), but it's easy to figure out if you take a close
look at some of the API's responses. Once you can scrape together the types of
annotations you want, you can enter them on Hypothesis in bulk.
I'm not aware of a Hypothesis to Semantic Web pipeline, or a Hypothesis
to/from TEI pipeline, but this sounds like a great endeavor. You could
fairly easily put together something in Python that reads annotations from
TEI and generates Hypothesis annotations for them, or vice-versa.
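
As a rough sketch of the Hypothesis side of such a script, here is what
building and posting one annotation might look like in Python, using only
the standard library. The endpoint URL and payload fields (uri, text,
target, TextQuoteSelector) are my reading of the API's JSON responses
rather than official documentation, so verify them before relying on this:

```python
import json
import urllib.request

API_URL = "https://hypothes.is/api/annotations"   # assumed endpoint
API_TOKEN = "YOUR-API-TOKEN"  # a developer token from your Hypothesis account

def build_annotation(page_uri, exact, note, prefix="", suffix=""):
    """Build the JSON payload for one annotation, anchored by a
    TextQuoteSelector: the quoted passage plus a little surrounding
    context, which is how Hypothesis relocates the highlight."""
    return {
        "uri": page_uri,
        "text": note,
        "target": [{
            "source": page_uri,
            "selector": [{
                "type": "TextQuoteSelector",
                "exact": exact,
                "prefix": prefix,
                "suffix": suffix,
            }],
        }],
    }

def post_annotation(payload):
    """POST the payload to the API (needs a valid token and network access)."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": "Bearer " + API_TOKEN,
                 "Content-Type": "application/json"})
    return urllib.request.urlopen(request)

# Build (but do not send) a sample annotation payload.
payload = build_annotation(
    "http://www.catb.org/esr/writings/cathedral-bazaar/",
    exact="the cathedral and the bazaar",
    note="Two contrasting models of software development.")
print(json.dumps(payload, indent=2))
```

A TEI-reading script would simply loop over its extracted annotations and
call something like post_annotation() for each one.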

I just did some experiments with annotation in TEI myself, here:


It uses "interp" tags to annotate segments of text, not unlike what Barthes
does in S/Z. From there, the interpretations are translated to HTML with
XSLT, and made interactive with some light jQuery. The code for the
project, including the TEI XML, can be found here:

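To make the interp mechanism concrete, here is a minimal Python sketch.
The TEI fragment and identifiers are invented for illustration, not taken
from the project: each segment is paired with the interpretation its @ana
attribute references, which is roughly the first step of the TEI-to-Hypothesis
conversion described above.

```python
import xml.etree.ElementTree as ET

# A toy TEI fragment: each <seg> points at an <interp> reading via @ana.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <p><seg ana="#herm">The barber had just lathered his chin.</seg></p>
    <interpGrp>
      <interp xml:id="herm">hermeneutic code, in the manner of S/Z</interp>
    </interpGrp>
  </body></text>
</TEI>"""

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # ElementTree's xml:id

def extract_annotations(tei_xml):
    """Pair each <seg> with the <interp> its @ana attribute points to,
    yielding (passage, interpretation) tuples -- the raw material a
    TEI-to-Hypothesis converter would start from."""
    root = ET.fromstring(tei_xml)
    interps = {el.get(XML_ID): el.text for el in root.iter(TEI + "interp")}
    pairs = []
    for seg in root.iter(TEI + "seg"):
        ref = (seg.get("ana") or "").lstrip("#")
        if ref in interps:
            pairs.append((seg.text, interps[ref]))
    return pairs

for passage, reading in extract_annotations(SAMPLE):
    print(passage, "->", reading)
```

The reverse direction (Hypothesis to TEI) would walk the API's annotation
records and emit seg/interp pairs instead.
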
I'm not very familiar with RDF, Federated Wiki, or ePUB3. Perhaps someone
else here could chime in about those?

Hope that helps,

Jonathan
On Wed, Dec 23, 2015 at 1:23 AM Humanist Discussion Group <
willard.mccarty at mccarty.org.uk> wrote:

>                  Humanist Discussion Group, Vol. 29, No. 575.
>             Department of Digital Humanities, King's College London
>                        www.digitalhumanities.org/humanist
>                 Submit to: humanist at lists.digitalhumanities.org
>         Date: Wed, 23 Dec 2015 00:05:42 +0000
>         From: Alexandre Enkerli <aenkerli at vteducation.org>
>         Subject: Text Analysis, Easing in Semantic Annotations and Linked
> Data
> Hello all!
> Fairly long-term subscriber (since Volume 16, in 2003[1]), occasional
> poster. And naïve humanist with a digital bent.
> Would like your insight on a crazy idea about the combination of three
> threads having to do with Digital Humanities.
> My main work isn’t really about DH, but as an ethnographer working in
> digital domains (technopedagogy, most recently), been thinking about the
> list on several occasions. For instance, Humanist came up during
> interactions with Stefan Sinclair around a lab about tools to support
> learners’ writing[2].
> Stefan’s work with Geoffrey Rockwell on Voyant Tools[3] has been on my
> mind quite a bit. Used Voyant to build a coursepack in material culture and
> later thought about the tools’ potential in providing feedback on learners’
> writing (for that same lab[2]). Then noticed Michael Widner’s work on essay
> revision[4], using Voyant Tools.
> That’s one thread: unexpected uses of textual analysis. Lots of cool tools
> are listed on TAPoR[5] and it’s fun to explore the possibilities. Maybe
> there’s a tool (or set of tools) out there which can enable my crazy idea?
> We’ll see…
> Fast-forward a few months to my “discovery” of Open Annotations in general
> and Hypothes.is (http://hypothes.is) in particular. Was lucky enough to
> benefit from interactions with Renoir Boulanger (former W3C DevOps) and
> Jeremy Dean (Lit Genius pioneer and now Director of Education for
> Hypothesis). Clearly, there’s something going on with annotations.
> (The aforementioned Widner also created annotation tools (Lacuna
> Stories[6]) and wrote a series of posts about Genius[7].)
> Boom. Second thread.
> Third thread is about Five Star Linked Data, which is actually closer to
> my work. It might be controversial in some circles, but it’s pretty neat
> for Libraries, Archives, Museums… and Education (though MEAL would be a
> better acronym than LAME).
> See, we have this learning resource catalogue[8] which takes part in the
> Semantic Web movement and conforms to Normetic, a local application profile
> for learning resources[9]. (Normetic is currently switching from LOM to
> MLR[10], getting deeper into Linked Data[11].)
> Our platform has recently added the ability to combine learning resource
> metadata from multiple sources, especially useful as a way to tie those
> resources to diverse ontologies (ministry of education’s competencies,
> Bloom’s taxonomy, accessibility standards…). Exciting stuff!
> A problem here is that indexing work can be rather difficult. In fact, the
> same can be said about much work for the Semantic Web in general and Linked
> Data in particular. It gets quite technical quite quickly. In a way, it’s
> as though we were at the same point with the Semantic Web as we were with
> the WWW before NCSA Mosaic. There are people who know a lot about SPARQL,
> SKOS, DBPedia, Turtle, DCMI, VDEX, etc. Maybe some of you are experts in
> all of these things. But it’s particularly hard to get non-experts to
> contribute to the Web of Data.
> Which is where the crazy idea comes in: what if we could use textual
> analysis to ease the creation of semantic annotations and contribute to
> the Web of Data?
> Recently listened to the audiobook version of Walter Isaacson’s The
> Innovators[12], a kind of collective biography of diverse people involved
> in the “digital revolution” (from Ada Lovelace to Marissa Mayer).
> Throughout the book, couldn’t help but feel that it should be converted into
> a Linked Data version. A lot of the content sounds like a narrative
> expression of RDFa. Having URIs for each entity would make the book more
> valuable as a resource. Sure, the same factoids about the links between
> these people are already available elsewhere (Ada Lovelace’s FOAF page
> probably contains more useful data than Isaacson’s book). But there’s
> something to be said about following links from a text to the Web of Data.
> It might be possible to annotate Isaacson’s book semi-automatically,
> tagging individual occurrences of “Ada Lovelace”, “Lady Lovelace”, “Lord
> Byron’s daughter”, etc. Corpus tools like those created by Rockwell and
> Sinclair would be quite useful, here. Especially if they were combined with
> Open Annotations. And if these annotations generated the necessary code to
> be integrated in the Web of Data. Obviously, the process could then apply
> to Eric Raymond’s The Cathedral and the Bazaar[13], Lawrence Lessig’s Free
> Culture[14], and Christopher Kelty’s Two Bits[15]. (Conveniently, these
> last three texts are all available in HTML…) Maybe we could throw in some
> Markdown[16] (or CriticMarkup[17]) in the mix, for good measure, as plain
> text annotation tends to be easier for many people than XML and other *ML.
> As a non-coder, my options are to dig through TAPoR[5] and other
> repertoires for a tool which does all this and/or to send feature requests
> to Voyant Tools, Hypothesis, etc.
> So… How crazy is all of this? Could we use text analysis to facilitate a
> type of annotation which can then contribute to Linked Data
> (LODLAM+education, etc.)?
> Will probably follow this up with the LOD folks[20].
> Quite possibly, though, this may all be related to things like the Text
> Encoding Initiative, Federated Wiki[18], ePUB3, and Wikity[19]. If so,
> there are Humanist listmembers who can talk to these points.
> Thanks for any insight.
> --
> Alex Enkerli, Learning Technology Advisor
> Vitrine technologie-éducation
> http://www.vteducation.org/en
> [1] http://dhhumanist.org/Archives/Virginia/v16/0646.html
> [2] http://www.vteducation.org/en/laboratories/writing-support-lab
> [3] http://docs.voyant-tools.org/
> [4]
> http://lessonplans.dwrl.utexas.edu/content/essay-revision-automated-textual-analysis
> [5] http://tapor.ca
> [6] http://www.lacunastories.com/
> [7]
> https://people.stanford.edu/widner/content/problems-genius-part-one-online-annotations-consensus-and-bias
> and
> https://people.stanford.edu/widner/content/problems-genius-part-three-connected-learning-lacuna-stories
> [8] http://ceres.vteducation.org/app/?lang=en
> [9] http://www.normetic.org/
> [10] http://www.gtn-quebec.org/node/1004
> [11] https://en.wikipedia.org/wiki/ISO/IEC_19788
> [12]
> http://books.simonandschuster.com/The-Innovators/Walter-Isaacson/9781442376236
> [13] http://www.catb.org/esr/writings/cathedral-bazaar/
> [14] http://free-culture.cc/
> [15] http://twobits.net/
> [16] http://daringfireball.net/projects/markdown/
> [17] http://criticmarkup.com/
> [18] http://fed.wiki.org/welcome-visitors.html
> [19] http://wikity.cc/
> [20] http://digitalhumanities.org/lod/
