[Humanist] 29.582 text-analysis for annotation
Humanist Discussion Group
willard.mccarty at mccarty.org.uk
Sun Dec 27 10:46:32 CET 2015
Humanist Discussion Group, Vol. 29, No. 582.
Department of Digital Humanities, King's College London
Submit to: humanist at lists.digitalhumanities.org
Date: Thu, 24 Dec 2015 16:59:17 +0000
From: Jonathan Reeve <jon.reeve at gmail.com>
Subject: Re: 29.575 text-analysis for annotation?
In-Reply-To: <20151223062351.ED95C79A3 at digitalhumanities.org>
The idea of programmatically generating annotations and contributing to the
Semantic Web is utterly fascinating, and totally doable. You should do it!
My advice would be to learn a scripting language like Python, and start
coding it up. What you're suggesting doesn't require a computer science
background, and you can teach yourself almost all you need to know on
Codecademy if you have a week or two off. Voyant Tools is an end-user
product, as are most of the tools listed on TAPoR, but a project like this
needs text analysis programming libraries, and not end-user applications,
since the various software packages you'll use need to talk to each other
more than they need a web interface.
But you don't even really need to program if you can get the right people
interested in the project. Why not start a repository for the project on
GitHub, and invite a few NLP programmers to contribute?
Hypothesis has an API that you can use to enter your annotations
programmatically. Their text location syntax is not super-well documented
(as they admit themselves), but it's easy to figure out if you take a close
look at some of the responses. Once you can scrape together the types of
annotations you want, you can programmatically enter them on Hypothesis.
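For instance, a minimal Python sketch of that workflow might look like the following. The endpoint and payload shape follow Hypothesis's public API as I understand it, but the token, target URL, and quoted passage are placeholders, so treat this as a starting point rather than a recipe:

```python
# Sketch: create a Hypothesis annotation programmatically.
# The token, target URI, and quoted passage below are placeholders.
import json
import urllib.request

API_URL = "https://api.hypothes.is/api/annotations"

def build_annotation(uri, exact, comment):
    """Build the JSON payload for one annotation, anchoring it with a
    TextQuoteSelector on the exact quoted passage."""
    return {
        "uri": uri,
        "text": comment,
        "target": [{
            "source": uri,
            "selector": [{"type": "TextQuoteSelector", "exact": exact}],
        }],
    }

def post_annotation(payload, api_token):
    """POST the payload to the Hypothesis API with a bearer token."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_token}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_annotation(
    "http://example.com/innovators.html",
    "Ada Lovelace",
    "See http://dbpedia.org/resource/Ada_Lovelace",
)
```

A real pipeline would want to anchor annotations more robustly (adding prefix/suffix context to the selector, say) rather than relying on an exact quote alone.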
I'm not aware of a Hypothesis to Semantic Web pipeline, or a Hypothesis
to/from TEI pipeline, but this sounds like a great endeavor. You could
fairly easily put together something in Python that reads annotations from
TEI and generates Hypothesis annotations for them, or vice-versa.
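As a rough illustration of the TEI-to-Hypothesis direction, here is a Python sketch. The element names (interp, seg/@ana) follow common TEI practice rather than any particular file, and the sample document is invented:

```python
# Sketch of a TEI-to-Hypothesis converter: read <seg> elements that
# point at <interp> notes via @ana, and emit one Hypothesis-style
# payload per pair. Element names and the sample are illustrative.
import xml.etree.ElementTree as ET

TEI_NS = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <interp xml:id="hermeneutic">the voice of truth</interp>
    <p><seg ana="#hermeneutic">Who was the old woman?</seg></p>
  </body></text>
</TEI>"""

def tei_to_annotations(tei_string, source_uri):
    """Turn seg/interp pairs into Hypothesis-style annotation dicts."""
    root = ET.fromstring(tei_string)
    # Index the interpretations by their xml:id.
    interps = {el.get(XML_ID): el.text
               for el in root.iter(TEI_NS + "interp")}
    annotations = []
    for seg in root.iter(TEI_NS + "seg"):
        code = seg.get("ana", "").lstrip("#")
        if code in interps:
            annotations.append({
                "uri": source_uri,
                "text": interps[code],
                "target": [{"source": source_uri, "selector": [
                    {"type": "TextQuoteSelector", "exact": seg.text}]}],
            })
    return annotations

anns = tei_to_annotations(SAMPLE, "http://example.com/sz.html")
```

Going the other way (Hypothesis JSON back into TEI) would be much the same shape in reverse.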
I just did some experiments with annotation in TEI, myself, here:
It uses "interp" tags to annotate segments of text, not unlike what Barthes
does in S/Z. From there, the interpretations are translated to HTML with
XSLT, and made interactive with some light jQuery. The code for the
project, including the TEI XML, can be found here:
I'm not very familiar with RDF, Federated Wiki, or ePUB3. Perhaps someone
else here could chime in about those?
Hope that helps,
On Wed, Dec 23, 2015 at 1:23 AM Humanist Discussion Group <
willard.mccarty at mccarty.org.uk> wrote:
> Humanist Discussion Group, Vol. 29, No. 575.
> Department of Digital Humanities, King's College London
> Submit to: humanist at lists.digitalhumanities.org
> Date: Wed, 23 Dec 2015 00:05:42 +0000
> From: Alexandre Enkerli <aenkerli at vteducation.org>
> Subject: Text Analysis, Easing in Semantic Annotations and Linked
> Hello all!
> Fairly long-term subscriber (since Volume 16, in 2003), occasional
> poster. And naïve humanist with a digital bent.
> Would like your insight on a crazy idea about the combination of three
> threads having to do with Digital Humanities.
> My main work isn’t really about DH, but as an ethnographer working in
> digital domains (technopedagogy, most recently), been thinking about the
> list on several occasions. For instance, Humanist came up during
> interactions with Stefan Sinclair around a lab about tools to support
> learners’ writing.
> Stefan’s work with Geoffrey Rockwell on Voyant Tools has been on my
> mind quite a bit. Used Voyant to build a coursepack in material culture and
> later thought about the tools’ potential in providing feedback on learners’
> writing (for that same lab). Then noticed Michael Widner’s work on essay
> revision, using Voyant Tools.
> That’s one thread: unexpected uses of textual analysis. Lots of cool tools
> are listed on TAPoR and it’s fun to explore the possibilities. Maybe
> there’s a tool (or set of tools) out there which can enable my crazy idea?
> We’ll see…
> Fast-forward a few months to my “discovery” of Open Annotations in general
> and Hypothes.is (http://hypothes.is) in particular. Was lucky enough to
> benefit from interactions with Renoir Boulanger (former W3C DevOps) and
> Jeremy Dean (Lit Genius pioneer and now Director of Education for
> Hypothesis). Clearly, there’s something going on with annotations.
> (The aforementioned Widner also created annotation tools (Lacuna
> Stories) and wrote a series of posts about Genius.)
> Boom. Second thread.
> Third thread is about Five Star Linked Data, which is actually closer to
> my work. It might be controversial in some circles, but it’s pretty neat
> for Libraries, Archives, Museums… and Education (though MEAL would be a
> better acronym than LAME).
> See, we have this learning resource catalogue which takes part in the
> Semantic Web movement and conforms to Normetic, a local application profile
> for learning resources. (Normetic is currently switching from LOM to
> MLR, getting deeper into Linked Data.)
> Our platform has recently added the ability to combine learning resource
> metadata from multiple sources, especially useful as a way to tie those
> resources to diverse ontologies (ministry of education’s competencies,
> Bloom’s taxonomy, accessibility standards…). Exciting stuff!
> A problem here is that indexing work can be rather difficult. In fact, the
> same can be said about much work for the Semantic Web in general and Linked
> Data in particular. It gets quite technical quite quickly. In a way, it’s
> as though we were at the same point with the Semantic Web as we were with
> the WWW before NCSA Mosaic. There are people who know a lot about SPARQL,
> SKOS, DBPedia, Turtle, DCMI, VDEX, etc. Maybe some of you are experts in
> all of these things. But it’s particularly hard to get non-experts to
> contribute to the Web of Data.
> Which is where the crazy idea comes in: what if we could use textual
> analysis to ease the creation of semantic annotations and contribute to the
> Web of Data?
> Recently listened to the audiobook version of Walter Isaacson’s The
> Innovators, a kind of collective biography of diverse people involved
> in the “digital revolution” (from Ada Lovelace to Marissa Mayer).
> Through the book, couldn’t help but feel that it should be converted into
> a Linked Data version. A lot of the content sounds like a narrative
> expression of RDFa. Having URIs for each entity would make the book more
> valuable as a resource. Sure, the same factoids about the links between
> these people are already available elsewhere (Ada Lovelace’s FoaF page
> probably contains more useful data than Isaacson’s book). But there’s
> something to be said about following links from a text to the Web of Data.
> It might be possible to annotate Isaacson’s book semi-automatically,
> tagging individual occurrences of “Ada Lovelace”, “Lady Lovelace”, “Lord
> Byron’s daughter”, etc. Corpus tools like those created by Rockwell and
> Sinclair would be quite useful, here. Especially if they were combined with
> Open Annotations. And if these annotations generated the necessary code to
> be integrated in the Web of Data. Obviously, the process could then apply
> to Eric Raymond’s The Cathedral and the Bazaar, Lawrence Lessig’s Free
> Culture, and Christopher Kelty’s Two Bits. (Conveniently, these
> last three texts are all available in HTML…) Maybe we could throw in some
> Markdown (or CriticMarkup) in the mix, for good measure, as plain
> text annotation tends to be easier for many people than XML and other *ML.
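That semi-automatic tagging step is easy to sketch in Python: map an entity's surface forms to a single URI and wrap each occurrence in an RDFa span. The alias table, URIs, and sample sentence below are illustrative placeholders, not drawn from Isaacson's text:

```python
# Sketch of semi-automatic entity tagging: several surface forms map
# to one URI, and each occurrence gets an RDFa-annotated span.
# Alias table and sample sentence are placeholders.
import re

ALIASES = {
    "Ada Lovelace": "http://dbpedia.org/resource/Ada_Lovelace",
    "Lady Lovelace": "http://dbpedia.org/resource/Ada_Lovelace",
}

def tag_entities(text):
    """Wrap each known surface form in an RDFa span pointing at its URI.

    A real version would match longest surface forms first and skip
    text already inside a tag; this keeps the idea visible."""
    for surface, uri in ALIASES.items():
        span = f'<span about="{uri}" typeof="foaf:Person">{surface}</span>'
        text = re.sub(re.escape(surface), span, text)
    return text

html = tag_entities("Lady Lovelace annotated Menabrea's paper.")
```

From there, the same spans could feed both the Web of Data (via the RDFa) and an annotation layer like Hypothesis.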
> As a non-coder, my options are to dig through TAPoR and other
> repertoires for a tool which does all this and/or to send feature requests
> to Voyant Tools, Hypothesis, etc.
> So… How crazy is all of this? Could we use text analysis to facilitate a
> type of annotation which can then contribute to Linked Data
> (LODLAM+education, etc.)?
> Will probably follow this up with the LOD folks.
> Quite possibly, though, this may all be related to things like the Text
> Encoding Initiative, Federated Wiki, ePUB3, and Wikity. If so,
> there are Humanist listmembers who can talk to these points.
> Thanks for any insight.
> Alex Enkerli, Learning Technology Advisor
> Vitrine technologie-éducation
>  http://dhhumanist.org/Archives/Virginia/v16/0646.html
>  http://www.vteducation.org/en/laboratories/writing-support-lab
>  http://docs.voyant-tools.org/
>  http://tapor.ca
>  http://www.lacunastories.com/
>  http://ceres.vteducation.org/app/?lang=en
>  http://www.normetic.org/
>  http://www.gtn-quebec.org/node/1004
>  https://en.wikipedia.org/wiki/ISO/IEC_19788
>  http://www.catb.org/esr/writings/cathedral-bazaar/
>  http://free-culture.cc/
>  http://twobits.net/
>  http://daringfireball.net/projects/markdown/
>  http://criticmarkup.com/
>  http://fed.wiki.org/welcome-visitors.html
>  http://wikity.cc/
>  http://digitalhumanities.org/lod/