[Humanist] 29.575 text-analysis for annotation?
Humanist Discussion Group
willard.mccarty at mccarty.org.uk
Wed Dec 23 07:23:51 CET 2015
Humanist Discussion Group, Vol. 29, No. 575.
Department of Digital Humanities, King's College London
Submit to: humanist at lists.digitalhumanities.org
Date: Wed, 23 Dec 2015 00:05:42 +0000
From: Alexandre Enkerli <aenkerli at vteducation.org>
Subject: Text Analysis, Easing in Semantic Annotations and Linked Data
Fairly long-term subscriber (since Volume 16, in 2003), occasional poster. And naïve humanist with a digital bent.
Would like your insight on a crazy idea combining three threads in Digital Humanities.
My main work isn’t really about DH, but as an ethnographer working in digital domains (technopedagogy, most recently), been thinking about the list on several occasions. For instance, Humanist came up during interactions with Stefan Sinclair around a lab about tools to support learners’ writing.
Stefan’s work with Geoffrey Rockwell on Voyant Tools has been on my mind quite a bit. Used Voyant to build a coursepack in material culture and later thought about the tools’ potential in providing feedback on learners’ writing (for that same lab). Then noticed Michael Widner’s work on essay revision, using Voyant Tools.
That’s one thread: unexpected uses of textual analysis. Lots of cool tools are listed on TAPoR and it’s fun to explore the possibilities. Maybe there’s a tool (or set of tools) out there which can enable my crazy idea? We’ll see…
Fast-forward a few months to my “discovery” of Open Annotations in general and Hypothes.is <http://hypothes.is> in particular. Was lucky enough to benefit from interactions with Renoir Boulanger (former W3C DevOps) and Jeremy Dean (Lit Genius pioneer and now Director of Education for Hypothesis). Clearly, there’s something going on with annotations.
(The aforementioned Widner also created annotation tools (Lacuna Stories) and wrote a series of posts about Genius.)
Boom. Second thread.
Third thread is about Five Star Linked Data, which is actually closer to my work. It might be controversial in some circles, but it’s pretty neat for Libraries, Archives, Museums… and Education (though MEAL would be a better acronym than LAME).
See, we have this learning resource catalogue which takes part in the Semantic Web movement and conforms to Normetic, a local application profile for learning resources. (Normetic is currently switching from LOM to MLR, getting deeper into Linked Data.)
Our platform has recently added the ability to combine learning resource metadata from multiple sources, especially useful as a way to tie those resources to diverse ontologies (ministry of education’s competencies, Bloom’s taxonomy, accessibility standards…). Exciting stuff!
A problem here is that indexing work can be rather difficult. In fact, the same can be said about much work for the Semantic Web in general and Linked Data in particular. It gets quite technical quite quickly. In a way, it’s as though we were at the same point with the Semantic Web as we were with the WWW before NCSA Mosaic. There are people who know a lot about SPARQL, SKOS, DBPedia, Turtle, DCMI, VDEX, etc. Maybe some of you are experts in all of these things. But it’s particularly hard to get non-experts to contribute to the Web of Data.
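To give a sense of how technical even the simplest contribution gets, here is a minimal sketch (in Python, with purely hypothetical example.org URIs) of emitting a few Turtle triples that describe a learning resource with Dublin Core terms. A real pipeline would use an RDF library and the catalogue’s actual identifiers; this only illustrates the shape of the data non-experts are asked to produce:

```python
# A minimal sketch of what "contributing to the Web of Data" involves:
# emitting RDF statements in Turtle syntax. All example.org URIs below
# are hypothetical placeholders, not identifiers from any real catalogue.

PREFIXES = """\
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix ex: <http://example.org/resource/> .
"""

def describe_resource(resource_id, title, subject_uri):
    """Return Turtle triples describing one learning resource."""
    return (
        f"ex:{resource_id}\n"
        f'    dcterms:title "{title}" ;\n'
        f"    dcterms:subject <{subject_uri}> .\n"
    )

doc = PREFIXES + describe_resource(
    "lr42",
    "Material Culture Coursepack",
    # e.g. a competency URI from a ministry ontology (hypothetical here)
    "http://example.org/competency/analysis",
)
print(doc)
```

Even this toy version demands prefixes, URIs, and punctuation conventions — exactly the kind of machinery that keeps non-experts away.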
Which is where the crazy idea comes in: what if we could use textual analysis to ease people into semantic annotation and thereby contribute to the Web of Data?
Recently listened to the audiobook version of Walter Isaacson’s The Innovators, a kind of collective biography of diverse people involved in the “digital revolution” (from Ada Lovelace to Marissa Mayer).
Throughout the book, couldn’t help but feel that it should be converted into a Linked Data version. A lot of the content sounds like a narrative expression of RDFa. Having URIs for each entity would make the book more valuable as a resource. Sure, the same factoids about the links between these people are already available elsewhere (Ada Lovelace’s FOAF page probably contains more useful data than Isaacson’s book). But there’s something to be said for following links from a text to the Web of Data.
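To make that “narrative RDFa” feeling concrete, here is a hand-written sketch (not generated from the book) of how one such sentence might carry its entities. The DBpedia URIs are real identifiers, but the markup and the schema.org vocabulary choices are just one possible illustration:

```html
<!-- A hypothetical RDFa rendering of one sentence; only the DBpedia
     URIs are real identifiers. -->
<p vocab="http://schema.org/" typeof="Person"
   resource="http://dbpedia.org/resource/Ada_Lovelace">
  <span property="name">Ada Lovelace</span>, daughter of
  <span property="parent" typeof="Person"
        resource="http://dbpedia.org/resource/Lord_Byron">Lord Byron</span>,
  worked with
  <span property="colleague" typeof="Person"
        resource="http://dbpedia.org/resource/Charles_Babbage">Charles
        Babbage</span>.
</p>
```

The prose stays readable, while each name resolves to an entity on the Web of Data.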
It might be possible to annotate Isaacson’s book semi-automatically, tagging individual occurrences of “Ada Lovelace”, “Lady Lovelace”, “Lord Byron’s daughter”, etc. Corpus tools like those created by Rockwell and Sinclair would be quite useful, here. Especially if they were combined with Open Annotations. And if these annotations generated the necessary code to be integrated in the Web of Data. Obviously, the process could then apply to Eric Raymond’s The Cathedral and the Bazaar, Lawrence Lessig’s Free Culture, and Christopher Kelty’s Two Bits. (Conveniently, these last three texts are all available in HTML…) Maybe we could throw some Markdown (or CriticMarkup) into the mix, for good measure, as plain-text annotation tends to be easier for many people than XML and other *ML.
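A rough sketch of what that tagging step could look like, assuming nothing beyond the Python standard library: scan a text for known aliases of an entity, then emit one W3C Web Annotation (JSON-LD) per occurrence, with the quoted span pointing at a Linked Data URI. The source URL is a hypothetical placeholder; the DBpedia URI and the annotation vocabulary are real:

```python
import json
import re

# Semi-automatic entity tagging: find every alias of an entity in a text,
# then emit one W3C Web Annotation per match, identifying the span with a
# Linked Data URI. The `source` URL below is a hypothetical placeholder.

ALIASES = {
    "http://dbpedia.org/resource/Ada_Lovelace": [
        "Ada Lovelace", "Lady Lovelace", "Lord Byron's daughter",
    ],
}

def annotate(text, source="http://example.org/isaacson/innovators.html"):
    """Yield Web Annotations identifying each alias occurrence in `text`."""
    for uri, names in ALIASES.items():
        pattern = "|".join(re.escape(n) for n in names)
        for m in re.finditer(pattern, text):
            yield {
                "@context": "http://www.w3.org/ns/anno.jsonld",
                "type": "Annotation",
                "motivation": "identifying",
                "body": {"type": "SpecificResource", "source": uri},
                "target": {
                    "source": source,
                    "selector": {
                        "type": "TextPositionSelector",
                        "start": m.start(),
                        "end": m.end(),
                    },
                },
            }

sample = "Lady Lovelace, Lord Byron's daughter, sketched the first program."
annotations = list(annotate(sample))
print(json.dumps(annotations, indent=2))
```

In practice the alias lists would come from a text-analysis tool rather than a hand-built dictionary, and the annotations could be posted to a store like Hypothesis — but the output format above is already Linked Data-ready.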
As a non-coder, my options are to dig through TAPoR and other directories for a tool which does all this and/or to send feature requests to Voyant Tools, Hypothesis, etc.
So… How crazy is all of this? Could we use text analysis to facilitate a type of annotation which can then contribute to Linked Data (LODLAM+education, etc.)?
Will probably follow this up with the LOD folks.
Quite possibly, though, this may all be related to things like the Text Encoding Initiative, Federated Wiki, ePUB3, and Wikity. If so, there are Humanist listmembers who can talk to these points.
Thanks for any insight.
Alex Enkerli, Learning Technology Advisor