[Humanist] 29.584 text-analysis for annotation

Humanist Discussion Group willard.mccarty at mccarty.org.uk
Mon Dec 28 10:27:10 CET 2015

                 Humanist Discussion Group, Vol. 29, No. 584.
            Department of Digital Humanities, King's College London
                Submit to: humanist at lists.digitalhumanities.org

        Date: Sun, 27 Dec 2015 12:37:46 -0600
        From: Michael Widner <mikewidner at stanford.edu>
        Subject: Re: [Humanist] 29.575 text-analysis for annotation?
        In-Reply-To: <20151223062351.ED95C79A3 at digitalhumanities.org>

Hello Alex,

The idea of using annotations for text analysis is one that I've thought 
of quite a bit myself. One of the things we're going to be doing next 
with Lacuna Stories is creating a training set of annotations, then 
using machine learning to classify student annotations automatically. 
That's tangential to what you're discussing here, though, which sounds 
more like you're interested in using annotations as a bridge to the 
semantic web.

The JavaScript library that both Hypothesis and Lacuna Stories are built 
on is called Annotator.js. It's pretty straightforward to create 
extensions to it. I could envision creating some plugins that link to 
various LOD databases like VIAF and others, then automatically prompt 
users for suggestions that match the text they've highlighted. With some 
named entity extraction, you could even prepopulate the texts with 
annotations that link entities to these resources. There are also a 
variety of tools out there that already "annotate" texts for named 
entities (GATE, Stanford NER, Python's nltk library, etc.), but not in a 
way that converts them into LOD, as far as I know. There could be, and 
I'd love to hear about any.
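
As a rough illustration of the pipeline described above (find entity 
mentions in a text, then emit annotations that link them to a linked-data 
resource), here is a minimal Python sketch. It is not Annotator.js or 
Hypothesis code. The alias table, the example.org URIs, and the function 
names are all hypothetical, and simple alias matching stands in for a real 
named entity recognizer such as Stanford NER or nltk. The output loosely 
follows the W3C Web Annotation Data Model (a TextQuoteSelector target with 
the entity URI as the body).

```python
import json
import re

# Hypothetical alias table: one entity URI -> the surface forms that refer to it.
# The URI is a placeholder, not a real VIAF or DBpedia identifier.
ALIASES = {
    "http://example.org/entity/ada-lovelace": [
        "Ada Lovelace",
        "Lady Lovelace",
        "Lord Byron's daughter",
    ],
}


def find_mentions(text, aliases):
    """Return sorted (start, end, uri) tuples for every alias occurrence."""
    mentions = []
    for uri, names in aliases.items():
        # Try longer aliases first so a short alias never shadows a longer one.
        ordered = sorted(names, key=len, reverse=True)
        pattern = "|".join(re.escape(name) for name in ordered)
        for match in re.finditer(pattern, text):
            mentions.append((match.start(), match.end(), uri))
    return sorted(mentions)


def to_annotation(text, start, end, uri, source, context=20):
    """Build one annotation dict linking text[start:end] to an entity URI."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        "motivation": "identifying",
        "body": uri,  # the linked-data resource the mention refers to
        "target": {
            "source": source,  # URL of the annotated document (assumed)
            "selector": {
                "type": "TextQuoteSelector",
                "exact": text[start:end],
                "prefix": text[max(0, start - context):start],
                "suffix": text[end:end + context],
            },
        },
    }


if __name__ == "__main__":
    sample = "Ada Lovelace wrote notes. Later, Lady Lovelace was praised."
    for start, end, uri in find_mentions(sample, ALIASES):
        anno = to_annotation(sample, start, end, uri, "http://example.org/doc")
        print(json.dumps(anno, indent=2))
```

In a real plugin the alias table would be replaced by a lookup against a 
service such as VIAF, and the user would confirm or reject each suggested 
link before the annotation is saved.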

There's also a group that's meeting in May at Purdue to discuss issues 
around annotation, called the Cove Collective: 
http://covecollective.org/ Although this group is focused on Victorian 
literature, several of us (Amanda Visconti and myself, in particular) 
are interested in digital annotation more widely. You might want to get 
in touch with Amanda, who is an Asst. Prof. and Digital Humanities 
Specialist at Purdue Libraries. Her digital dissertation was a 
collaboratively annotated edition of Ulysses, which she built up based 
in part on code I wrote for Lacuna Stories.

Anyway, just a few stray thoughts on the topic. I'd be happy to chat 
further about it some time. We have several projects at Stanford 
around text analysis, annotation, and linked open data... just none that 
tie them all together yet.



On 12/23/15 12:23 AM, Humanist Discussion Group wrote:
>                   Humanist Discussion Group, Vol. 29, No. 575.
>              Department of Digital Humanities, King's College London
>                         www.digitalhumanities.org/humanist
>                  Submit to: humanist at lists.digitalhumanities.org
>          Date: Wed, 23 Dec 2015 00:05:42 +0000
>          From: Alexandre Enkerli <aenkerli at vteducation.org>
>          Subject: Text Analysis, Easing in Semantic Annotations and Linked Data
> Hello all!
> Fairly longterm subscriber (since Volume 16, in 2003[1]), occasional poster. And naïve humanist with a digital bent.
> Would like your insight on a crazy idea about the combination of three threads having to do with Digital Humanities.
> My main work isn’t really about DH, but as an ethnographer working in digital domains (technopedagogy, most recently), been thinking about the list on several occasions. For instance, Humanist came up during interactions with Stefan Sinclair around a lab about tools to support learners’ writing[2].
> Stefan’s work with Geoffrey Rockwell on Voyant Tools[3] has been on my mind quite a bit. Used Voyant to build a coursepack in material culture and later thought about the tools’ potential in providing feedback on learners’ writing (for that same lab[2]). Then noticed Michael Widner’s work on essay revision[4], using Voyant Tools.
> That’s one thread: unexpected uses of textual analysis. Lots of cool tools are listed on TAPoR[5] and it’s fun to explore the possibilities. Maybe there’s a tool (or set of tools) out there which can enable my crazy idea? We’ll see…
> Fastforward a few months to my “discovery” of Open Annotations in general and Hypothes.is (http://hypothes.is) in particular. Was lucky enough to benefit from interactions with Renoir Boulanger (former W3C DevOps) and Jeremy Dean (Lit Genius pioneer and now Director of Education for Hypothesis). Clearly, there’s something going on with annotations.
> (The aforementioned Widner also created annotation tools (Lacuna Stories[6]) and wrote a series of posts about Genius[7].)
> Boom. Second thread.
> Third thread is about Five Star Linked Data, which is actually closer to my work. It might be controversial in some circles, but it’s pretty neat for Libraries, Archives, Museums… and Education (though MEAL would be a better acronym than LAME).
> See, we have this learning resource catalogue[8] which takes part in the Semantic Web movement and conforms to Normetic, a local application profile for learning resources[9]. (Normetic is currently switching from LOM to MLR[10], getting deeper into Linked Data[11].)
> Our platform has recently added the ability to combine learning resource metadata from multiple sources, especially useful as a way to tie those resources to diverse ontologies (ministry of education’s competencies, Bloom’s taxonomy, accessibility standards…). Exciting stuff!
> A problem here is that indexing work can be rather difficult. In fact, the same can be said about much work for the Semantic Web in general and Linked Data in particular. It gets quite technical quite quickly. In a way, it’s as though we were at the same point with the Semantic Web as we were with the WWW before NCSA Mosaic. There are people who know a lot about SPARQL, SKOS, DBPedia, Turtle, DCMI, VDEX, etc. Maybe some of you are experts in all of these things. But it’s particularly hard to get non-experts to contribute to the Web of Data.
> Which is where the crazy idea comes in: what if we could use textual analysis to ease out semantic annotations and contribute to the Web of Data?
> Recently listened to the audiobook version of Walter Isaacson’s The Innovators[12], a kind of collective biography of diverse people involved in the “digital revolution” (from Ada Lovelace to Marissa Mayer).
> Through the book, couldn’t help but feel that it should be converted into a Linked Data version. A lot of the content sounds like a narrative expression of RDFa. Having URIs for each entity would make the book more valuable as a resource. Sure, the same factoids about the links between these people are already available elsewhere (Ada Lovelace’s FoaF page probably contains more useful data than Isaacson’s book). But there’s something to be said about following links from a text to the Web of Data.
> It might be possible to annotate Isaacson’s book semi-automatically, tagging individual occurrences of “Ada Lovelace”, “Lady Lovelace”, “Lord Byron’s daughter”, etc. Corpus tools like those created by Rockwell and Sinclair would be quite useful, here. Especially if they were combined with Open Annotations. And if these annotations generated the necessary code to be integrated in the Web of Data. Obviously, the process could then apply to Eric Raymond’s The Cathedral and the Bazaar[13], Lawrence Lessig’s Free Culture[14], and Christopher Kelty’s Two Bits[15]. (Conveniently, these last three texts are all available in HTML.) Maybe we could throw in some Markdown[16] (or CriticMarkup[17]) in the mix, for good measure, as plain text annotation tends to be easier for many people than XML and other *ML.
> As a non-coder, my options are to dig through TAPoR[5] and other repertoires for a tool which does all this and/or to send feature requests to Voyant Tools, Hypothesis, etc.
> So… How crazy is all of this? Could we use text analysis to facilitate a type of annotation which can then contribute to Linked Data (LODLAM+education, etc.)?
> Will probably follow this up with the LOD folks[20].
> Quite possibly, though, this may all be related to things like the Text Encoding Initiative, Federated Wiki[18], ePUB3, and Wikity[19]. If so, there are Humanist listmembers who can talk to these points.
> Thanks for any insight.
> --
> Alex Enkerli, Learning Technology Advisor
> Vitrine technologie-éducation
> http://www.vteducation.org/en
> [1] http://dhhumanist.org/Archives/Virginia/v16/0646.html
> [2] http://www.vteducation.org/en/laboratories/writing-support-lab
> [3] http://docs.voyant-tools.org/
> [4] http://lessonplans.dwrl.utexas.edu/content/essay-revision-automated-textual-analysis
> [5] http://tapor.ca
> [6] http://www.lacunastories.com/
> [7] https://people.stanford.edu/widner/content/problems-genius-part-one-online-annotations-consensus-and-bias<https://people.stanford.edu/widner/content/problems-genius-part-three-connected-learning-lacuna-stories>
> [8] http://ceres.vteducation.org/app/?lang=en
> [9] http://www.normetic.org/
> [10] http://www.gtn-quebec.org/node/1004
> [11] https://en.wikipedia.org/wiki/ISO/IEC_19788
> [12] http://books.simonandschuster.com/The-Innovators/Walter-Isaacson/9781442376236
> [13] http://www.catb.org/esr/writings/cathedral-bazaar/
> [14] http://free-culture.cc/
> [15] http://twobits.net/
> [16] http://daringfireball.net/projects/markdown/
> [17] http://criticmarkup.com/
> [18] http://fed.wiki.org/welcome-visitors.html
> [19] http://wikity.cc/
> [20] http://digitalhumanities.org/lod/
