[Humanist] 29.584 text-analysis for annotation
Humanist Discussion Group
willard.mccarty at mccarty.org.uk
Mon Dec 28 10:27:10 CET 2015
Humanist Discussion Group, Vol. 29, No. 584.
Department of Digital Humanities, King's College London
Submit to: humanist at lists.digitalhumanities.org
Date: Sun, 27 Dec 2015 12:37:46 -0600
From: Michael Widner <mikewidner at stanford.edu>
Subject: Re: [Humanist] 29.575 text-analysis for annotation?
In-Reply-To: <20151223062351.ED95C79A3 at digitalhumanities.org>
The idea of using annotations for text analysis is one that I've thought
of quite a bit myself. One of the things we're going to be doing next
with Lacuna Stories is creating a training set of annotations, then
using machine learning to classify student annotations automatically.
That's tangential to what you're discussing here, though, which sounds
more like you're interested in using annotations as a bridge to the
Web of Data. The library Lacuna Stories' annotations are built
on is called Annotator.js. It's pretty straightforward to create
extensions to it. I could envision creating some plugins that link to
various LOD databases like VIAF and others, then automatically prompt
users for suggestions that match the text they've highlighted. With some
named entity extraction, you could even prepopulate the texts with
annotations that link entities to these resources. There are also a
variety of tools out there that already "annotate" texts for named
entities (GATE, Stanford NER, Python's nltk library, etc.), but not in a
way that converts them into LOD, as far as I know. There could be, and
I'd love to hear about any.
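To make the prepopulation idea concrete, here is a minimal sketch in Python. A simple gazetteer lookup stands in for a real NER tool (GATE, Stanford NER, nltk, etc.), and the VIAF-style URIs are illustrative placeholders, not verified identifiers; the output follows the shape of the W3C Web Annotation data model in JSON-LD.

```python
# Sketch: prepopulate a text with LOD-linked annotations.
# A dictionary stands in for real named-entity recognition
# (GATE, Stanford NER, nltk, ...); the URIs are placeholders.
import json

GAZETTEER = {
    # entity surface form -> LOD URI (illustrative, not verified)
    "Ada Lovelace": "http://viaf.org/viaf/EXAMPLE-LOVELACE",
    "Lady Lovelace": "http://viaf.org/viaf/EXAMPLE-LOVELACE",
}

def prepopulate(text, source_url):
    """Emit one Web Annotation per entity mention found in text."""
    annotations = []
    for surface, uri in GAZETTEER.items():
        start = text.find(surface)
        while start != -1:
            annotations.append({
                "@context": "http://www.w3.org/ns/anno.jsonld",
                "type": "Annotation",
                "motivation": "identifying",
                # the body points the highlighted text at a LOD resource
                "body": {"type": "SpecificResource", "source": uri},
                "target": {
                    "source": source_url,
                    "selector": {
                        "type": "TextQuoteSelector",
                        "exact": surface,
                    },
                },
            })
            start = text.find(surface, start + 1)
    return annotations

text = "Ada Lovelace, later known as Lady Lovelace, wrote the first program."
annos = prepopulate(text, "http://example.org/innovators.html")
print(json.dumps(annos, indent=2))
```

A plugin built on Annotator.js could present these as suggestions for users to confirm rather than inserting them silently.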
There's also a group that's meeting in May at Purdue to discuss issues
around annotation, called the Cove Collective:
http://covecollective.org/ Although this group is focused on Victorian
literature, several of us (Amanda Visconti and myself, in particular)
are interested in digital annotation more widely. You might want to get
in touch with Amanda, who is an Asst. Prof. and Digital Humanities
Specialist at Purdue Libraries. Her digital dissertation was a
collaboratively annotated edition of Ulysses, built in part on code I
wrote for Lacuna Stories.
Anyway, just a few stray thoughts on the topic. I'd be happy to chat
further some time. We have several projects at Stanford
around text analysis, annotation, and linked open data... just none that
tie them all together yet.
On 12/23/15 12:23 AM, Humanist Discussion Group wrote:
> Humanist Discussion Group, Vol. 29, No. 575.
> Department of Digital Humanities, King's College London
> Submit to: humanist at lists.digitalhumanities.org
> Date: Wed, 23 Dec 2015 00:05:42 +0000
> From: Alexandre Enkerli <aenkerli at vteducation.org>
> Subject: Text Analysis, Easing in Semantic Annotations and Linked Data
> Hello all!
> Fairly long-term subscriber (since Volume 16, in 2003), occasional poster. And naïve humanist with a digital bent.
> Would like your insight on a crazy idea about the combination of three threads having to do with Digital Humanities.
> My main work isn't really about DH, but as an ethnographer working in digital domains (technopedagogy, most recently), been thinking about the list on several occasions. For instance, Humanist came up during interactions with Stefan Sinclair around a lab about tools to support learners' writing.
> Stefan's work with Geoffrey Rockwell on Voyant Tools has been on my mind quite a bit. Used Voyant to build a coursepack in material culture and later thought about the tools' potential in providing feedback on learners' writing (for that same lab). Then noticed Michael Widner's work on essay revision, using Voyant Tools.
> That's one thread: unexpected uses of textual analysis. Lots of cool tools are listed on TAPoR and it's fun to explore the possibilities. Maybe there's a tool (or set of tools) out there which can enable my crazy idea? We'll see…
> Fast-forward a few months to my 'discovery' of Open Annotations in general and Hypothes.is http://hypothes.is in particular. Was lucky enough to benefit from interactions with Renoir Boulanger (former W3C DevOps) and Jeremy Dean (Lit Genius pioneer and now Director of Education for Hypothesis). Clearly, there's something going on with annotations.
> (The aforementioned Widner also created annotation tools (Lacuna Stories) and wrote a series of posts about Genius.)
> Boom. Second thread.
> Third thread is about Five Star Linked Data, which is actually closer to my work. It might be controversial in some circles, but it's pretty neat for Libraries, Archives, Museums… and Education (though MEAL would be a better acronym than LAME).
> See, we have this learning resource catalogue which takes part in the Semantic Web movement and conforms to Normetic, a local application profile for learning resources. (Normetic is currently switching from LOM to MLR, getting deeper into Linked Data.)
> Our platform has recently added the ability to combine learning resource metadata from multiple sources, especially useful as a way to tie those resources to diverse ontologies (ministry of education's competencies, Bloom's taxonomy, accessibility standards…). Exciting stuff!
> A problem here is that indexing work can be rather difficult. In fact, the same can be said about much work for the Semantic Web in general and Linked Data in particular. It gets quite technical quite quickly. In a way, it's as though we were at the same point with the Semantic Web as we were with the WWW before NCSA Mosaic. There are people who know a lot about SPARQL, SKOS, DBPedia, Turtle, DCMI, VDEX, etc. Maybe some of you are experts in all of these things. But it's particularly hard to get non-experts to contribute to the Web of Data.
> Which is where the crazy idea comes in: what if we could use textual analysis to ease in semantic annotations and contribute to the Web of Data?
> Recently listened to the audiobook version of Walter Isaacson's The Innovators, a kind of collective biography of diverse people involved in the "digital revolution" (from Ada Lovelace to Marissa Mayer).
> Through the book, couldn't help but feel that it should be converted into a Linked Data version. A lot of the content sounds like a narrative expression of RDFa. Having URIs for each entity would make the book more valuable as a resource. Sure, the same factoids about the links between these people are already available elsewhere (Ada Lovelace's FoaF page probably contains more useful data than Isaacson's book). But there's something to be said about following links from a text to the Web of Data.
> It might be possible to annotate Isaacson's book semi-automatically, tagging individual occurrences of 'Ada Lovelace', 'Lady Lovelace', 'Lord Byron's daughter', etc. Corpus tools like those created by Rockwell and Sinclair would be quite useful, here. Especially if they were combined with Open Annotations. And if these annotations generated the necessary code to be integrated in the Web of Data. Obviously, the process could then apply to Eric Raymond's The Cathedral and the Bazaar, Lawrence Lessig's Free Culture, and Christopher Kelty's Two Bits. (Conveniently, these last three texts are all available in HTML.) Maybe we could throw in some Markdown (or CriticMarkup) in the mix, for good measure, as plain text annotation tends to be easier for many people than XML and other *ML.
> As a non-coder, my options are to dig through TAPoR and other repertoires for a tool which does all this and/or to send feature requests to Voyant Tools, Hypothesis, etc.
> So… How crazy is all of this? Could we use text analysis to facilitate a type of annotation which can then contribute to Linked Data (LODLAM+education, etc.)?
> Will probably follow this up with the LOD folks.
> Quite possibly, though, this may all be related to things like the Text Encoding Initiative, Federated Wiki, ePUB3, and Wikity. If so, there are Humanist listmembers who can talk to these points.
> Thanks for any insight.
> Alex Enkerli, Learning Technology Advisor
> Vitrine technologie-éducation
>  http://dhhumanist.org/Archives/Virginia/v16/0646.html
>  http://www.vteducation.org/en/laboratories/writing-support-lab
>  http://docs.voyant-tools.org/
>  http://lessonplans.dwrl.utexas.edu/content/essay-revision-automated-textual-analysis
>  http://tapor.ca
>  http://www.lacunastories.com/
>  https://people.stanford.edu/widner/content/problems-genius-part-one-online-annotations-consensus-and-bias
>  https://people.stanford.edu/widner/content/problems-genius-part-three-connected-learning-lacuna-stories
>  http://ceres.vteducation.org/app/?lang=en
>  http://www.normetic.org/
>  http://www.gtn-quebec.org/node/1004
>  https://en.wikipedia.org/wiki/ISO/IEC_19788
>  http://books.simonandschuster.com/The-Innovators/Walter-Isaacson/9781442376236
>  http://www.catb.org/esr/writings/cathedral-bazaar/
>  http://free-culture.cc/
>  http://twobits.net/
>  http://daringfireball.net/projects/markdown/
>  http://criticmarkup.com/
>  http://fed.wiki.org/welcome-visitors.html
>  http://wikity.cc/
>  http://digitalhumanities.org/lod/
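The semi-automatic tagging Alexandre describes, marking occurrences of 'Ada Lovelace', 'Lady Lovelace', etc. and generating markup ready for the Web of Data, could be sketched as a pass that wraps each mention in an RDFa span. The alias table, entity URI, and property name below are illustrative assumptions, not part of any existing tool, and a real implementation would need to avoid re-matching text inside markup it has already inserted.

```python
# Sketch: wrap entity mentions in RDFa spans so a plain HTML text
# becomes machine-readable Linked Data. Aliases and the entity URI
# are illustrative placeholders, not verified identifiers.
import re

ALIASES = {
    # entity URI -> surface forms that should all resolve to it
    "http://example.org/entity/lovelace": ["Ada Lovelace", "Lady Lovelace"],
}

def add_rdfa(html):
    """Return html with each known mention wrapped in an RDFa span.

    Assumes a schema.org prefix is declared on an ancestor element.
    A single regex pass per entity keeps each mention from being
    wrapped twice; with many entities you would tokenize instead.
    """
    for uri, names in ALIASES.items():
        pattern = "|".join(re.escape(n) for n in names)
        html = re.sub(
            pattern,
            lambda m: ('<span property="schema:mentions" resource="%s">%s</span>'
                       % (uri, m.group(0))),
            html,
        )
    return html

source = "<p>Ada Lovelace, later Lady Lovelace, met Charles Babbage.</p>"
result = add_rdfa(source)
print(result)
```

The same pass could just as easily emit stand-off Open Annotations instead of inline RDFa; the point is only that a corpus tool which already finds the mentions has done most of the work.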