[Humanist] 24.131 stemmatics data

Humanist Discussion Group willard.mccarty at mccarty.org.uk
Sun Jun 20 11:35:20 CEST 2010


                 Humanist Discussion Group, Vol. 24, No. 131.
         Centre for Computing in the Humanities, King's College London
                       www.digitalhumanities.org/humanist
                Submit to: humanist at lists.digitalhumanities.org



        Date: Sun, 20 Jun 2010 16:20:46 +1000
        From: Desmond Schmidt <desmond.schmidt at qut.edu.au>
        Subject: RE: [Humanist] 24.128 new on WWW: stemmatics data
        In-Reply-To: <20100619065643.38AD65AE40 at woodward.joyent.us>


Hi Peter,

I also have a method for generating phylogenetic trees, but it works differently. I run my nmerge program over the set of witnesses to produce an MVD (multi-version document), then query the MVD to produce a difference matrix. This computes the Levenshtein distance between each witness and every other witness (so, like you, no base text). The distance is computed at character-level granularity, at a cost of 1 for each deletion, insertion or variant and a cost of 1 for any transposition between the versions being compared, whatever its length (so a transposition of 100 characters might still cost 1). Then I compute the standard deviation for each witness against each other witness and output this as a difference matrix.

The difference matrix is fed into Phylip to produce the phylogenetic tree. I have modified the fitch and drawtree programs of Phylip to output a JPG file that can be viewed in our web application. So one 'view' of an MVD becomes the phylogenetic tree, and as you modify the text or add new witnesses the tree is updated.

Although I have the tools to do all this manually, I'm still working on the GUI, but it won't take much longer. Domenico will be able to demonstrate this at Pisa so you can see it. But it would be interesting to compare your method and mine on the same data, don't you think? Could you possibly make some of the original witnesses available to me in advance?
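
For anyone who wants to experiment with the distance-matrix step described above, here is a minimal Python sketch. It is not nmerge and does not build an MVD: it uses plain character-level Levenshtein distance (cost 1 per insertion, deletion or substitution), so a transposed block is counted as separate deletions and insertions rather than at a flat cost of 1, and it leaves out the standard-deviation step. The function names and the sample witnesses are hypothetical, chosen only for illustration. The output follows the square distance-matrix layout that Phylip's fitch reads: the number of taxa on the first line, then one row per witness with the name padded to ten characters.

```python
from itertools import combinations


def levenshtein(a, b):
    """Character-level edit distance: insert, delete, substitute cost 1 each."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string as the row to save memory
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (0 if equal)
            ))
        previous = current
    return previous[-1]


def distance_matrix(witnesses):
    """Pairwise distances between all witnesses; no base text is assumed."""
    names = list(witnesses)
    matrix = {n: {m: 0 for m in names} for n in names}
    for x, y in combinations(names, 2):
        d = levenshtein(witnesses[x], witnesses[y])
        matrix[x][y] = matrix[y][x] = d
    return matrix


def to_phylip(matrix):
    """Format a square distance matrix in the layout Phylip expects."""
    names = list(matrix)
    lines = [f"{len(names):5d}"]
    for n in names:
        row = " ".join(f"{matrix[n][m]:.4f}" for m in names)
        lines.append(f"{n[:10]:<10} {row}")  # names occupy ten characters
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical witnesses, invented for illustration only.
    sample = {
        "A": "the cat sat on the mat",
        "B": "the cat sat on a mat",
        "C": "the bat sat on the mat",
    }
    print(to_phylip(distance_matrix(sample)), end="")
```

Saved to a file, the printed matrix could be given to fitch as its input; supporting flat-cost transpositions would need an alignment that detects moved blocks, which this sketch does not attempt.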

------------------------------
Dr Desmond Schmidt
Information Security Institute
Faculty of Information Technology
Queensland University of Technology
(07)3138-9509

>Subject: [Humanist] 24.128 new on WWW: stemmatics data

                 Humanist Discussion Group, Vol. 24, No. 128.
         Centre for Computing in the Humanities, King's College London
                       www.digitalhumanities.org/humanist
                Submit to: humanist at lists.digitalhumanities.org

        Date: Fri, 18 Jun 2010 04:20:03 +0100
        From: Peter Robinson <P.M.Robinson at BHAM.AC.UK>
        Subject: Stemmatics data: for testing phylogenetic analysis on actual manuscript traditions

As part of the Studia Stemmatologica project (led by Tuomas Heikkilä, Teemu Roos and Petri Myllymäki of the University of Helsinki), I have prepared a page giving access to five full sets of data prepared for phylogenetic analysis: four for sections of the Canterbury Tales, one for the Old Norse Solarljod.  These datasets have been produced with exceptional care, to give the most accurate and complete portrayal of the variation in each tradition. For each dataset, we also present an expert scholarly analysis.

Our hope, in releasing this data, is to encourage researchers interested in the possibilities and challenges of the application of phylogenetic methods to stemmatics to experiment with different methods of analysis on 'real' datasets.  We would be glad to hear of any and all uses made of this data.

The data is at http://www.textualscholarship.org/newstemmatics/data/index.html.  (This address was omitted from an earlier posting.)

Best wishes

Peter Robinson

Institute for Textual Scholarship and Electronic Editing
Elmfield House, Selly Oak Campus
University of Birmingham
Edgbaston B29 6LG
P.M.Robinson at bham.ac.uk
p. +44 (0)121 4158441, f. +44 (0) 121 415 8376
http://www.itsee.bham.ac.uk/



