[Humanist] 26.603 text-comparison software

Humanist Discussion Group willard.mccarty at mccarty.org.uk
Wed Dec 19 07:55:29 CET 2012


                 Humanist Discussion Group, Vol. 26, No. 603.
            Department of Digital Humanities, King's College London
                              www.dhhumanist.org/
                Submit to: humanist at lists.digitalhumanities.org

  [1]   From:    Daniel Allington <daniel.allington at open.ac.uk>            (45)
        Subject: Re:  26.597 text-comparison software

  [2]   From:    Trevor Borg <trevor.borg at gmail.com>                       (35)
        Subject: Re:  26.597 text-comparison software


--[1]------------------------------------------------------------------------
        Date: Tue, 18 Dec 2012 11:35:11 +0000
        From: Daniel Allington <daniel.allington at open.ac.uk>
        Subject: Re:  26.597 text-comparison software
        In-Reply-To: <20121218064727.761782E00 at digitalhumanities.org>

Dear Willard

Diff is the obvious starting point on a Unix-like system, but its smallest unit of comparison is the line, which is not ideal for natural language data. I haven't tried Kaleidoscope.
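
To illustrate the point about units of comparison, here is a minimal sketch (not anything diff itself does) that compares two texts word by word using Python's standard difflib module. The function name word_diff and the choice to split on whitespace are my own assumptions for illustration; real collation would need more careful tokenisation of punctuation and capitalisation.

```python
import difflib


def word_diff(a: str, b: str) -> list[tuple[str, str, str]]:
    """Return the differences between two texts, compared word by word.

    Hypothetical helper for illustration: tokens are produced by a plain
    whitespace split, so punctuation stays attached to its word.
    """
    a_words = a.split()
    b_words = b.split()
    # autojunk=False stops difflib from treating frequent words as junk
    # in longer texts.
    matcher = difflib.SequenceMatcher(None, a_words, b_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # tag is one of "replace", "delete" or "insert"
            changes.append(
                (tag, " ".join(a_words[i1:i2]), " ".join(b_words[j1:j2]))
            )
    return changes


if __name__ == "__main__":
    for change in word_diff("the quick brown fox", "the quick red fox"):
        print(change)
```

Because the comparison runs over a list of words rather than a list of lines, a single changed word is reported on its own instead of flagging the whole line or paragraph that contains it.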

A few other options:

1. Microsoft Word's 'merge documents' feature is a surprisingly powerful way of comparing texts, and I know of at least one textual scholar who uses it as his primary tool for witness collation. OpenOffice has a similar feature though I'm not aware of its having been tested to the same extent.

2. Juxta (http://www.juxtasoftware.org/about/) provides visualisations of differences between texts if visualisations are what you're after. It's cross-platform and open source. I've found it to be most useful when dealing with documents that are more similar than they are different (unsurprisingly, since it was designed for witness collation). If you're dealing with a situation where material from one text is distributed randomly throughout another, the visualisations are less easy to read (or at least, they were with the version I'm using; I know the application has been updated since I installed it though I doubt this issue was a priority given its primary purpose).

3. Medite (http://www-poleia.lip6.fr/~ganascia/Medite_Project) is extremely powerful, but it is (or used to be) Windows only, so I haven't used it for a while. It also lacks Juxta's visualisations, though I must admit I'm not that big on visualisations.

There's a paper here describing Medite and comparing its performance to that of other tools (including Word but not including Juxta, which wasn't available at the time):

   http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.72.4202

Best

Daniel



--[2]------------------------------------------------------------------------
        Date: Tue, 18 Dec 2012 11:58:34 -0600
        From: Trevor Borg <trevor.borg at gmail.com>
        Subject: Re:  26.597 text-comparison software
        In-Reply-To: <20121218064727.761782E00 at digitalhumanities.org>


There's been some work on this problem at the Center for Textual Studies and DH @ LU Chicago. See some demos here (https://sites.google.com/a/ctsdh.luc.edu/hrit-intranet/demos). Any of the top ~8 or so, marked `Experimental Image`, might do something like what you are looking for.

Trevor


