[Humanist] 28.168 text for text mining?

Humanist Discussion Group willard.mccarty at mccarty.org.uk
Tue Jul 1 22:36:25 CEST 2014

                 Humanist Discussion Group, Vol. 28, No. 168.
            Department of Digital Humanities, King's College London
                Submit to: humanist at lists.digitalhumanities.org

        Date: Tue, 01 Jul 2014 14:14:34 -0500
        From: "Drew VandeCreek" <drew at niu.edu>
        Subject: text mining

I am a historian trying to figure out how to do text mining. In this case I am working with nineteenth-century American newspapers. 
I recently contacted a library that makes a Civil War-era newspaper available in searchable format for use on site (at its brick-and-mortar location), and asked for permission to work with materials from 1861-1865.
After we negotiated a brief agreement setting out terms of use, they sent me the files. The problem is that they sent a TIF-format image of every page, whereas I had asked for the text-format versions of the files.
I want to be precise about what I am requesting when I follow up with them.
It is my understanding that for a textual resource to be searched effectively, the software must work with the material in a text format.
Thus, if the lending library presents searchable textual materials, they must have a text-format file on hand. 
Should I move forward with this assumption?
Please advise. 
Drew E. VandeCreek
Director of Digital Initiatives 

University Libraries
Northern Illinois University
DeKalb, IL 60115
(815) 753-7179
