[Humanist] 24.241 JSTOR and diacritics
Humanist Discussion Group
willard.mccarty at mccarty.org.uk
Fri Aug 6 03:05:36 CEST 2010
Humanist Discussion Group, Vol. 24, No. 241.
Centre for Computing in the Humanities, King's College London
Submit to: humanist at lists.digitalhumanities.org
Date: Thu, 5 Aug 2010 09:20:52 +0000
From: Peter Batke <batke_p at hotmail.com>
Subject: JSTOR and diacritics
Thanks to Stephen for the pointers to the pdf technical details.
And thanks for pointing to other victims for a shooting. But let us not shoot pdf.
The phrase "just shoot me" is a US southernism and is merely an expression of
frustration that the most elementary problems, it seems, cannot be resolved.
Pdf's are everywhere. The time I used to spend looking through piles of xerox
articles on my desk, I now spend looking through directories of pdf's. The pdf's
are mostly from Google and from jstor and a few others. The pdf's are generally
wonderful and I have no problem with the medium. Many of my pdf's are files
of graphic images.
The problems arise with the content. In the case of jstor, the suspicion has hardened,
in my mind at least, that there are indeed serious ocr problems that make searching
jstor a thing of probabilities. I suspect that most of jstor is just fine and that the
problems are confined to a small percentage of the files, or a small percentage of
the journal runs. It just so happens that I have blundered into a real can of worms
with German periodicals on Ottoman history. I give some more examples, at random:
e matin Bernard est all6 l'6cole. 2. Nous sommes sortis de la maison A huit .... I1 est all6 au bureau. Il est descendu de la voiture devant le bureau. ...
Heinrich Schiitz. Psalmen Davids 1619, Nr. 23-26. Hrsg. von Werner ... Praetorius, Heinrich Schiitz, and others, compiled by Burckhard ...
<article-title>Katharina Schutz Zell. Vol. 1: The Life and Thought ...
Katharina Schiitz Zell, a woman whom I used to call "wife of a reformer and ...
In the last example we must consider that - Schütz - could be found under Schutz, Schiits, as well as Schütz.
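To make the point concrete, here is a minimal Python sketch of the query expansion a searcher would have to do by hand. The confusion table and the function name ocr_variants are assumptions for illustration only; the table holds just the substitutions visible in the examples above (ü read as u or ii, é read as 6, à read as A) and is surely incomplete.

```python
from itertools import product

# Assumed confusion table, taken only from the samples quoted above.
# Each character maps to the spellings it may have received in the OCR text.
OCR_CONFUSIONS = {
    "ü": ("ü", "u", "ii"),
    "é": ("é", "6"),
    "à": ("à", "A"),
}


def ocr_variants(term):
    """Return every spelling of term produced by the confusion table, sorted."""
    # For each character, list its possible renderings (itself if not in the table).
    choices = [OCR_CONFUSIONS.get(ch, (ch,)) for ch in term]
    # Combine one rendering per character and drop duplicates.
    return sorted({"".join(combo) for combo in product(*choices)})


print(ocr_variants("Schütz"))  # ['Schiitz', 'Schutz', 'Schütz']
```

A search for Schütz would then have to be run once for each spelling returned. A name with several diacritics multiplies the number of variants quickly.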
With jstor the problems are masked by the fact that everyone is happy with the splendid images. In addition, the path to a specific article goes through discipline-specific bibliographies and not through searching.
However, in the new world of large-scale digitization, new standards and new expectations are arising. If I am looking for reviews of Babinger's "Mehmet the Conqueror" I would like to have jstor present me with all the "reviews" of the above. Generally it does extremely well. But it is rotten bad luck if I am working on Heinrich Schütz, or on any of several hundred other unfortunates with European diacritics in their names, unless I happen to know to add schiits to the query.
Since I have no routine access to jstor (a curious policy to require institutional affiliation, especially of long-time users who no longer have one), I cannot easily gauge the full scale of the problem. However, while we are beating on Google for its bad ocr, perhaps some colleagues from the library could softly knock on jstor's door and respectfully ask if someone checked all the right boxes when German and French journals were ocr'd a decade ago.
cheers, Peter