[Humanist] 29.202 pubs: open data, open Greek, jazz & comics

Humanist Discussion Group willard.mccarty at mccarty.org.uk
Sun Aug 9 08:40:42 CEST 2015

                 Humanist Discussion Group, Vol. 29, No. 202.
            Department of Digital Humanities, King's College London
                Submit to: humanist at lists.digitalhumanities.org

  [1]   From:    "Priego, Ernesto" <Ernesto.Priego.1 at city.ac.uk>           (33)
        Subject: CFP: Brilliant Corners: Approaches to Jazz and Comics

  [2]   From:    Samuel Moore <samuel.moore15 at gmail.com>                   (19)
        Subject: CFP: New Journal of Open Humanities Data

  [3]   From:    Gregory Crane <gregory.crane at tufts.edu>                  (122)
        Subject: Open Patrologia Graeca 1.0

        Date: Fri, 31 Jul 2015 14:27:12 +0000
        From: "Priego, Ernesto" <Ernesto.Priego.1 at city.ac.uk>
        Subject: CFP: Brilliant Corners: Approaches to Jazz and Comics

Dear All,

With apologies for cross-posting, I include below the CFP for an exciting new special collection in The Comics Grid (http://comicsgrid.com/): Brilliant Corners: Approaches to Jazz and Comics.

Submissions from the digital humanities community will be particularly welcome.

This CFP can also be found on our blog: http://blog.comicsgrid.com/2015/07/cfp-jazz-and-comics/

Many thanks,


The Comics Grid: Journal of Comics Scholarship (http://comicsgrid.com/) invites authors and artists to submit contributions for a special collection on the general topic of Jazz and Comics.

This will be an open access scholarly collection co-edited by Dr Nicolas Pillai (Birmingham City University) and Dr Ernesto Priego (City University London).

We welcome submissions from researchers, artists, graduate students, scholars, teachers, curators, publishers and librarians from any academic, disciplinary or creative background interested in the multidisciplinary study and/or practice of comics and jazz.

Submissions must fulfil The Comics Grid’s editorial guidelines, available at http://comicsgrid.com/about/submissions. The Comics Grid: Journal of Comics Scholarship is an open access journal; authors retain copyright of their own work, and the published content is made available in HTML and PDF under a Creative Commons Attribution License.

The popular forms of jazz and comics have shared similar historical and cultural tendencies. As expressions of modernism, they have been subject to the demands of the marketplace and consumed by wide and varied audiences. Yet the liberatory qualities of comics and jazz have provoked concern in moral guardians, particularly in relation to the subcultures they have generated. Recalling Bourdieu, we might note that, within these subcultures, very divergent and often incompatible judgements are fiercely defended (1983: 24). In the 21st century, both jazz and comics are accepted as art forms. However, this elevated cultural position has arguably come at a price, contributing to the restriction of some forms of jazz and comics to specialised spaces of purchase and consumption.

Over the last forty years, jazz studies and comics studies have gained currency within the academy and have been enriched by interdisciplinary approaches. The New Jazz Studies has invigorated the discipline beyond its musicological roots, while Comics Studies has thrived in the digital age. This collection aims to find meeting points between the disciplines. We are encouraged by the fact that distinguished jazz musicians such as Wayne Shorter, Sonny Rollins, Herbie Hancock and Vince Guaraldi have each acknowledged the influence of comic books on their musical development, while artists and writers have frequently turned to jazz for inspiration (e.g. strips about music appreciation by Harvey Pekar or Blutch). Jazz musicians have been the subjects of comic strips (e.g. Charlie Parker: Handyman, the BD Jazz series) and have themselves created comic strips (e.g. Wally Fawkes/Trog).

The Comics Grid: Journal of Comics Scholarship welcomes research articles, book reviews, research notes, interviews, commentaries and research in comics form that develop the existing scholarship on jazz and comics as cultural and artistic practices within specific contexts and specific material conditions. We are particularly interested in work which emphasises interconnection and the multimodal. We proceed from an assumption that comics are not silent and that jazz is inherently visual.

Potential contributors are encouraged to think about jazz and comics expansively, and to consider them as practices that resist rigid formal definitions. While this will primarily be an academic collection of essays, we welcome work that challenges traditional forms of academic writing while nonetheless following rigorous academic practice. Submissions might, for example, present academic book reviews in comics form, or research-based interviews with practitioners or scholars.

Possible topics may include (but are not restricted to):

  *   The role of materiality and/or performativity in comics and jazz cultures
  *   Comics and jazz collections in libraries and archives, and what comics and jazz librarianship and curatorial practice might learn from each other
  *   Representations of jazz musicians and jazz history in comics
  *   Visual and literary representations of jazz music in comics
  *   Collectionism in comics and jazz cultures
  *   The role of jazz music in films about comics and comics artists
  *   Gender and jazz in comics
  *   Critical engagements with biographies of jazz musicians in comics form

Submissions can be in any of the article types listed in our author guidelines (http://comicsgrid.com/about/submissions/). It is essential that all research submissions include, refer directly to, and discuss in the text specific examples of comics (panels, pages). Please ensure you have read the author guidelines carefully before submitting. Submissions must be uploaded directly to the journal at http://comicsgrid.com/about/submissions/. All research submissions are subject to peer review. For technical specifications and special guidelines for research presented in comics form, please contact the editors (http://comicsgrid.com/contact/) before submitting.

Important dates

  *   Submission deadline: 15 January 2016
  *   Estimated Acceptance/Rejection Notices date: 15 April 2016
  *   Estimated author revisions and proofreading period: 15 April to 15 June 2016
  *   Estimated Publication date: 15 July 2016

*Depending on the number of accepted submissions, outputs may be published in the order they are accepted.

Dr Ernesto Priego
Lecturer in Library Science, Centre for Information Science #citylis (http://www.city.ac.uk/department-library-information-science/information-studies-scheme)
#citylis news (https://blogs.city.ac.uk/citylis/)

        Date: Wed, 5 Aug 2015 17:32:49 +0100
        From: Samuel Moore <samuel.moore15 at gmail.com>
        Subject: CFP: New Journal of Open Humanities Data

Hi all,

I'd like to share the following call for papers for the forthcoming
open-access *Journal of Open Humanities Data* (JOHD) from Ubiquity Press.
The journal features peer-reviewed publications describing humanities data
or techniques with high potential for reuse. Humanities subjects of interest
to JOHD include, but are not limited to, Art History, History, Linguistics,
Literature, Music, Philosophy and Religious Studies. Submissions describing
data that cross one or more of these traditional disciplines are
particularly encouraged.

The full call for papers is available on the journal website:

Please do let me know if you have any questions at all.

All the best,



Samuel Moore
PhD Student, Department of Digital Humanities, King's College London

Twitter: @samoore_  http://www.twitter.com/samoore_ 

        Date: Fri, 7 Aug 2015 11:45:12 -0400
        From: Gregory Crane <gregory.crane at tufts.edu>
        Subject: Open Patrologia Graeca 1.0

  Open Patrologia Graeca 1.0

August 8, 2015


Comments to munson at dh.uni-leipzig.de

Federico Boschetti, CNR, Pisa
Gregory Crane, Leipzig/Tufts
Matt Munson, Leipzig/Tufts
Bruce Robertson, Mount Allison
Nick White, Durham (UK) (and Tufts during 2014)

A first stab at producing OCR-generated Greek and Latin for the complete
Patrologia Graeca (PG) is now available on GitHub at
https://github.com/OGL-PatrologiaGraecaDev. This release provides raw
textual data that will be of service to those with programming expertise
and to developers with an interest in Ancient Greek and Latin. The
Patrologia Graeca contains as many as 50 million words of Ancient Greek,
produced over more than 1,000 years, along with an even larger amount of
scholarship and accompanying translations in Latin.

Matt Munson created a new GitHub organization for this data because it is
simply too large to fit into the existing OGL organization. Each volume can
contain 250MB or more of .txt and .hocr files, so it is impossible to put
everything in one repository or even several dozen repositories. In the new
organization, all the OCR results for each volume are contained within their
own repository. This will also allow us to add more OCR data at the volume
level as necessary (e.g., from Bruce Robertson of Mount Allison University,
or from nidaba, our own OCR pipeline).

The repositories are being created and populated automatically by a
Python script, so if you notice any problems or anomalies, please let us
know either by opening an issue on the individual volume repository or by
sending us an email. This is our first attempt at pushing this data out.
Please let us know what you think.

Available data includes:

  * Greek and Latin text generated by two open source OCR engines,
    OCRopus (https://github.com/tmbdev/ocropy) and Tesseract
    (https://github.com/tesseract-ocr). For work done optimizing
    OCRopus, see http://heml.mta.ca/lace. For work done optimizing
    Tesseract, see http://ancientgreekocr.org/. The output format for
    both engines is hOCR (https://en.wikipedia.org/wiki/HOCR), a format
    that records the coordinates on the original page image from which
    each piece of OCR text was generated.

  * OCR results for as many scans of each volume of the Patrologia
    Graeca as we could find in the HathiTrust. We discovered that the
    same OCR engine applied to scans of different copies of the same
    book would generate different errors (even when the scans seemed
    identical to most human observers). This means that if OCR applied
    to copy X incorrectly analyzed a particular word, there was a good
    chance that the same word would be correctly analyzed when the OCR
    engine was applied to copy Y. A preliminary study of this phenomenon
    is available here. In most cases, the OCRopus/Lace OCR
    contains results for four different scanned copies, while the
    Tesseract/AncientGreekOCR output contains results for up to 10
    different copies. All of the Patrologia Graeca volumes are old
    different copies. All of the Patrologia Graeca volumes are old
    enough that HathiTrust members in Europe and North America can
    download the PDFs for further analysis. Anyone should be able to see
    the individual pages used for OCR via the public HathiTrust interface.

  * Initial page-level metadata for the various authors and works in the
    PG, derived from the core index at columns 13-114 of Cavallera's 1912
    index to the PG
    (http://archive.org/details/PatrologiCursusCompletusAccuranteJ.-p.MigneseriesGrcaIndices),
    which Roger Pearse cites at
    http://www.roger-pearse.com/weblog/patrologia-graeca-pg-pdfs/. A
    working TEI XML transcription, which has begun capturing the data
    within the print source, is available for inspection at
    https://www.dropbox.com/s/mldhu4okpq4i7r8/pg_index2.xml. All
    figures are preliminary and subject to modification (that is one
    motivation for posting this call for help), but we do not expect
    them to change much at this point. At present, we have identified
    658 authors and 4,287 works. The PG contains extensive
    introductions, essays, indices, etc., and we have tried to separate
    these out by scanning for keywords (e.g., praefatio, monitum,
    notitia, index). We estimate that there are 204,129 columns of
    source text and 21,369 columns of secondary sources, representing
    roughly 90% and 10% respectively. Since a column in Migne contains
    about 500 words and since the Greek texts (almost) always have
    accompanying Latin translations, the PG contains up to 50 million
    words of Greek text (204,129 source columns × 500 words ≈ 102
    million words, roughly half of them Greek). Many authors, however,
    have extensive Latin notes, and some have no Greek text at all, so
    there should be even more Latin. For more information, look here.
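The hOCR output described in the first bullet above stores each recognized element together with its position on the page image. Below is a minimal sketch, using only Python's standard-library `html.parser`, of pulling words, bounding boxes and confidences out of such a file. It assumes word-level elements with class `ocrx_word` and a `title` attribute such as `bbox x0 y0 x1 y1; x_wconf N`, which is how Tesseract typically writes them; an engine that emits only line-level `ocr_line` elements would need `element_class="ocr_line"`. The class name `HocrElementParser` and the two-word sample page are made up for illustration.

```python
from html.parser import HTMLParser


def parse_title(title):
    """Parse an hOCR title string like 'bbox 100 200 180 230; x_wconf 91'."""
    props = {}
    for part in title.split(";"):
        fields = part.split()
        if not fields:
            continue
        name, values = fields[0], fields[1:]
        if name == "bbox" and len(values) == 4:
            props["bbox"] = tuple(int(v) for v in values)
        elif name == "x_wconf" and values:
            props["conf"] = float(values[0])
    return props


class HocrElementParser(HTMLParser):
    """Collect (text, bbox, confidence) for every element of one hOCR class."""

    def __init__(self, element_class="ocrx_word"):
        super().__init__()
        self.element_class = element_class
        self.items = []        # (text, bbox or None, confidence or None)
        self._current = None   # text and properties of the element we are inside
        self._tag = None       # tag name of that element
        self._depth = 0        # same-name tags nested inside it

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self._current is not None:
            if tag == self._tag:
                self._depth += 1
            return
        classes = (attrs.get("class") or "").split()
        if self.element_class in classes:
            self._current = {"text": "", **parse_title(attrs.get("title") or "")}
            self._tag, self._depth = tag, 0

    def handle_endtag(self, tag):
        if self._current is None or tag != self._tag:
            return
        if self._depth:
            self._depth -= 1
            return
        cur = self._current
        self.items.append((cur["text"].strip(), cur.get("bbox"), cur.get("conf")))
        self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data


# Invented two-word page fragment in Tesseract-style hOCR.
sample_page = (
    "<div class='ocr_page' title='bbox 0 0 1000 1400'>"
    "<span class='ocr_line' title='bbox 100 200 400 230'>"
    "<span class='ocrx_word' title='bbox 100 200 180 230; x_wconf 91'>λόγος</span> "
    "<span class='ocrx_word' title='bbox 190 200 240 230; x_wconf 47'>eum</span>"
    "</span></div>"
)

parser = HocrElementParser()
parser.feed(sample_page)
for text, bbox, conf in parser.items:
    print(text, bbox, conf)
```

Because every word keeps its bounding box, a correction interface can crop the matching region of the page image and show it next to the OCR reading.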
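The column figures in the last bullet can be checked with a few lines of arithmetic. This sketch uses only the numbers given in the post; the one added assumption, flagged in a comment, is that roughly half of the words in the source-text columns are Greek because of the facing Latin translations.

```python
source_columns = 204_129     # columns of source text (figure from the post)
secondary_columns = 21_369   # columns of introductions, indices, etc.
words_per_column = 500       # approximate words in one Migne column

total_columns = source_columns + secondary_columns
source_share = source_columns / total_columns
secondary_share = secondary_columns / total_columns

# Assumption: Greek texts (almost) always face a Latin translation,
# so roughly half of the source-text words are Greek.
source_words = source_columns * words_per_column
greek_words_estimate = source_words / 2

print(f"total columns: {total_columns:,}")
print(f"shares: {source_share:.1%} source, {secondary_share:.1%} secondary")
print(f"source-text words: {source_words:,}")
print(f"Greek words (about half): {greek_words_estimate:,.0f}")
```

This gives about 102 million words in the source columns and about 51 million Greek words, in line with the "up to 50 million" estimate above.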

Next Steps

 1. Developing high-recall searching by combining the results for each
    scanned page of the PG. This entails several steps. First, we need
    to align the OCR pages with each other: page 611 in one scanned copy
    may correspond to page 605 in another, depending upon how the front
    matter is treated and upon pages that one scan may have missed.
    Second, we need to create an index of all forms in the OCR-generated
    text available for each page in each PG volume. Since at least one
    of the OCR runs (two engines applied to multiple scans of the same
    page) is likely to produce a correct transcription of any given word,
    a unified index of all the text from all the scans of a page will
    capture a very high percentage of the words on that page.

 2. Running various forms of text mining and analysis over the PG. Many
    text mining and analysis techniques work by counting frequently
    repeated features. Such techniques can be relatively insensitive to
    error rates in the OCR (i.e., you get essentially the same results
    whether your texts are 96% accurate or 99.99% accurate).
    Many methods for topic modelling and stylistic analysis should
    produce immediately useful results.

 3. Using the multiple scans to identify and correct errors and to
    create a single optimized transcription. In most cases, bad OCR
    produces nonsense forms that are not legal Greek or Latin. When one
    OCR run has a valid Greek or Latin word and the others do not, that
    valid word is usually correct. Where two different scans produce
    different valid Greek or Latin words (e.g., the common confusion of
    eum and cum), we can use the hOCR feature that allows us to include
    multiple possibilities. We can also do quite a bit to encode the
    confidence that we have in the accuracy of each transcribed word.

 4. Providing a public error correction interface. One error correction
    interface already exists and has been used to correct millions
    of words of OCR-generated Greek, but we face two issues. First, we
    need to address the fact that we cannot ourselves serve page images
    from HathiTrust scans. HathiTrust members could use the system that
    we have by downloading the scans of the relevant volumes to their
    own servers but that does not provide a general solution. Second,
    our correction environment deals with OCR for one particular scanned
    copy. Ideally, the correction environment would allow readers to
    draw upon the various different scans from different copies and
    different OCR engines.
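Step 1 above can be sketched in a few lines of Python. The sketch assumes that page alignment has already been reduced to a fixed offset per scanned copy (a real alignment would also have to cope with pages a scan missed), and the copy names, offsets and sample text are invented for illustration. Every form seen by any OCR run on any copy is indexed under the canonical page number, so a search succeeds as long as one run got the word right.

```python
import re
import unicodedata
from collections import defaultdict

TOKEN_RE = re.compile(r"\w+")


def normalize(text):
    """NFC-normalize (so accented Greek letters stay single characters) and lowercase."""
    return unicodedata.normalize("NFC", text).lower()


def build_page_index(ocr_pages, offsets):
    """Map each word form to the canonical pages on which any OCR run saw it.

    ocr_pages: iterable of (copy_id, page_in_copy, text), from any engine.
    offsets: hypothetical per-copy offsets, canonical = page_in_copy + offset.
    """
    index = defaultdict(set)
    for copy_id, page, text in ocr_pages:
        canonical_page = page + offsets[copy_id]
        for form in TOKEN_RE.findall(normalize(text)):
            index[form].add(canonical_page)
    return index


def search(index, form):
    """Sorted canonical pages on which any OCR run produced this form."""
    return sorted(index.get(normalize(form), set()))


# Two copies of the same page; each OCR run garbled a different word.
pages = [
    ("copy_a", 611, "In principio erat verbum, λογος"),
    ("copy_b", 605, "In principio erat vcrbum, λόγος"),
]
offsets = {"copy_a": 0, "copy_b": 6}

index = build_page_index(pages, offsets)
print(search(index, "verbum"), search(index, "λόγος"))
```

Indexing the garbled forms as well (such as `vcrbum` here) is the price of high recall: they add noise to the index but never hide a correct form.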
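The correction rule in step 3 (prefer a reading that is a legal Greek or Latin form, and keep several when the runs disagree between legal forms) might look like the sketch below. It assumes the readings from the different runs have already been aligned word by word, and the tiny `LEXICON` stands in for a real Greek and Latin word list; both are hypothetical. Each kept reading is paired with the share of runs that produced it, as one simple way of encoding confidence; the resulting alternatives could then be recorded using hOCR's support for multiple possibilities, as the post suggests.

```python
from collections import Counter

# Hypothetical stand-in for a full list of legal Greek and Latin forms.
LEXICON = {"eum", "cum", "verbum", "erat", "λόγος"}


def choose_readings(candidates, lexicon=LEXICON):
    """Pick readings for one aligned word position.

    candidates: what each OCR run (engine x scanned copy) read at this position.
    Returns [(form, share_of_runs)], most frequent first:
      - every legal form, if any run produced one (several legal forms mean a
        genuine ambiguity such as eum/cum, to be kept as alternatives);
      - otherwise the single most frequent reading.
    """
    if not candidates:
        return []
    counts = Counter(candidates)
    total = len(candidates)
    ranked = counts.most_common()  # ties keep first-seen order
    legal = [(form, n / total) for form, n in ranked if form in lexicon]
    if legal:
        return legal
    form, n = ranked[0]
    return [(form, n / total)]


print(choose_readings(["vcrbum", "verbum", "vetbum", "vcrbum"]))
print(choose_readings(["eum", "cum", "eum"]))
print(choose_readings(["vcrbum", "verbnm"]))
```

In the first call the legal form wins even though a garbled reading is more frequent, which is the behaviour step 3 describes.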
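The claim in step 2 that frequency-based methods tolerate OCR noise can be illustrated with a toy experiment. The word counts below are invented and the errors are spread evenly (one token in 25, i.e. 96% accuracy), which real OCR errors are not, so this only illustrates the principle and says nothing about the PG data itself.

```python
from collections import Counter


def corrupt_every_nth(tokens, n):
    """Simulate OCR noise by mangling every n-th token (an error rate of 1/n)."""
    return [tok[:-1] + "#" if (i + 1) % n == 0 else tok
            for i, tok in enumerate(tokens)]


def top_forms(tokens, k):
    """The k most frequent word forms, most frequent first."""
    return [form for form, _ in Counter(tokens).most_common(k)]


# Invented counts for five common Latin words (1,150 tokens in all).
clean = ["et"] * 400 + ["in"] * 300 + ["est"] * 200 + ["non"] * 150 + ["ad"] * 100
noisy = corrupt_every_nth(clean, 25)  # 46 tokens damaged: 96% accurate

print(top_forms(clean, 5))
print(top_forms(noisy, 5))
```

The five most frequent forms and their ranking survive the damage; each mangled form appears at most 16 times, far below the least frequent genuine word.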
