Apologies for multiple postings
****
Second Workshop on Language
Technology for Digital Historical Archives
with a special focus on Central and (South-)Eastern Europe,
the Middle East and North Africa
https://www.inf.uni-hamburg.de/inst/dmp/hercore/publications/ltdha.html
in conjunction with the 12th biennial Recent Advances in
Natural Language Processing conference (RANLP 2019), Varna,
Bulgaria http://lml.bas.bg/ranlp2019/start.php
WORKSHOP DATE: September 5, 2019
Last Call for Papers
SUBMISSION DEADLINE EXTENDED: 21.07.2019
Motivation
Over the last decades, Digital Humanities has evolved
dramatically, from simple database applications to complex
systems involving the most recent state of the art in Computer
Science. Language Technology in particular plays a major role,
either in processing the metadata of recorded objects or in
analyzing and interpreting content.
Applying Language Technology methods to objects from the
humanities in general, and historical archives in particular,
is a challenge for NLP-related research: data is heterogeneous
(image/text), often incomplete (e.g. OCR errors),
multilingual within one document (historical documents with
Latin and/or classical Greek paragraphs) and difficult to
structure (paragraphs, titles and pages are somewhat different
in historical texts).
Corpus-based methods, nowadays standard in NLP research, often
cannot be applied as the necessary large training data is
missing.
Moreover, requirements for tools in Digital Humanities,
especially tools dedicated to cultural heritage objects, are
different from the ones applied to modern texts.
Thus, performing research in Digital Humanities also involves
adapting existing NLP tools to the historical variants of
languages; developing tools for new languages; making tools
robust to syntactic deviation; and adapting semantic
resources.
Central and Eastern Europe, as well as the Middle East and
North Africa, have always been characterized by a high
concentration of languages and cultures interacting with each
other. Within a relatively small area, texts written in at
least 10 alphabets (Arabic, Hebrew, Armenian, Georgian, Greek,
Cyrillic, Ge'ez, Syriac, Latin and Coptic) can be found. At the
same time, information within these texts is important beyond
the borders of a given language or script (e.g. documents in
Ge'ez are often translations of lost Coptic or ancient Greek
texts). Places, persons and events have language-dependent
names but refer to the same individual, event or geographical
location.
Unfortunately, especially in this area, many historical
documents are in bad condition; many languages or dialects
have become extinct over time and their written evidence is
rare. Digital methods seem the perfect means for the
preservation and investigation of this rich cultural heritage.
However, up to now, concerted activities seem to be absent,
probably also due to the lack of adequate NLP resources and
tools. It is therefore necessary to evaluate existing
technology, monitor current activities and network research
teams in this area. These are the aims of this workshop.
This is the second edition of the workshop Language Technology
for Digital Humanities in Central and (South-)Eastern Europe,
first held in 2017 at RANLP. In 2019, the International Year of
Indigenous Languages, this edition also expands to the Middle
East and North Africa.
Topics
Corpora of diachronic variants and language dialects,
NLP tools for processing historical documents,
Intelligent search in digital archives,
(Semi-)automatic (meta-)annotation of historical texts,
Treating uncertain and vague information from historical
documents,
Ontologies for historical texts,
Evaluation of current frameworks (CLARIN, DARIAH) on
DH objects related to historical texts,
Machine learning approaches for under-resourced DH
objects,
Methods for dealing with incompletely specified objects
(e.g. partially known features or values),
Automatic extraction of metadata,
Metadata interoperability for digital objects,
Intelligent search in digital historical archives,
Geo- and time references in historical documents,
focusing on languages from the above-mentioned area.
Submissions
===========
Please submit your paper through the START system at:
https://www.softconf.com/ranlp2019/LTDHA/
The reviewing process is anonymous. Double submission is
allowed, but authors will be asked to declare it at the time
of submission.
Long papers should be 8 pages long plus 2 extra pages for
references.
Short papers should be 4 pages long plus 2 extra pages for
references. Accepted short papers will be presented either as
short oral presentations or as posters.
All submissions should be formatted using the ACL-based
style sheets provided for RANLP
(http://lml.bas.bg/ranlp2019/submissions.php#styles).
Accepted papers will be published in the workshop proceedings
and uploaded to the ACL Anthology.
Important Dates:
================
Paper submission deadline (EXTENDED): July 21, 2019
Notification of acceptance: August 8, 2019
Camera-ready papers due: August 20, 2019
LT4DH-CEE Workshop: September 5, 2019
Organizing Committee
Cristina Vertan, University of Hamburg, Germany
Petya Osenova, Bulgarian Academy of Sciences, Bulgaria
Dimitar Iliev, St. Kliment Ohridski University of Sofia, Bulgaria
Programme Committee (TBA)
Martha Yifiru Abate, University of Addis Ababa
Gabriel Bodard, Institute of Classical Studies, SAS, London
Elie Damaoui, University of Balamand
Antske Fokkens, Vrije Universiteit, Amsterdam
Walther v. Hahn, University of Hamburg
Vladislav Kubon, Charles University, Prague
Preslav Nakov, Qatar University
Maciej Ogrodniczuk, Polish Academy of Sciences
Gabor Proszeky, Catholic University, Budapest
Kiril Simov, Bulgarian Academy of Sciences
Stefan Trausan, Polytechnic University, Bucharest
Valeria Vitale, Institute of Classical Studies, SAS, London