[Humanist] 22.703 events: noisy text; digital futures
Humanist Discussion Group
willard.mccarty at mccarty.org.uk
Sat Apr 18 11:58:24 CEST 2009
Humanist Discussion Group, Vol. 22, No. 703.
Centre for Computing in the Humanities, King's College London
Submit to: humanist at lists.digitalhumanities.org
 From: kcl - cch <cch at kcl.ac.uk> (72)
Subject: Digital Futures Training Event at King's, 27th April - 1st
 From: L V Subramaniam <lvsubram at in.ibm.com> (37)
Subject: CFP : 3rd Workshop on Analytics for Noisy Unstructured Text
Data(AND-09) [ Deadline Extended to May 4, 2009]
Date: Fri, 17 Apr 2009 11:17:08 +0100
From: kcl - cch <cch at kcl.ac.uk>
Subject: Digital Futures Training Event at King's, 27th April - 1st May 2009
DIGITAL FUTURES LONDON 2009
We are pleased to announce the Digital Futures 5 day training event:
Digital Futures Academy: from digitization to delivery
King's College London, Council Room, Strand
27th April - 1st May 2009
Only a few places left - book now to avoid disappointment!
KCL staff receive a 50% discount off the full rate.
Digital Futures is run by King's Digital Consultancy Services and the Centre
for Computing in the Humanities, King's College London working in
co-operation with Lyrasis, USA.
Led by international experts, Digital Futures focuses on the creation,
delivery and preservation of digital resources from cultural and memory
institutions. Lasting 5 days, Digital Futures is aimed at managers and other
practitioners from the library, museum, heritage and cultural sectors
looking to understand the strategic and management issues involved in
developing digital resources from digitisation to delivery.
Digital Futures will cover the following core areas:
o Planning and management
o Fund raising
o Understanding the audience
o Metadata - introduction and implementation
o Copyright and intellectual property
o Financial issues
o Visual and image based resource creation and delivery
o Implementing digital resources
o Digital preservation
The Digital Futures leaders are:
* Simon Tanner - Director of King's Digital Consultancy Services, King's
College London http://www.kdcs.kcl.ac.uk/
* Tom Clareson - Director for New Initiatives, Lyrasis
The leaders have over 30 years of experience in the digital realm between
them. Other experts will be invited to speak in their areas of expertise.
What past delegates say about Digital Futures:
* "Excellent - I would recommend DF to anyone anticipating a digitization
* "The team was exceptionally knowledgeable, friendly and personable."
* "Excellent, informative and enjoyable. Thank you."
* "Thanks, it has been an invaluable experience."
* "A really useful course and great fun too!"
King's Digital Consultancy Services
King's College London
26-29 Drury Lane
London WC2B 5RL
Tel: +44 (0)20 7848 2861
Fax: +44 (0)20 7848 2980
Email: kdcs at kcl.ac.uk
Date: Sat, 18 Apr 2009 04:59:33 +0100
From: L V Subramaniam <lvsubram at in.ibm.com>
Subject: CFP : 3rd Workshop on Analytics for Noisy Unstructured Text Data(AND-09) [ Deadline Extended to May 4, 2009]
The deadline for submission has been extended to May 4, 2009. We encourage you to participate actively.
3rd Workshop on Analytics for Noisy Unstructured Text Data (AND-09)
23-24 July 2009, Barcelona, Spain
in conjunction with 10th International Conference on Document Analysis and Recognition (ICDAR)
Call for Papers
Workshop Description and Objectives
Noisy unstructured text data is ubiquitous in real-world communications. Text produced by processing signals intended for human use, such as printed or handwritten documents, spontaneous speech, and camera-captured images, is a prime example. ICR/OCR error rates on paper documents can range widely, from 2-3% for clean inputs to 50% or higher, depending on the quality of the page image, the complexity of the layout, aspects of the typography, etc. Individual variability in handwriting makes this a particularly difficult form of input, and error rates here are often substantially higher than for machine-printed text. Telephone conversations between call center agents and customers often see 30-40% word error rates, even using state-of-the-art ASR techniques. In spite of the tremendous challenges such data presents, it is pervasive in applications of interest to corporations and government organizations.
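For readers unfamiliar with the metric, the word error rate mentioned above is conventionally computed as the word-level edit distance (substitutions, deletions and insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal Python sketch of that common definition, not a method prescribed by the workshop; the function name and the example strings are made up for illustration, and whitespace tokenization is an assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()  # assumption: whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical example: one substitution and one deletion in ten words.
    reference = "a b c d e f g h i j"
    hypothesis = "a b x d e f g h i"
    print(f"WER = {word_error_rate(reference, hypothesis):.0%}")
```

Under this definition, a 30-40% word error rate means that roughly three or four word-level edits are needed for every ten reference words.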
Recognition errors are not the sole source of noise; natural language and the creative ways that humans use it can create problems for computational techniques. Electronic text from the Internet (emails, message boards, newsgroups, blogs, wikis, chat logs and Web pages), contact centers (customer complaints, emails, call transcriptions, message summaries), and mobile phones (text messages) is often noisy, containing spelling errors, abbreviations, non-standard words, false starts, repetitions, missing punctuation, missing case information, and pause-filling words such as “um” and “uh” in the case of spoken conversations.
The Third Workshop on Analytics for Noisy Unstructured Text Data (AND-09) is devoted to issues arising from the need to contend with noisy inputs, the impact noise can have on downstream applications, and the demands it places on document analysis. AND 2009 will build on two previous successful AND workshops held in 2007 (in conjunction with the 20th International Joint Conference on Artificial Intelligence) and in 2008 (in conjunction with the 31st Annual International ACM SIGIR Conference). AND 2008 proceedings are available in the ACM Digital Library (http://portaltest.acm.org/toc.cfm?id=1390749&type=proceeding&coll=portal&dl=ACM). Selected papers from AND 2007 were published in a special issue of the International Journal of Document Analysis and Recognition (IJDAR), and selected papers from AND 2008 will appear in IJDAR at a future date. Selected papers from AND 2009 will also be published in IJDAR; the publisher for the proceedings is still being decided.
Topics of Interest (but not limited to)
o Noise induced by document analysis techniques and its impact on downstream applications
o Formal models for noise, including characterization and classification of noise
o Treatment of noisy data in specific application areas, including historical texts, multilingual documents, blogs, chat / SMS logs, social network analysis, patent search, and machine translation
o Data sets, benchmarks, and evaluation techniques for analysis of noisy text
o All other topics arising from noise and its effects on textual data
We hope that the workshop will allow researchers working in areas related to Unstructured Data Analytics, Natural Language Processing, Information Extraction, Information Retrieval, Document Image Analysis, etc. to focus on the needs of users extracting useful information from noisy text. The target audience is a mixture of academic and industry researchers working with noisy text. We believe this work is of direct relevance to domains such as call centers, the world-wide web, and government organizations that need to analyze huge amounts of noisy data.
Full papers may be submitted following the guidelines specified on the AND 2009 website: http://and2009workshop.googlepages.com/
Paper Submission: April 20, 2009, Extended to May 4, 2009
Notification of Acceptance: May 20, 2009
Camera-Ready papers due: June 20, 2009
Daniel Lopresti, Lehigh University
Shourya Roy, IBM Research, India Research Lab
Klaus U Schulz, University of Munich
L. Venkata Subramaniam, IBM Research, India Research Lab
Daniel Lopresti, lopresti at cse.lehigh.edu
Please visit the workshop website http://and2009workshop.googlepages.com/ for information about participation and submitting papers.