[Humanist] 23.207 events: automating genre classification

Humanist Discussion Group willard.mccarty at mccarty.org.uk
Fri Jul 31 07:09:43 CEST 2009

                 Humanist Discussion Group, Vol. 23, No. 207.
         Centre for Computing in the Humanities, King's College London
                Submit to: humanist at lists.digitalhumanities.org

        Date: Thu, 30 Jul 2009 23:15:56 +0300 (EEST)
        From: dobreva at math.bas.bg
        Subject: workshop on Automated Document Genre Classification
        In-Reply-To: <20090728060603.EAD5332A32 at woodward.joyent.us>

DCC and Robert Gordon Joint Workshop:
Automated Document Genre Classification - Supporting Digital Curation,
Information Retrieval, and Knowledge Extraction
9 September 2009
Microsoft Research, Cambridge, United Kingdom

In co-operation with the International Conference on the Theory of
Information Retrieval (ICTIR) and Microsoft Research, Cambridge, UK, the
Digital Curation Centre (DCC) and Robert Gordon University are holding a
one-day workshop on Automated Document Genre Classification. This workshop is
intended as a brainstorming session to build a research agenda for automated
genre classification, identification and recognition that will enhance and
support workflows within:

-Digital curation and preservation
-Information management
-Information seeking, search, and retrieval
-Information extraction and knowledge discovery

There is no consensus in the genre classification research community on
methods of genre taxonomy generation, on evaluation, or on applying this
research in existing systems. This event is intended to open a forum for
discussion and to identify:

-How to constructively establish a useful genre taxonomy
-How to integrate and apply genre classification within existing information
 systems
-How to evaluate and consolidate its usefulness and effectiveness within
 these target systems.

This workshop will bring together key researchers in genre classification and
in the areas mentioned above to establish a research road map for bringing
genre classification research to the maturity needed for practical use.

The automation of metadata extraction is crucial to digital curation
activities, as the information deluge is likely to make manual extraction
enormously costly. Organising documents into genre classes, which indicate
the physical and conceptual structure of the text, could serve as a starting
point for both automatic and manual extraction by narrowing down the areas of
the text from which the required metadata must be extracted.

Collection profiling is an important aspect of risk assessment and data
audit within organisational collections. Each organisation focuses on the
document genres most closely associated with its central activities and
services: e.g. a research article as part of experimental research at a
research centre; a news report as part of news coverage at a newspaper
corporation; a financial budget report as part of a business venture in a
company. Identifying these core document genres could provide building
blocks for defining criteria for identifying risks to the collection that
take account of the organisation's procedural context.

Information retrieval techniques mostly rely on relevance measures
calculated from a document's topical content. However, documents on the same
topic may be created with different objectives and as part of different
processes (e.g. research as opposed to product promotion), resulting in
different levels of relevance, depth, usefulness, and reliability as sources
of information. Genre classification (e.g. distinguishing an advertisement
for a camera from a product review of the same camera) may be an effective
way of supporting finer-grained relevance judgements.

Tentative Programme
The workshop will consist of four sessions. Each of the first three sessions
will comprise three presentations by selected speakers, followed by
discussion. The fourth session will take the form of an open discussion.

09:00 – 09:30 Registration

09:30 – 11:00 Session I: Understanding genre classification — building a

11:00 – 11:15 Coffee

11:15 – 12:45 Session II: Role of genre classification in existing
information systems

12:45 – 14:00 Lunch

14:00 – 15:30 Session III: Viability of evaluating the effectiveness and
usefulness of genre classification

15:30 – 15:45 Coffee

15:45 – 16:45 Session IV: Building a research road map — open discussion and
summary of previous sessions

16:45 – 17:00 Close

This event will cost £75.00.

Registration is available at

Best regards,
Joy Davidson
DCC Training Coordinator and ERPANET British Editor
Humanities Advanced Technology and Information Institute (HATII)
George Service House, 11 University Gardens,
University of Glasgow
Glasgow G12 8QJ
Tel: +44(0)141 330 8592
Fax: +44(0)141 330 3788
british.editor at erpanet.org
