[Humanist] 28.921 Autodesk Post-Editing Data Corpus

Humanist Discussion Group willard.mccarty at mccarty.org.uk
Tue Apr 28 08:57:56 CEST 2015

                 Humanist Discussion Group, Vol. 28, No. 921.
            Department of Digital Humanities, King's College London
                Submit to: humanist at lists.digitalhumanities.org

        Date: Mon, 27 Apr 2015 13:04:06 +0200
        From: Венцислав
        Subject: Release of Autodesk Post-Editing Data Corpus (ISLRN 290-859-676-529-5)

Dear all,

It is my pleasure to announce the release of the Autodesk Post-Editing Data
corpus with the ISLRN 290-859-676-529-5.

This resource contains parallel English source–MT/TM target segments
post-edited into 13 languages (Simplified and Traditional Chinese,
Czech, French, German, Hungarian, Italian, Japanese, Korean, Polish,
Brazilian Portuguese, Russian, Spanish), with between 30,000 and 410,000
segments per language. Its main intended use is for research in automatic
quality estimation of Machine Translation output. The provided data are
predominantly software user manual content with some segments coming from
marketing and education materials. They cover the portfolio of Autodesk
products from various domains, notably architecture, engineering, civil
engineering, simulation, computer graphics, media and entertainment. The
content was translated between 12 November 2012 and 23 September 2014.

The corpus is available from https://autodesk.box.com/Autodesk-PostEditing
and more information is available in the included Readme file. The data are
released under a Creative Commons Attribution-NonCommercial-NoDerivatives
4.0 International License (http://creativecommons.org/licenses/by-nc-nd/4.0/).


Dr. Ventsislav Zhechev
Computational Linguist, Certified ScrumMaster®
Platform Architecture and Technologies
Localisation Services

MAIN +41 32 723 91 22
FAX +41 32 723 93 99


Autodesk, Inc.
Rue de Puits-Godet 6
2000 Neuchâtel, Switzerland
http://www.autodesk.com/
