Translation Guidelines (sent to translation agencies):
· Chinese->English translation guideline for the first set of data in 2001
· Chinese->English translation guideline for the second set of data in 2001
· Instructions for the 2002 Chinese->English translation evaluation data
· Chinese->English translation guidelines for 2003 training and eval data
· Arabic->English translation guidelines for 2002 evaluation data
· Arabic->English translation guidelines for 2003 training and eval data
Final Data Format (of published corpora):
· Final Data Format for LDC Multiple Translation Chinese Corpora
· Final Data Format for LDC Multiple Translation Arabic Corpora
Specifications for human assessment of translation quality:
· Specification for human assessment of translation quality
Allowable training data for the separate evaluation conditions. Registered participants should contact the LDC to acquire this data:
Small Data Condition

| Chinese | Arabic |
|---|---|
| English Translation of Chinese Treebank | Not Applicable |
| The 10k-word dictionary from CMU (S. Vogel) | |

Large Data Condition

| Chinese: LDC catalog # | Chinese: Title | Arabic: LDC catalog # | Arabic: Title |
|---|---|---|---|
| LDC2003E14 | FBIS data | | |
| LDC2002E16 | Hong Kong News Parallel Text, sentence-aligned | LDC2003T07 | Arabic Treebank Part 1 10k word English Translation |
| | Hong Kong News Parallel Text - | LDC2002E15 | UN Arabic English parallel Text |
| | Hong Kong Hansard Parallel Text, aligned at the document level | LDC2002E48 | Ummah Arabic English Parallel News Text |
| LDC2002E19 | Hong Kong Hansard Parallel Text, aligned at the sentence level | LDC2002L49 | Buckwalter Arabic Morphological Analyzer Version 1.0 |
| LDC2002E17 | English Translation of Chinese Treebank | LDC2003T06 | Arabic Treebank Part 1 v2.0 |
| LDC2002E18 | Xinhua Chinese-English Parallel News Text Version 1.0 beta 2 | LDC2002E54 | Multiple Translation Arabic Corpus NIST June 2002 MT evaluation data |
| LDC2003E11 | UN Chinese-English Parallel Text Version 1.0 beta | LDC2003E05 | Arabic News Translation Corpus Part 1 |
| LDC2002L27 | Chinese English Translation Lexicon version 3.0 | LDC2003E09 | Arabic News Translation Corpus Part 2 |
| LDC2002E58 | Sinorama Chinese-English Parallel Text | | |
| LDC2002T01 | Multiple-Translation Chinese Corpus | | |
| LDC2002E53 | Multiple Translation Chinese Corpus part 2: NIST June 2002 MT evaluation data | | |
| LDC2003E01 | Chinese-English Name Entity Lists version 1.0 beta | | |
| LDC2002E04 | Multiple Translation Chinese Corpus Part 3 | | |
| LDC2001T11 | Chinese Treebank 2.0 | | |
| LDC2003E06 | Chinese Treebank 3.0 | | |
| LDC2003E07 | Chinese Treebank English Parallel Corpus | | |
| LDC2003E08 | Chinese News Translation Corpus Part 1 | | |

Unlimited Training Condition

| Chinese | Arabic |
|---|---|
| All publicly available data up to | All publicly available data up to |
