LREC 2000 2nd International Conference on Language Resources & Evaluation
 


Title Automatic Extraction of English-Chinese Term Lexicons from Noisy Bilingual Corpora
Authors Le Sun (Open Systems & Chinese Information Processing Center, Institute of Software, Chinese Academy of Sciences, Beijing 100080, P. R. China, lesun@sonata.iscas.ac.cn)
Youbing Jin (Open Systems & Chinese Information Processing Center, Institute of Software, Chinese Academy of Sciences, Beijing 100080, P. R. China, ybjin@sonata.iscas.ac.cn)
Lin Du (Open Systems & Chinese Information Processing Center, Institute of Software, Chinese Academy of Sciences, Beijing 100080, P. R. China, ldu@sonata.iscas.ac.cn)
Yufang Sun (Open Systems & Chinese Information Processing Center, Institute of Software, Chinese Academy of Sciences, Beijing 100080, P. R. China, yfsun@sonata.iscas.ac.cn)
Keywords Bilingual Corpora Processing, Sentence Alignment, Term Extraction
Session Session WO11 - Mono-Multilingual Lexicon Acquisition and Building
Full Paper 208.ps, 208.pdf
Abstract This paper describes a system designed to extract English-Chinese term lexicons from noisy, complex bilingual corpora and to use them as a translation lexicon for checking sentence alignment results. The noisy bilingual corpora are first aligned with our improved length-based statistical approach, which can partly detect sentence omissions and insertions. A term extraction system then obtains term translation lexicons from the roughly aligned corpora, and the statistical approach is used to align the corpora again. Finally, we filter the noisy bilingual texts and obtain nearly perfectly aligned corpora.
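
The first step of the pipeline, length-based alignment that tolerates omitted or inserted sentences, can be illustrated with a small dynamic-programming sketch. The abstract does not give the authors' improved cost function, so everything below is an assumption made for illustration: the bead penalties in `BEAD_PENALTY`, the `VARIANCE` constant, the `ratio` parameter (expected target characters per source character), and the function names `length_cost` and `align`.

```python
from math import sqrt
from typing import List, Tuple

# Bead shapes (source sentences, target sentences) and flat penalties.
# All constants here are illustrative assumptions, not values from the paper.
BEAD_PENALTY = {(1, 1): 0.0, (1, 0): 3.0, (0, 1): 3.0, (2, 1): 1.0, (1, 2): 1.0}
VARIANCE = 6.8  # assumed spread of target length per source character


def length_cost(src_len: int, tgt_len: int, ratio: float) -> float:
    """Squared, normalized gap between the expected and actual target length."""
    src_len = max(1, src_len)  # guard against empty sentences
    delta = (tgt_len - src_len * ratio) / sqrt(src_len * VARIANCE)
    return delta * delta / 2.0


def align(src: List[str], tgt: List[str],
          ratio: float = 1.0) -> List[Tuple[List[int], List[int]]]:
    """Return beads as (source indices, target indices).

    An empty index list on one side marks an omitted or inserted sentence.
    """
    n, m = len(src), len(tgt)
    inf = float("inf")
    best = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[(0, 0)] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == inf:
                continue
            for (di, dj), penalty in BEAD_PENALTY.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                step = penalty
                if di and dj:
                    # Only beads with text on both sides get a length term.
                    s_len = sum(len(s) for s in src[i:ni])
                    t_len = sum(len(t) for t in tgt[j:nj])
                    step += length_cost(s_len, t_len, ratio)
                if best[i][j] + step < best[ni][nj]:
                    best[ni][nj] = best[i][j] + step
                    back[ni][nj] = (i, j)
    # Walk the back-pointers from the end to recover the bead sequence.
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    beads.reverse()
    return beads
```

For an English-Chinese pair, `ratio` would need to be estimated from data, since Chinese text typically uses fewer characters than its English counterpart. Beads with an empty side are the candidates a later lexicon-based check or noise filter could inspect.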