Summary of the paper

Title Automatic Construction of a Japanese-Chinese Dictionary via English
Authors Hiroyuki Kaji, Shin’ichi Tamamura and Dashtseren Erdenebat
Abstract This paper proposes a method of constructing a dictionary for a pair of languages from bilingual dictionaries between each of the languages and a third language. Such a method would be useful for language pairs for which wide-coverage bilingual dictionaries are not available, but it suffers from spurious translations caused by the ambiguity of intermediary third-language words. To eliminate spurious translations, the proposed method uses monolingual corpora of the first and second languages, whose availability is not as limited as that of parallel corpora. Extracting word associations from the corpora of both languages, the method correlates the associated words of an entry word with its translation candidates. It then selects the translation candidates that have the highest correlations with a certain percentage or more of the associated words. The method has the following features. First, it produces a domain-adapted bilingual dictionary. Second, the resulting bilingual dictionary, which provides not only translations but also the associated words supporting each translation, enables context-based selection of translations. Preliminary experiments using the EDR Japanese-English and LDC Chinese-English dictionaries together with Mainichi Newspaper and Xinhua News Agency corpora demonstrate that the proposed method is viable. The recall and precision could be improved by optimizing the parameters.
Language Multiple languages
Topics Lexicon, lexical database, Machine Translation, SpeechToSpeech Translation, Multilinguality
Full paper Automatic Construction of a Japanese-Chinese Dictionary via English
Slides -
Bibtex @InProceedings{KAJI08.175,
  author = {Hiroyuki Kaji and Shin’ichi Tamamura and Dashtseren Erdenebat},
  title = {Automatic Construction of a Japanese-Chinese Dictionary via English},
  booktitle = {Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)},
  year = {2008},
  month = {may},
  date = {28-30},
  address = {Marrakech, Morocco},
  editor = {Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-4-0},
  note = {},
  language = {english}
}
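
The two steps named in the abstract (generating translation candidates through a third language, then keeping the candidates that have the highest correlation with a sufficient share of the entry word's associated words) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the `correlation` callback, the `min_share` parameter, and the toy dictionaries and scores are all assumptions made for illustration; the paper derives correlations from word associations extracted from monolingual corpora, which is not reproduced here.

```python
from collections import Counter, defaultdict


def pivot_candidates(src_to_pivot, tgt_to_pivot):
    """Join two bilingual dictionaries on shared pivot-language words.

    Both arguments map a word to a set of pivot-language translations.
    Returns a map from each source word to its target-language candidates.
    """
    pivot_to_tgt = defaultdict(set)
    for tgt, pivots in tgt_to_pivot.items():
        for pivot in pivots:
            pivot_to_tgt[pivot].add(tgt)
    candidates = {}
    for src, pivots in src_to_pivot.items():
        found = set()
        for pivot in pivots:
            found |= pivot_to_tgt.get(pivot, set())
        candidates[src] = found
    return candidates


def select_translations(candidates, associated, correlation, min_share):
    """Keep candidates that score highest for at least min_share of the
    associated words. `correlation(assoc, cand)` is a hypothetical scorer."""
    if not candidates or not associated:
        return set()
    wins = Counter()
    for assoc in associated:
        # sorted() makes tie-breaking deterministic
        best = max(sorted(candidates), key=lambda c: correlation(assoc, c))
        if correlation(assoc, best) > 0:  # ignore words supporting no candidate
            wins[best] += 1
    return {c for c, n in wins.items() if n / len(associated) >= min_share}


# Toy data (invented): English "bank" is ambiguous, so the pivot step
# proposes both a financial and a riverside sense for the Japanese word.
ja_en = {"銀行": {"bank"}}
zh_en = {"银行": {"bank"}, "河岸": {"bank", "riverside"}}
scores = {("金利", "银行"): 0.8, ("融資", "银行"): 0.6,
          ("預金", "银行"): 0.7, ("預金", "河岸"): 0.1}

cands = pivot_candidates(ja_en, zh_en)
selected = select_translations(
    cands["銀行"], ["金利", "融資", "預金"],
    lambda a, c: scores.get((a, c), 0.0), min_share=0.5)
print(cands["銀行"], selected)
```

In this toy run the spurious candidate 河岸 is dropped because none of the associated words of 銀行 rank it highest, while 银行 wins for all three.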
