Title Dealing with unknown words in statistical machine translation
Authors João Silva, Luísa Coheur, Ângela Costa and Isabel Trancoso
Abstract In Statistical Machine Translation, words that were not seen during training are unknown words, that is, words that the system does not know how to translate. In this paper we address this problem by exploiting the orthographic cues that words provide. We report a study of the impact of word distance metrics on cognate detection and, in addition, of the possibility of obtaining candidate translations of unknown words through Logical Analogy. Our approach is tested on the translation of corpora from Portuguese to English (and vice versa).
Topics Machine Translation, SpeechToSpeech Translation, Information Extraction, Information Retrieval, Tools, systems, applications
Full paper Dealing with unknown words in statistical machine translation
Bibtex @InProceedings{SILVA12.980,
  author = {João Silva and Luísa Coheur and Ângela Costa and Isabel Trancoso},
  title = {Dealing with unknown words in statistical machine translation},
  booktitle = {Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)},
  year = {2012},
  month = {may},
  date = {23-25},
  address = {Istanbul, Turkey},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Mehmet Uğur Doğan and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {978-2-9517408-7-7},
  language = {english}
}
