LREC 2018 Proceedings

Summary of the paper

Title	Massively Translingual Compound Analysis and Translation Discovery
Authors	Winston Wu and David Yarowsky
Abstract	Word formation via compounding is a widely observed but quite diverse phenomenon across the world's languages, yet the compositional semantics of a compound are often productively correlated even between distant languages. Using only freely available bilingual dictionaries and no annotated training data, we derive novel models for analyzing compound words and use these models to effectively generate novel foreign-language translations of English concepts. In addition, we release a massively multilingual dataset of compound words along with their decompositions, covering over 21,000 instances in 329 languages. This unprecedented scale should both productively support machine translation (especially in low-resource languages) and help researchers further analyze and model compounds and compounding processes across the world's languages.
Topics	Other, Multilinguality, Lexicon, Lexical Database
Bibtex	@InProceedings{WU18.1029,
  author = {Winston Wu and David Yarowsky},
  title = "{Massively Translingual Compound Analysis and Translation Discovery}",
  booktitle = {Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)},
  year = {2018},
  month = {May 7-12, 2018},
  address = {Miyazaki, Japan},
  editor = {Nicoletta Calzolari (Conference chair) and Khalid Choukri and Christopher Cieri and Thierry Declerck and Sara Goggi and Koiti Hasida and Hitoshi Isahara and Bente Maegaard and Joseph Mariani and Hélène Mazo and Asuncion Moreno and Jan Odijk and Stelios Piperidis and Takenobu Tokunaga},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {979-10-95546-00-9},
  language = {english}
}