Summary of the paper

Title 4FX: Light Verb Constructions in a Multilingual Parallel Corpus
Authors Anita Rácz, István Nagy T. and Veronika Vincze
Abstract In this paper, we describe 4FX, a quadrilingual (English-Spanish-German-Hungarian) parallel corpus annotated for light verb constructions (LVCs). We present the annotation process and report statistical data on the frequency of LVCs in each language. We also report inter-annotator agreement rates and highlight some interesting facts and tendencies based on a comparison of the multilingual data from the four corpora. According to the frequency of LVC categories and the calculated Kendall’s coefficient for the four corpora, we found that Spanish and German are very similar to each other, Hungarian is also similar to both, but German differs from all these three. The qualitative and quantitative data analysis might prove useful in theoretical linguistic research for all four languages. Moreover, the corpus will be an excellent testbed for the development and evaluation of machine-learning-based methods for extracting or identifying light verb constructions in these four languages.
Topics MultiWord Expressions & Collocations, Multilinguality
Full paper 4FX: Light Verb Constructions in a Multilingual Parallel Corpus
