Multilingual Corpus-based Approach to the Resolution of English -ing


Lee Schwartz, Takako Aikawa

Microsoft Research, Natural Language Processing




Corpus data has proven useful for resolving ambiguities in natural language processing (NLP). A number of studies, for example, have used corpus data to disambiguate English PP attachments. This paper explores a novel approach to resolving ambiguities associated with -ing + Noun constructions in English. We use an aligned multilingual corpus (English, Spanish, French, German and Japanese) to extract the lexical information necessary for disambiguation. Our premise is that while -ing constructions are highly ambiguous in English, the corresponding constructions in other languages may not be, and can thus provide disambiguating information for English. We argue that, with aligned multilingual corpora, languages can learn non-trivial linguistic information from one another.


aligned multilingual corpora, Machine Translation, English -ing ambiguity

Language(s) English, Spanish, Japanese, German, French