Portuguese Large-scale Language Resources for NLP Applications


Elisabete Ranchhod (1), Paula Carvalho (1), Cristina Mota (2), Anabela Barreiro (1)

(1) Universidade de Lisboa and LabEL (CAUTL/IST), (2) LabEL (CAUTL/IST)




The paper describes large-scale Portuguese linguistic resources, mainly computational lexicons and grammars, developed by LabEL. These resources are formalized and applied to texts by means of finite-state techniques, which are increasingly acknowledged in Natural Language Processing. The paper first illustrates methods of lexical representation for simple words and multi-word expressions; it then provides examples, in the form of concordances, of linguistic structures recognized after disambiguation and parsing grammars are applied to texts. It ends with a brief account of the publicly available data, highlighting its contribution to the dissemination of LabEL’s knowledge of language technology.


Language Resources for Language Engineering, Lexical Resources, Linguistic-based Corpus Processing, MWE – Identification and Tagging, Word Sense Disambiguation, Finite-State Parsing

Language(s) Portuguese
Full Paper