SVMTool: A general POS tagger generator based on Support Vector Machines.


Jesús Giménez, Lluís Màrquez

TALP Research Center, LSI Department. Universitat Politècnica de Catalunya. jgimenez@talp.upc.es, lluism@talp.upc.es




This paper presents the SVMTool, a simple, flexible, effective and efficient part-of-speech tagger based on Support Vector Machines. The SVMTool offers a good balance among these properties, which makes it practical for current NLP applications. It is easy to use and easily configurable, so it can be adapted to the needs of a number of different applications. Results are also very competitive: the tagger achieves an accuracy of 97.16% for English on the Wall Street Journal corpus. It has also been successfully applied to Spanish, exhibiting similar performance. A first release of the SVMTool Perl prototype is now freely available for public use. A more efficient C++ version is coming very soon.


SVMTool, TnT, POS tagging, Machine Learning for NLP, Support Vector Machines.

Language(s): Results are presented for English and Spanish, although the tool is language independent.