Title Cluster Analysis and Classification of Named Entities
Author(s) Joaquim F. Ferreira da Silva (1), Zornitsa Kozareva (2), José Gabriel Pereira Lopes (1)

(1) Departamento de Informática, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, Quinta da Torre, 2725 Monte da Caparica, Portugal, jfs@di.fct.unl.pt; (2) Faculty of Mathematics and Informatics, Plovdiv University, 236, Bulgaria blvd., Plovdiv, Bulgaria, zkozareva@hotmail.com 

Session P2-W
Abstract This paper presents a statistics-based, language-independent, unsupervised approach for clustering possible named entities. We describe and motivate the features and statistical filters used by our clustering process. Using the Model-Based Clustering Analysis software, we obtained several clusters of named entities. The method was applied to Bulgarian and English. For some clusters, precision is close to 100%, which eases human validation and saves time. Other clusters still need further refinement. Based on the obtained clusters, new named entities can be classified.
Keyword(s) Named Entities, Multiword Lexical Units, Clustering
Language(s) English, Bulgarian
Full Paper 796.pdf