Title	Comparative Evaluation of Collocation Extraction Metrics
Authors	Aristomenis Thanopoulos (Wire Communications Laboratory, Electrical & Computer Engineering Dept., University of Patras, 265 00 Rion, Patras, Greece); Nikos Fakotakis (Wire Communications Laboratory, Electrical & Computer Engineering Dept., University of Patras, 265 00 Rion, Patras, Greece); George Kokkinakis (Wire Communications Laboratory, Electrical & Computer Engineering Dept., University of Patras, 265 00 Rion, Patras, Greece)
Session	EP1: Evaluation
Abstract	Corpus-based automatic extraction of collocations is typically carried out using a statistic that measures co-occurrence, in order to identify words that co-occur more often than expected by chance. In this paper we examine several typical measures, namely the t-score, Pearson’s chi-square test, the log-likelihood ratio and pointwise mutual information, together with a novel information-theoretic measure, mutual dependency. Apart from some theoretical discussion of how these measures correlate, we perform comparative evaluation experiments, judging performance by each measure's ability to identify lexically associated bigrams. We use two different gold standards: WordNet and lists of named entities. Besides finding that a frequency-biased version of mutual dependency performs best, followed closely by the log-likelihood ratio, we point out some implications of using available electronic dictionaries such as WordNet to evaluate collocation extraction.
Keywords	Collocation extraction, Automatic evaluation, WordNet
Full Paper	128.pdf