Tuning Context Features with Genetic Algorithms
Irena Spasić (Computer Science, University of Salford, Newton Building, Manchester, M5 4WT, UK)
Goran Nenadić (Computer Science, University of Salford, Newton Building, Manchester, M5 4WT, UK)
Sophia Ananiadou (Computer Science, University of Salford, Newton Building, Manchester, M5 4WT, UK)
WO23: Corpus Analysis, Annotation, Representation
In this paper we present an approach to tuning context features acquired from corpora. The approach is based on the idea of a genetic algorithm (GA). We analyse a whole population of contexts surrounding related linguistic entities in order to find a generic property characteristic of such contexts. Our goal is to tune the context properties so that no correct feature values are lost while the presence of ambiguous values is minimised. The GA implements a crossover operator based on dominant and recessive genes, where a gene corresponds to a context feature. A dominant gene is one that, when combined with another gene of the same type, is always reflected in the offspring. Dominant genes denote the more suitable context features. In each iteration of the GA, the number of individuals in the population is halved, so that the process ends with a single individual whose context features are tuned with respect to the information contained in the training corpus. We illustrate the general method with a case study concerned with identifying relationships between verbs and the terms that complement them. More precisely, we tune the classes of terms that are typically selected as arguments of the considered verbs in order to acquire the semantic features of those verbs.
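The halving scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the representation of an individual as a mapping from feature names to values, the pairing of neighbouring individuals, the handling of an odd-sized population, and the `dominates` relation (here driven by a made-up `RANK` table) are all assumptions introduced only for illustration.

```python
from typing import Callable, Dict, List

# Assumption: an individual is a context, stored as feature name -> feature value.
Individual = Dict[str, str]
# Assumption: dominates(x, y) is True when value x is dominant over value y.
Dominates = Callable[[str, str], bool]


def crossover(a: Individual, b: Individual, dominates: Dominates) -> Individual:
    """Combine two individuals gene by gene; the dominant gene reaches the offspring."""
    child: Individual = {}
    for feature in a.keys() | b.keys():
        if feature not in b:
            child[feature] = a[feature]
        elif feature not in a:
            child[feature] = b[feature]
        else:
            gene_a, gene_b = a[feature], b[feature]
            # If neither gene dominates, the first parent's gene is kept (an assumption).
            child[feature] = gene_b if dominates(gene_b, gene_a) else gene_a
    return child


def tune(population: List[Individual], dominates: Dominates) -> Individual:
    """Roughly halve the population each iteration until one individual remains."""
    if not population:
        raise ValueError("population must not be empty")
    current = list(population)
    while len(current) > 1:
        # Pair neighbours; each pair yields exactly one offspring.
        offspring = [
            crossover(current[i], current[i + 1], dominates)
            for i in range(0, len(current) - 1, 2)
        ]
        # Assumption: an unpaired individual is carried over unchanged.
        if len(current) % 2 == 1:
            offspring.append(current[-1])
        current = offspring
    return current[0]


# Hypothetical toy data: classes of terms observed as the object of one verb.
RANK = {"substance": 0, "protein": 1, "gene": 1}  # lower rank = dominant (assumed)


def rank_dominates(x: str, y: str) -> bool:
    return RANK[x] < RANK[y]


population = [
    {"object": "protein"},
    {"object": "substance"},
    {"object": "protein"},
    {"object": "gene"},
]
tuned = tune(population, rank_dominates)
```

In this toy run, four contexts are reduced to two and then to one, and the value with the lowest assumed rank survives in the final individual. How dominance is actually decided for a given feature is the substantive part of the method and is left open in this sketch.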