Experiments in Topic Detection


Yllias CHALI (Department of Mathematics and Computer Science, University of Lethbridge, 4401 University Drive, Lethbridge, AB T1K 3M4, Canada)


WP5: Components & Systems


Dividing documents into topically coherent units and identifying the topic of each unit has many potential uses. We present a system that proceeds in two steps: (1) the input text is segmented at places where a topic shift is probable, and (2) lexical chains are extracted from each segment as indicators of its topic. Two implementations based on public-domain resources are presented: one using WordNet and the other using Roget's thesaurus. An evaluation of the algorithm shows that lexical chains are acceptable topic indicators, with 44.5% precision and 63.8% recall.
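The two-step procedure can be illustrated with a minimal sketch. It is not the system evaluated in the paper: the lexicon below is a hypothetical toy stand-in for WordNet or Roget's thesaurus, the segmentation rule (a boundary wherever adjacent sentences share no lexical category) is a simplification chosen for illustration, and all function names are assumptions.

```python
from collections import defaultdict

# Hypothetical toy lexicon standing in for WordNet or Roget's thesaurus:
# each known word maps to one coarse category label.
TOY_LEXICON = {
    "bank": "finance", "loan": "finance", "interest": "finance",
    "river": "nature", "water": "nature", "fish": "nature",
}


def content_words(sentence):
    """Return the lowercased tokens of a sentence that the toy lexicon knows."""
    tokens = (w.strip(".,;").lower() for w in sentence.split())
    return [w for w in tokens if w in TOY_LEXICON]


def segment(sentences):
    """Step 1 (simplified): open a new segment when a sentence shares
    no lexical category with the last sentence that had any."""
    segments, prev = [], set()
    for s in sentences:
        cats = {TOY_LEXICON[w] for w in content_words(s)}
        shift = bool(cats) and bool(prev) and not (cats & prev)
        if not segments or shift:
            segments.append([s])
        else:
            segments[-1].append(s)
        if cats:
            prev = cats
    return segments


def lexical_chains(seg):
    """Step 2 (simplified): chain together words of the same category."""
    chains = defaultdict(list)
    for s in seg:
        for w in content_words(s):
            chains[TOY_LEXICON[w]].append(w)
    return dict(chains)


def topic(seg):
    """Use the category of the longest chain as the segment's topic indicator."""
    chains = lexical_chains(seg)
    if not chains:
        return None
    return max(chains, key=lambda c: len(chains[c]))


text = [
    "The bank approved the loan.",
    "Interest on the loan is low.",
    "The river is full of fish.",
    "Water flows fast.",
]
for seg in segment(text):
    print(topic(seg), lexical_chains(seg))
```

In this toy run the four sentences fall into two segments, and the longest chain in each segment serves as its topic indicator. A real implementation would replace the one-label lexicon with thesaurus relations such as synonymy and hypernymy.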


Topic detection

Full Paper