An Evaluation Protocol for Text Mining Tools: Testing ALCESTE, SAS TEXT MINER, SPAD-CRM and TEMIS Text Mining Solutions
Yasmina Quatrain, Sylvaine Nugier, Anne Peradotto
ELECTRICITE DE FRANCE Research & Development
With the opening of the electricity market, EDF needs to analyse large volumes of text data to gain better knowledge of its customers. To this end, several text mining tools designed to analyse large quantities of this highly diverse information have been evaluated on three different corpora. A test table was essential to allow the software to be compared easily. The table draws on existing expertise in evaluating data mining tools, while taking care not to favour statistical results over linguistic ones. It covers ten topics, ranging from the software publisher through data access and lexical table analysis to the fields of application. Besides describing the evaluation of four commercially available tools and its results, this article explains how the test table was built, why these tools were chosen and which criteria were retained. Finally, this experience argues for a detailed protocol that identifies and evaluates the essential functions according to the objectives and profile of the software user and the nature of the corpus to be analysed.
Text Mining, tools evaluation, test table, protocol