Collecting Spontaneously Spoken Queries for Information Retrieval


Tomoyosi Akiba (1), Atsushi Fujii (2), Katunobu Itou (3)

(1) National Institute of Advanced Industrial Science and Technology; (2) University of Tsukuba; (3) Nagoya University




To realize speech-driven information retrieval systems that accept spontaneously spoken queries, we developed a method for collecting such speech data from pre-defined search topics that had been systematically constructed for IR research. To evaluate both our method and the performance of document retrieval with spontaneously spoken queries, we conducted two data collection experiments using publicly available test collections for evaluating document retrieval. The first, preliminary experiment used a relatively small number of search topics selected from the NTCIR-3 Web retrieval collection, which had been constructed for a TREC-style evaluation workshop, in order to test our method. The second experiment used all of the search topics released for the NTCIR-4 Web task, so that we could participate in the formal run of the evaluation. This paper describes the collected data and reports evaluation results for both speech recognition accuracy and document retrieval precision obtained with the collected data.


spontaneous speech, information retrieval, speech data collection, large vocabulary continuous speech recognition, TREC-style evaluation

Language(s) Japanese
Full Paper