LREC 2016 Proceedings

Summary of the paper

Title	Punctuation Prediction for Unsegmented Transcript Based on Word Vector
Authors	Xiaoyin Che, Cheng Wang, Haojin Yang and Christoph Meinel
Abstract	In this paper we propose an approach to predict punctuation marks for unsegmented speech transcripts. The approach is purely lexical, with pre-trained word vectors as the only input. A Deep Neural Network (DNN) or Convolutional Neural Network (CNN) model is trained to classify whether a punctuation mark should be inserted after the third word of a 5-word sequence and, if so, which kind of punctuation mark it should be. TED talks from the IWSLT dataset are used in both the training and evaluation phases. The proposed approach shows its effectiveness by achieving better results than the state-of-the-art lexical solution that works with the same type of data, especially when predicting punctuation position only.
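The windowing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the label set, the `<pad>` token, the zero vector for unknown words, and the helper names `make_windows` and `featurize` are all assumptions made for illustration. The sketch strips punctuation from a tokenized transcript, builds one 5-word window per word, and labels each window with the mark (if any) that followed its third word.

```python
from typing import Dict, List, Tuple

# Assumed label set; the abstract does not list the punctuation classes.
PUNCT = {",": "COMMA", ".": "PERIOD", "?": "QUESTION"}
PAD = "<pad>"  # hypothetical padding token for the transcript edges


def make_windows(tokens: List[str], size: int = 5) -> List[Tuple[List[str], str]]:
    """Return (window, label) pairs from a punctuated token list.

    The label is the punctuation class that followed the middle word of the
    window in the original text, or "O" when no mark followed it.
    """
    words: List[str] = []
    labels: List[str] = []
    for tok in tokens:
        if tok in PUNCT:
            if labels:  # attach the mark to the preceding word
                labels[-1] = PUNCT[tok]
        else:
            words.append(tok.lower())
            labels.append("O")
    half = size // 2
    padded = [PAD] * half + words + [PAD] * half
    # Window i is centred on word i, so word i is its third word when size == 5.
    return [(padded[i:i + size], labels[i]) for i in range(len(words))]


def featurize(window: List[str], vectors: Dict[str, List[float]], dim: int) -> List[float]:
    """Concatenate the word vectors of a window into one flat input vector.

    Words without a vector (including padding) map to zeros, an assumption.
    """
    zero = [0.0] * dim
    feats: List[float] = []
    for word in window:
        feats.extend(vectors.get(word, zero))
    return feats


if __name__ == "__main__":
    text = "so what is this ? it is a talk , really ."
    for window, label in make_windows(text.split()):
        print(" ".join(window), "->", label)
```

Each flat vector from `featurize` would then be the input to a classifier such as the DNN or CNN mentioned in the abstract. In practice the `vectors` dictionary would be filled from a pre-trained embedding file rather than written by hand.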
Topics	Parsing, Statistical and Machine Learning Methods, Speech Recognition/Understanding
Full paper	Punctuation Prediction for Unsegmented Transcript Based on Word Vector
Bibtex	@InProceedings{CHE16.103,
  author = {Xiaoyin Che and Cheng Wang and Haojin Yang and Christoph Meinel},
  title = {Punctuation Prediction for Unsegmented Transcript Based on Word Vector},
  booktitle = {Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)},
  year = {2016},
  month = {may},
  date = {23-28},
  location = {Portorož, Slovenia},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Sara Goggi and Marko Grobelnik and Bente Maegaard and Joseph Mariani and Helene Mazo and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
  publisher = {European Language Resources Association (ELRA)},
  address = {Paris, France},
  isbn = {978-2-9517408-9-1},
  language = {english}
}