Title Building a List of Synonymous Words and Phrases of Japanese Compound Verbs
Authors Kyoko Kanzaki and Hitoshi Isahara
Abstract We started to construct, semi-automatically, a database of synonymous expressions of Japanese “Verb + Verb” compounds. Japanese is known to be rich in compound verbs consisting of two verbs joined together, yet there was no comprehensive lexicon of Japanese compound verbs. Recently, the National Institute for Japanese Language and Linguistics (NINJAL) constructed a Japanese compound verb lexicon (2013-15). Although it provides meanings, example sentences, syntactic patterns, and actual sentences from NINJAL's own corpus, it contains no information on relationships with other words, such as synonymous words and phrases. Using word2vec and cosine similarity, we automatically extracted synonymous expressions of compound verbs from Kawahara's (2006) corpus of “five hundred million Japanese texts gathered from the web”, and we used k-means++ and PCA to find clusters that correspond to the meanings of each compound verb. Automatic extraction from the corpus helps humans find not only typical synonyms but also unexpected synonymous words and phrases. We then manually compiled the list of synonymous expressions of Japanese compound verbs by assessing the results, and linked it to the “Compound Verb Lexicon” published by NINJAL.
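The two automatic steps named in the abstract, ranking candidate synonyms by cosine similarity and then clustering them with k-means++, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes word vectors have already been trained (for example with word2vec) and are available as plain Python lists, the romanized words and their three-dimensional vectors are invented toy data, and the helper names most_similar and kmeans_pp are hypothetical. PCA, which the paper uses together with k-means++, is omitted here.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def most_similar(query, vectors, topn=3):
    """Hypothetical helper: rank all other words by cosine similarity to the query."""
    q = vectors[query]
    scored = [(w, cosine(q, v)) for w, v in vectors.items() if w != query]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


def sq_dist(u, v):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans_pp(points, k, iters=20, seed=0):
    """Hypothetical helper: k-means with k-means++ seeding; returns one label per point."""
    rng = random.Random(seed)
    centers = [list(rng.choice(points))]
    # k-means++ seeding: pick each new center with probability proportional to D(x)^2
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        total = sum(d2)
        if total == 0:
            centers.append(list(rng.choice(points)))  # every point already sits on a center
            continue
        r = rng.uniform(0, total)
        acc = 0.0
        for p, w in zip(points, d2):
            acc += w
            if w > 0 and acc >= r:
                centers.append(list(p))
                break
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        # Update step: move each center to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Invented toy vectors (NOT from the paper's corpus); tachiagaru is placed
# between a "stand up" sense and a "rise to action" sense.
vectors = {
    "tachiagaru": [0.6, 0.6, 0.0],
    "okiagaru": [0.9, 0.1, 0.0],
    "tatsu": [0.8, 0.2, 0.1],
    "kekki-suru": [0.1, 0.9, 0.1],
    "furuitatsu": [0.2, 0.8, 0.0],
    "taberu": [0.0, 0.1, 0.9],
}

neighbors = [w for w, _ in most_similar("tachiagaru", vectors, topn=4)]
labels = kmeans_pp([vectors[w] for w in neighbors], k=2)
clusters = dict(zip(neighbors, labels))
print(clusters)
```

Ranking by cosine similarity yields candidate synonyms for the query verb, and clustering those candidates' vectors groups them by sense; with this toy data the "stand up" neighbors (okiagaru, tatsu) end up in a different cluster from the "rise to action" neighbors (kekki-suru, furuitatsu), while the unrelated taberu is never retrieved.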
Topics Multiword Expressions & Collocations, Corpus (Creation, Annotation, Etc.), Lexicon, Lexical Database
Bibtex @InProceedings{KANZAKI18.532,
  author = {Kyoko Kanzaki and Hitoshi Isahara},
  title = "{Building a List of Synonymous Words and Phrases of Japanese Compound Verbs}",
  booktitle = {Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)},
  year = {2018},
  month = {May 7-12, 2018},
  address = {Miyazaki, Japan},
  editor = {Nicoletta Calzolari (Conference chair) and Khalid Choukri and Christopher Cieri and Thierry Declerck and Sara Goggi and Koiti Hasida and Hitoshi Isahara and Bente Maegaard and Joseph Mariani and Hélène Mazo and Asuncion Moreno and Jan Odijk and Stelios Piperidis and Takenobu Tokunaga},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {979-10-95546-00-9},
  language = {english}
}
