Linguistic and Computational Problems for the Creation of an Italian Children's Corpus of Spoken Language
Laura Pecchia (Istituto di Linguistica Computazionale, CNR, Via Moruzzi 1, 56124 Pisa, Italy)
Giuseppe Cappelli (Istituto di Linguistica Computazionale, CNR, Via Moruzzi 1, 56124 Pisa, Italy)
Elisabetta Guazzini (Istituto di Linguistica Computazionale, CNR, Via Moruzzi 1, 56124 Pisa, Italy)
SO9: Emotional & Specific Databases
In this paper we describe the criteria adopted for the creation of a corpus of spoken language produced by children aged six to eleven in different communicative situations, the methodology used for data collection, and the transcription, coding and lemmatization phases. We also give some quantitative descriptions of the nouns, verbs and adjectives present in the corpus. Qualitative analyses of the adjectives are underway. This work is part of the activities carried out within the framework of the "Corpus di Linguaggio Infantile" (C.L.I.), a special project of the Italian National Research Council (CNR).
Children's spoken language, Corpus, Encoding systems, Lemmatization, Frequency list forms