Title Distributional Consistency: As a General Method for Defining a Core Lexicon
Author(s) Huarui Zhang (1), Churen Huang (2), Shiwen Yu (1)

(1) Institute of Computational Linguistics, Peking University; (2) Institute of Linguistics, Academia Sinica

Abstract We propose Distributional Consistency (DC) as a general method for defining a Core Lexicon. The properties of DC are investigated theoretically and empirically, showing that it is clearly distinguishable from both word frequency and range of distribution. DC is also shown to reflect intuitive interpretations, especially when its value is close to 1. Its immediate applications in NLP include defining a core lexicon for a language and identifying topical words in a document. We also categorize the existing measures of dispersion into three groups via ratio of norm or entropy, and propose a simplified measure and a combined kind of measure. These new measures can serve as virtual prototypes or intermediate types for the study and comparison of existing measures in the future.
Keyword(s) Distributional Consistency, Lexical Usuality, Measure of Dispersion, Square Mean Root (SMR), Modified Frequency, Core Lexicon
Language(s) Chinese, general
