Experiences in Collection of Handwriting Data for Online Handwriting Recognition in Indic Scripts


Ajay S Bhaskarabhatla, Sriganesh Madhvanath

Hewlett-Packard Labs, India




In this paper, we describe initial efforts at Hewlett-Packard Labs, Bangalore, to create datasets of online handwriting in Indic scripts to support research in online handwriting recognition for these scripts. The term "online" here refers to the fact that handwriting is captured as a stream of (x,y) points using an appropriate pen position sensor (often called a digitizer), rather than as a bitmap (image). The paper briefly describes the structure of Indic scripts and identifies different choices for segmenting characters into simpler shapes that can then be recognized using pattern recognition techniques. These issues are discussed in the context of the Tamil script. The remainder of the paper provides an overview of two distinct data collection efforts for the Tamil script: one at the isolated character level, and the other for isolated words. In the context of these efforts, we briefly describe the data collection procedure, tools for collection and subsequent annotation, user-interface issues, the annotation scheme, and the organization of the dataset. The paper concludes with the current status of the effort and future directions.


Online Handwriting, Annotation, Datasets, Tamil, Devanagari

Language(s): Indic scripts
