Information Extraction from Hindi Texts


Kamlesh Dutta (1), Saroj Kaushik (2), Nupur Prakash (3)

(1) National Institute of Technology, Hamirpur (HP), INDIA 177005, kd@recham.ernet.in; (2) Indian Institute of Technology, New Delhi, INDIA, saroj@cse.iitd.ernet.in; (3) Indira Gandhi Institute of Technology, New Delhi, INDIA, nupurprakash@rediffmail.com




The paper presents an information extraction system that takes Hindi texts as input and improves the information content retrieved by using an anaphor/pronoun resolution mechanism. The system consists of three major modules: the language parser, the resolution system, and the information extractor. The language parser is based on HPSG (Head-Driven Phrase Structure Grammar) and provides both syntactic and semantic information to the anaphor resolution system. HPSG was chosen because it provides a set of constraints on the co-referential structures in the language, which narrows the search for an antecedent to a more precise location in the discourse. The semantic information included in its parses may help remove ambiguity in anaphor/pronoun resolution. The anaphor resolution system uses a few heuristic rules to resolve intrasentential references, while centering theory is used for intersentential resolution.


information extraction, anaphor, HPSG, discourse

Language(s): Hindi
Full Paper