Information Extraction from Hindi Texts
Kamlesh Dutta (1), Saroj Kaushik (2), Nupur Prakash (3)
(1) National Institute of Technology, Hamirpur (HP), INDIA 177005, email@example.com; (2) Indian Institute of Technology, New Delhi, INDIA, firstname.lastname@example.org; (3) Indira Gandhi Institute of Technology, New Delhi, INDIA, email@example.com
This paper presents an information extraction system that takes Hindi texts as input and improves the information content retrieved by using an anaphor/pronoun resolution mechanism. The system consists of three major modules: the language parser, the anaphor resolution system, and the information extractor. The language parser is based on HPSG (Head-Driven Phrase Structure Grammar) and provides both syntactic and semantic information to the anaphor resolution system. HPSG was chosen because it provides a set of constraints on the co-referential structures in the language, which narrows the search for an antecedent to a more precise location in the discourse. The semantic information included in the parse may also help remove ambiguity in anaphor/pronoun resolution. The anaphor resolution system uses a few heuristic rules to resolve intrasentential references, while centering theory is used for intersentential resolution.
information extraction, anaphor, HPSG, discourse
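The two-level resolution strategy summarized in the abstract can be illustrated with a minimal Python sketch. It is not the system described in the paper: the `Mention` type, the `ROLE_RANK` table, and the functions `agrees`, `resolve`, and `extract` are hypothetical names introduced here. The sketch assumes the parser has already supplied gender, number, and grammatical role for each mention. The intrasentential heuristic shown (nearest preceding agreeing noun) is only one plausible rule, and the intersentential step is a heavily simplified, centering-inspired ranking of the previous sentence's entities by grammatical role, not a full implementation of centering theory.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed ranking of candidate antecedents by grammatical role
# (subject preferred over object, object over anything else).
ROLE_RANK = {"subject": 0, "object": 1, "other": 2}


@dataclass
class Mention:
    """A noun or pronoun with features assumed to come from the parser."""
    text: str
    gender: str        # e.g. "m" or "f"
    number: str        # e.g. "sg" or "pl"
    role: str          # a key of ROLE_RANK
    is_pronoun: bool = False


def agrees(a: Mention, b: Mention) -> bool:
    """Gender and number agreement check."""
    return a.gender == b.gender and a.number == b.number


def resolve(sentences: List[List[Mention]], s_idx: int, m_idx: int) -> Optional[Mention]:
    """Return an antecedent for the pronoun at sentences[s_idx][m_idx], if any."""
    pronoun = sentences[s_idx][m_idx]

    # Intrasentential heuristic: nearest preceding agreeing non-pronoun.
    for cand in reversed(sentences[s_idx][:m_idx]):
        if not cand.is_pronoun and agrees(cand, pronoun):
            return cand

    # Intersentential step: rank the previous sentence's mentions by role
    # and take the highest-ranked agreeing non-pronoun.
    if s_idx > 0:
        ranked = sorted(sentences[s_idx - 1], key=lambda m: ROLE_RANK[m.role])
        for cand in ranked:
            if not cand.is_pronoun and agrees(cand, pronoun):
                return cand
    return None


def extract(sentences: List[List[Mention]]) -> List[List[str]]:
    """Replace each resolvable pronoun with the text of its antecedent."""
    output = []
    for s_idx, sentence in enumerate(sentences):
        resolved = []
        for m_idx, mention in enumerate(sentence):
            antecedent = resolve(sentences, s_idx, m_idx) if mention.is_pronoun else None
            resolved.append(antecedent.text if antecedent else mention.text)
        output.append(resolved)
    return output


# Illustrative transliterated input: "Ram ... kitaab ..." followed by "vah ...".
sentences = [
    [Mention("Ram", "m", "sg", "subject"), Mention("kitaab", "f", "sg", "object")],
    [Mention("vah", "m", "sg", "subject", is_pronoun=True)],
]
result = extract(sentences)
print(result)
```

In this toy input the pronoun has no candidate inside its own sentence, so the sketch falls back to the previous sentence and selects the agreeing subject. A real system would draw these features and constraints from the HPSG parse rather than from hand-written records.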