Raising the Bar: Stacked Conservative Error Correction Beyond Boosting
Dekai WU (1), Grace NGAI (2), Marine CARPUAT (3)
(1) HKUST Human Language Technology Center, Department of Computer Science, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, email@example.com; (2) Hong Kong Polytechnic University, Department of Computing, Kowloon, Hong Kong, firstname.lastname@example.org; (3) HKUST Human Language Technology Center, Department of Computer Science, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, email@example.com
We introduce a conservative error-correcting model, Stacked TBL, that is designed to improve the performance of even high-performing models such as boosting, with little risk of accidentally degrading performance. Stacked TBL is particularly well suited for corpus-based natural language applications involving high-dimensional feature spaces, since it leverages the characteristics of the transformation-based learning (TBL) paradigm that it adapts. We consider here the task of automatically annotating named entities in text corpora. The task poses a number of challenges for TBL, for which we present simple yet effective solutions. We discuss the empirical behavior of Stacked TBL, and consider evidence that, despite its simplicity, more complex and time-consuming variants are not generally required.
corpus-based learning, named entity recognition, stacking, piping, transformation-based learning, corpus annotation, boosting, NTPC, error correction, stacked transformation-based learning, STBL
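The conservative stacking idea described in the abstract can be illustrated with a toy TBL-style corrector that learns transformation rules over a base model's predictions and keeps a rule only if it strictly reduces training error. This is a minimal sketch under invented assumptions (the toy data, the single word-triggered rule template, and the `min_gain` threshold are all illustrative), not the paper's actual Stacked TBL or NTPC algorithm:

```python
# Illustrative sketch of conservative stacked error correction (TBL-style).
# A rule is a triple (trigger_word, from_label, to_label): if the base model
# predicted from_label for trigger_word, rewrite it to to_label.

def apply_rule(preds, words, rule):
    """Apply one transformation rule to a prediction sequence."""
    word, frm, to = rule
    return [to if w == word and p == frm else p
            for w, p in zip(words, preds)]

def errors(preds, gold):
    """Count token-level disagreements with the gold labels."""
    return sum(p != g for p, g in zip(preds, gold))

def learn_rules(words, base_preds, gold, min_gain=1):
    """Greedily learn correction rules over the base model's output.
    Conservatism: a rule is adopted only if it reduces training error
    by at least min_gain, so low-risk rules are preferred."""
    preds = list(base_preds)
    rules = []
    while True:
        best_rule, best_gain = None, min_gain - 1
        # Candidate rules are instantiated from observed error tokens.
        for w, p, g in zip(words, preds, gold):
            if p != g:
                rule = (w, p, g)
                gain = errors(preds, gold) - errors(
                    apply_rule(preds, words, rule), gold)
                if gain > best_gain:
                    best_rule, best_gain = rule, gain
        if best_rule is None or best_gain < min_gain:
            break
        rules.append(best_rule)
        preds = apply_rule(preds, words, best_rule)
    return rules, preds

# Toy NER-like example: the base model mislabels "Paris" as LOC
# in a context where the gold annotation is PER.
words = ["Paris", "visited", "Paris", "Hilton"]
gold  = ["PER",   "O",       "PER",   "PER"]
base  = ["LOC",   "O",       "LOC",   "PER"]
rules, corrected = learn_rules(words, base, gold)
print(rules)      # -> [('Paris', 'LOC', 'PER')]
print(corrected)  # -> ['PER', 'O', 'PER', 'PER']
```

Because each adopted rule must yield a strict net error reduction on the training data, the corrector defaults to leaving the base model's output unchanged, which is the "conservative" property the abstract emphasizes.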