LAperLA: an Integrated Graphical-Linguistic System for Old Printed Latin Texts


Andrea Bozzi (ILC-CNR, Area della Ricerca di Pisa, via G. Moruzzi 1, I-56124 Pisa)


WP5: Components & Systems


LAperLA (Lettore Automatico per Libri Antichi) is a prototype for the automatic recognition of Latin texts in old printed books. The strengths of the system are its neural architecture and its linguistic post-processing tool, which consists of an index of more than 500,000 Latin forms and a query management system that uses the index to check and correct the interpreted words. The images were taken from the text of "Contradicentium Medicorum" by Girolamo Cardano, in the edition printed in 1663; the main textual material consists of a set of 40 image files (11 for training and 29 for testing) with a resolution of 118 DPI. The results LAperLA produced on the images chosen as benchmarks were compared with those of FineReader 4.0 by ABBYY and OmniPage Pro 10 by Caere. FineReader reaches a correctness percentage of 61.19% and OmniPage 54.41%, while LAperLA recognises 80.95% of the words, a figure that rises to 93.22% with the aid of the linguistic module. An easy-to-use interface has been developed both for training the network and for selecting the parts of the image files to be interpreted.
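
The idea of checking interpreted words against an index of Latin forms can be illustrated with a minimal sketch. It is not the LAperLA implementation: the abstract does not say how the query management system selects a correction, so the similarity matching used here (Python's standard `difflib`), the tiny `LATIN_FORMS` set, the `correct_word` function and the `cutoff` value are all assumptions made for illustration.

```python
from difflib import get_close_matches

# Hypothetical miniature index; the real index holds more than 500,000 Latin forms.
LATIN_FORMS = {
    "medicorum", "medicus", "contradicentium",
    "libri", "liber", "corpus", "corporis",
}


def correct_word(word, index=LATIN_FORMS, cutoff=0.8):
    """Check a recognised word against the index of forms.

    Returns the word unchanged if it is an attested form; otherwise returns
    the most similar attested form, or the word itself when no form is
    similar enough (cutoff is an assumed similarity threshold).
    """
    key = word.lower()
    if key in index:
        return word
    # sorted() gives a deterministic candidate order for the similarity search.
    candidates = get_close_matches(key, sorted(index), n=1, cutoff=cutoff)
    return candidates[0] if candidates else word


if __name__ == "__main__":
    # "rn" read in place of "m" is a typical recognition confusion.
    for token in ["medicorurn", "libri", "xyzzy"]:
        print(token, "->", correct_word(token))
```

In this sketch a word that is already an attested form is left alone, so the index acts first as a check and only then as a source of corrections. A real system would presumably also weight candidates by the character confusions the recogniser is known to make.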


Graphical-Linguistic, Latin texts

Full Paper