Semi-Supervised Learning for Character Recognition in Historical Archive Documents

Jan Richarz, Szilard Vajda, Rene Grzeszick and Gernot A. Fink
Pattern Recognition, 47(3), pages 1011-1020, 2014, Special Issue on Handwriting Recognition.

Training recognizers for handwritten characters is still a very time-consuming task that requires large amounts of manual annotation by experts. In this paper, we present semi-supervised labeling strategies that considerably reduce this human effort. We propose two methods to label and later recognize characters in collections of historical archive documents. The first is based on clustering different feature representations; the second incorporates simultaneous retrieval on different representations. Both approaches are thus based on multi-view learning and subsequently apply a voting procedure to reliably propagate annotations to unlabeled data. We evaluate our methods on the MNIST database of handwritten digits and introduce a realistic application in the form of a database of handwritten historical weather reports. The experiments show that our methods significantly reduce the human effort required to build a character recognizer for the data collection considered, while still achieving recognition rates close to those of a supervised classification experiment.
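
The idea of propagating annotations by voting across several views can be illustrated with a small sketch. This is not the procedure from the paper: it assumes that each view (feature representation) has already been clustered, that an expert has labeled a few samples, and that a label is propagated only when enough views agree. The function names, the `min_agree` parameter, and the toy data are hypothetical.

```python
from collections import Counter


def propagate_in_view(cluster_ids, seed_labels):
    """Give each sample the majority expert label of its cluster in one view.

    cluster_ids: list with the cluster index of every sample in this view.
    seed_labels: dict mapping sample index -> expert label (assumed input).
    Samples in clusters without any labeled seed get None.
    """
    votes = {}
    for idx, label in seed_labels.items():
        votes.setdefault(cluster_ids[idx], Counter())[label] += 1
    cluster_label = {c: cnt.most_common(1)[0][0] for c, cnt in votes.items()}
    return [cluster_label.get(c) for c in cluster_ids]


def vote_across_views(views, seed_labels, min_agree=None):
    """Keep a propagated label only if at least min_agree views propose it.

    By default all views must agree (an assumption of this sketch).
    Samples without sufficient agreement stay unlabeled (None).
    """
    if min_agree is None:
        min_agree = len(views)
    per_view = [propagate_in_view(v, seed_labels) for v in views]
    result = []
    for proposals in zip(*per_view):
        counts = Counter(p for p in proposals if p is not None)
        if counts:
            label, n = counts.most_common(1)[0]
            result.append(label if n >= min_agree else None)
        else:
            result.append(None)
    return result


# Toy data: six samples, clustered differently in two views.
view_a = [0, 0, 0, 1, 1, 1]
view_b = [0, 0, 1, 1, 1, 1]
seeds = {0: "3", 5: "8"}  # two expert-labeled samples

labels = vote_across_views([view_a, view_b], seeds)
print(labels)
```

In this toy example the two views disagree on the third sample, so it stays unlabeled and would be left for manual annotation, while the remaining samples inherit a label from the two expert annotations.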
