
Segmentation-free Query-by-String Word Spotting with Bag-of-Features HMMs


Leonard Rothacker and Gernot A. Fink
Proc. Int. Conf. on Document Analysis and Recognition, Nancy, France, 2015.

Word spotting makes it possible to explore document images without requiring a full transcription. In the query-by-string scenario considered in this paper, it is possible to search for arbitrary keywords while requiring only limited prior information about the documents. We learn context-dependent character models from a training set that is small relative to the number of models. This is possible because we use Bag-of-Features HMMs, which are especially suited to estimating robust models from limited training material. In contrast to most query-by-string methods, we consider a fully segmentation-free decoding framework that does not require any pre-segmentation at the word or line level. Experiments on the well-known George Washington benchmark demonstrate the high accuracy of our method.
