Word Spotting Tutorial

ICDAR 2017 Tutorial: Word Spotting - From Bag-of-Features to Deep Learning


Research on automatic reading systems has made considerable progress since its inception in the 1960s. Today, quite mature techniques are available for the automatic recognition of machine-printed text. However, the automatic reading of handwriting is a considerably more challenging task, especially when it comes to historical manuscripts. When current methods for handwriting recognition reach their limits, approaches for so-called word spotting come into play. These can be considered specialized versions of image retrieval techniques. The most successful methods rely on machine learning to derive powerful models for representing queries for handwriting retrieval.

This tutorial will be organized in two parts. After an introduction to the problem of word spotting and a brief look at the methodological development in the field, the first part will cover classical approaches for learning word spotting models. These all build on Bag-of-Features (BoF) representations, which were developed in the field of computer vision to learn characteristic representations of image content in an unsupervised manner. It will be shown how word spotting models can be built by applying the BoF principle. It will also be described how basic BoF models can be extended by incorporating statistical sequence models and, more importantly, by learning common sub-space representations between different modalities.
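
To make the BoF principle more concrete, the following is a minimal sketch and not the tutorial's own exercise code. It assumes that local descriptors have already been extracted from word images (here they are hypothetical 2-D toy vectors), learns a codebook of "visual words" with a plain k-means, and compares word images by the similarity of their visual-word histograms. All function names and data are illustrative assumptions.

```python
import math


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(descriptor, codebook):
    """Index of the closest codebook entry (the 'visual word')."""
    return min(range(len(codebook)),
               key=lambda k: squared_distance(descriptor, codebook[k]))


def learn_codebook(descriptors, num_words, iterations=10):
    """Unsupervised codebook learning with plain Lloyd's k-means.

    Assumption: centres are initialised with the first num_words descriptors.
    """
    codebook = [list(d) for d in descriptors[:num_words]]
    for _ in range(iterations):
        clusters = [[] for _ in codebook]
        for d in descriptors:
            clusters[nearest_word(d, codebook)].append(d)
        for k, members in enumerate(clusters):
            if members:  # keep the old centre if a cluster is empty
                codebook[k] = [sum(col) / len(members) for col in zip(*members)]
    return codebook


def bof_histogram(descriptors, codebook):
    """L2-normalised histogram of visual-word occurrences."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        hist[nearest_word(d, codebook)] += 1.0
    norm = math.sqrt(sum(h * h for h in hist))
    return [h / norm for h in hist] if norm > 0 else hist


def similarity(u, v):
    """Dot product; equals cosine similarity for L2-normalised inputs."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical toy descriptors standing in for real local image features.
training = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0),
            (0.1, 0.0), (0.9, 1.1), (0.1, 0.9)]
codebook = learn_codebook(training, num_words=3)

query = bof_histogram([(0.0, 0.1), (0.05, 0.0), (1.0, 0.9)], codebook)
candidate_a = bof_histogram([(0.1, 0.1), (0.0, 0.0), (0.9, 1.0)], codebook)
candidate_b = bof_histogram([(0.0, 1.0), (0.1, 0.9), (0.05, 1.0)], codebook)

sim_a = similarity(query, candidate_a)
sim_b = similarity(query, candidate_b)
print(sim_a, sim_b)  # candidate_a should rank above candidate_b
```

In this sketch, retrieval amounts to ranking candidate word images by their histogram similarity to the query. A plain histogram discards the order of the features, which is one motivation for the sequence-model extensions mentioned above.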

In the second part of the tutorial, advanced models for word spotting will be presented that apply deep learning techniques and currently define the state of the art in the field. After a discussion of the pros and cons of the classical approaches, the foundations of neural networks in general and deep architectures in particular will be laid. The success of such deep networks largely became possible because solutions to the crucial problem of vanishing gradients were proposed only recently. By combining the idea of common sub-space representations with a unified framework that can be learned in an end-to-end fashion, unprecedented performance can be achieved on a number of challenging word spotting tasks, as has been demonstrated by the PHOCNet.
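
As an illustration of a common sub-space between text and images, the sketch below builds a Pyramidal Histogram of Characters (PHOC), the binary string representation that the PHOCNet is trained to predict from word images. It is a simplified sketch under stated assumptions: the alphabet (lowercase letters and digits), the pyramid levels, and the function name `phoc` are illustrative choices, and the bigram part used in published variants is omitted.

```python
import string

# Assumed alphabet: 26 lowercase letters plus 10 digits.
ALPHABET = string.ascii_lowercase + string.digits


def phoc(word, levels=(1, 2, 3)):
    """Binary PHOC vector of a word string.

    At pyramid level L the word is split into L equal regions. A character
    is assigned to a region if at least half of the span it occupies in the
    word falls into that region. Spans are compared in integer units of
    1 / (len(word) * L) to avoid floating-point ties.
    """
    word = word.lower()
    n = len(word)
    vector = []
    for level in levels:
        for region in range(level):
            region_start, region_end = region * n, (region + 1) * n
            bits = [0] * len(ALPHABET)
            for i, ch in enumerate(word):
                if ch not in ALPHABET:
                    continue  # characters outside the assumed alphabet are skipped
                char_start, char_end = i * level, (i + 1) * level
                overlap = min(char_end, region_end) - max(char_start, region_start)
                if 2 * overlap >= level:  # at least 50% of the character span
                    bits[ALPHABET.index(ch)] = 1
            vector.extend(bits)
    return vector


example = phoc("beyond", levels=(1, 2))
print(len(example))  # one block of len(ALPHABET) bits per region
```

Because a query string and a word image are both mapped to vectors of this form, retrieval can again be reduced to a nearest-neighbour search in the shared space, for example with a cosine distance.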

Tutorial Slides

The slides for the tutorial can be found here.

Supplementary Material

In addition to the slides, we offer accompanying Python exercises for the tutorial, which can be done at home. The exercises are all contained in a VirtualBox file that comes with all necessary libraries pre-installed, so you don't have to install them manually. The Readme gives a detailed explanation of how to set up and run the VirtualBox machine. We also provide sample solutions for the exercises.



Prof. Dr.-Ing. Gernot A. Fink
Head of Research Group
Tel.: 0231 755-6151