ESMERALDA is a development toolkit for building statistical recognizers that operate on sequential data such as speech, handwriting, or biological sequences. The framework primarily supports continuous density Hidden Markov Models (HMMs) of different topologies and with user-definable internal structure. In addition, it supports the incorporation of Markov chain models (realized as statistical n-gram models) for long-term sequential restrictions, and of Gaussian mixture models (GMMs) for general classification tasks.

The goal of ESMERALDA is to bring together a tractable set of conceptually simple yet powerful techniques in an integrated development environment. The system is organized as a modular architecture:

**ESMERALDA's system architecture**

Separate modules are provided for estimating mixture density models (md) in conjunction with HMMs (mm) and for building n-gram models (lm). In addition, modules providing runtime system functionality (rs), fundamental linear algebra operations (mx), and tools for feature extraction and manipulation (fx/dsp) form the basis of the general framework. Technically, every module contains a library with an API as well as stand-alone programs for manipulating the corresponding models and associated data.

HMM-based recognizers estimated with ESMERALDA consist of elementary models specifying a certain topology, i.e. the type of allowed state transitions. These are built from individual state definitions that carry all statistical model parameters. From the elementary models, more complex HMMs can be constructed using a declarative specification language. Model parameters can be initialized on labelled data and optimized by applying the standard Baum-Welch re-estimation algorithm. To find a good balance between the precision of the models and the robustness of the parameter estimates, ESMERALDA supports state clustering: similar states are grouped into clusters, for each of which a new individual parameter set is created. These new parameters can then be optimized in subsequent re-estimation steps.
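The evaluation of such models rests on the classical HMM algorithms. As a purely generic illustration (this is not ESMERALDA's API; all names are hypothetical, and discrete emissions are used instead of ESMERALDA's continuous mixture densities to keep the sketch short), the forward algorithm on a left-to-right ("Bakis") two-state model might look like this:

```c
#include <math.h>

#define N 2   /* number of states  */
#define M 2   /* number of symbols */

/* Hypothetical left-to-right HMM: state 1 is absorbing,
 * state 0 may loop or advance to state 1. */
static const double A[N][N] = { {0.7, 0.3}, {0.0, 1.0} }; /* transitions */
static const double B[N][M] = { {0.9, 0.1}, {0.2, 0.8} }; /* emissions   */
static const double PI[N]   = { 1.0, 0.0 };               /* start probs */

/* Forward algorithm: returns P(observation sequence | model). */
double forward(const int *obs, int T)
{
    double alpha[N], next[N], s, p;
    int i, j, t;

    for (i = 0; i < N; i++)
        alpha[i] = PI[i] * B[i][obs[0]];

    for (t = 1; t < T; t++) {
        for (j = 0; j < N; j++) {
            s = 0.0;
            for (i = 0; i < N; i++)
                s += alpha[i] * A[i][j];    /* sum over predecessors */
            next[j] = s * B[j][obs[t]];     /* weight by emission    */
        }
        for (j = 0; j < N; j++)
            alpha[j] = next[j];
    }

    p = 0.0;
    for (i = 0; i < N; i++)
        p += alpha[i];
    return p;
}
```

Baum-Welch re-estimation builds on the same forward (and backward) quantities, accumulating expected transition and emission counts over the training data.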

For large-inventory recognizers in particular, the space of possible hypothesis sequences can be restricted by statistical language models. To this end, ESMERALDA provides tools for estimating arbitrary n-gram models, together with methods for redistributing probability mass and smoothing distributions in order to assign robust non-zero probability estimates to unseen n-grams.
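The idea behind such smoothing can be sketched with the simplest scheme, additive (Laplace) smoothing of a bigram model. The code below is a generic illustration with hypothetical identifiers, not ESMERALDA's implementation (its actual discounting methods are more elaborate), but it shows how every bigram, seen or unseen, receives a non-zero probability:

```c
#define V 3 /* vocabulary size (hypothetical toy value) */

static int bi[V][V];   /* bigram counts c(w1, w2)          */
static int uni[V];     /* history counts c(w1, .)          */

/* Accumulate bigram statistics from an integer-encoded corpus. */
void count_bigrams(const int *corpus, int len)
{
    int t;
    for (t = 0; t + 1 < len; t++) {
        bi[corpus[t]][corpus[t + 1]]++;
        uni[corpus[t]]++;
    }
}

/* Additive (Laplace) smoothing: one pseudo-count per bigram,
 * so P(w2 | w1) > 0 even for n-grams never observed in training. */
double bigram_prob(int w1, int w2)
{
    return (bi[w1][w2] + 1.0) / (uni[w1] + (double)V);
}
```

For the corpus `{0, 1, 0, 1, 2}`, the seen bigram (0, 1) gets probability (2 + 1) / (2 + 3) = 0.6, while the unseen bigram (2, 0) still receives (0 + 1) / (0 + 3) = 1/3 instead of zero.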

For speech processing, ESMERALDA contains an integrated recognizer which processes input signals in a cascade of modules for feature extraction, statistical mixture decoding, emission probability calculation, HMM search, and finally language model search. All calculations from feature extraction to language model search are carried out strictly time-synchronously. In order to produce recognition results for an utterance while the user is still speaking, i.e. before the end of the input signal has been reached, an incremental processing strategy was developed. Additionally, the ESMERALDA recognizer is capable of applying the constraints of a context-free grammar in conjunction with a statistical language model (for details regarding the ESMERALDA system cf. the appropriate publications).
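Time-synchronous decoding is typically kept tractable by beam pruning: after each frame, only hypotheses whose score lies within a fixed beam of the current best are kept active. The following is a minimal sketch of one such pruning step with hypothetical names, not ESMERALDA's decoder code:

```c
#include <math.h>

/* One time-synchronous pruning step over log-domain scores.
 * Hypotheses falling more than `beam` below the current best are
 * deactivated; active[i] is set to 1 for survivors, 0 otherwise.
 * Returns the number of hypotheses that remain active. */
int beam_prune(const double *score, int n, double beam, int *active)
{
    double best = -INFINITY;
    int i, alive = 0;

    for (i = 0; i < n; i++)
        if (score[i] > best)
            best = score[i];

    for (i = 0; i < n; i++) {
        active[i] = (score[i] >= best - beam) ? 1 : 0;
        alive += active[i];
    }
    return alive;
}
```

A wider beam keeps more hypotheses alive (fewer search errors, more computation); a narrower beam speeds up decoding at the risk of pruning away the correct path.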

ESMERALDA has already been applied successfully to a number of challenging pattern recognition problems in the fields of automatic speech recognition, offline handwriting recognition, protein sequence analysis, music analysis, and gesture recognition (for details cf. the appropriate publications).

The framework is written entirely in ANSI-C. Currently, it runs on several UNIX-like operating systems, including Linux, in our own lab and at some of our research partners. The software is open source and can be retrieved under the terms of the LGPL here.

**References:**

Developing Pattern Recognition Systems Based on Markov Models: The ESMERALDA Framework

Fink, G. A., Plötz, T. *Pattern Recognition and Image Analysis*, 18(2), pages 207-215, June 2008.

Developing HMM-based Recognizers with ESMERALDA

Fink, G. A.

In Matousek, Václav, Mautner, Pavel, Ocelíková, Jana, Sojka, Petr (eds.), *Lecture Notes in Artificial Intelligence*, 1692, pages 229-234, Springer, Berlin Heidelberg, 1999.

More related publications can be found here.