IMPACT deliverables

This page lists all external public deliverables from the IMPACT project. Additional deliverables will be added in the near future.

Functional Extension Parser


The Functional Extension Parser (FEP) was developed by the University of Innsbruck. It is document understanding software capable of recognising basic structural features of digitised documents, in particular historical books.

Image enhancement toolkit


This deliverable comprises three independent software packages for manipulating scanned images in order to improve the recognition results of OCR engines. The defects that can appear in document images are grouped into three broad categories of conditions that can be improved or eliminated to enhance the results obtained from scanned documents: 1) Binarisation and Colour Reduction, 2) Noise and Artefact Removal, 3) Geometric Defect Correction.
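To illustrate the first category, the sketch below binarises a greyscale image with Otsu's method, a standard global thresholding technique. It is not taken from the IMPACT toolkit, whose algorithms are not described on this page; the function names `otsu_threshold` and `binarise` are hypothetical, and the image is assumed to be a flat list of 8-bit grey values.

```python
def otsu_threshold(pixels):
    """Return the grey level (0-255) that best separates dark from light pixels.

    Otsu's method picks the threshold that maximises the between-class
    variance of the two resulting pixel groups.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    weight_bg = 0      # number of pixels at or below the candidate threshold
    sum_bg = 0.0       # sum of grey values of those pixels
    best_t, best_var = 0, -1.0

    for t in range(256):
        weight_bg += hist[t]
        if weight_bg == 0:
            continue                   # no dark pixels yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break                      # no light pixels left
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        between_var = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def binarise(pixels):
    """Map each pixel to 0 (ink) or 1 (background) using the Otsu threshold."""
    t = otsu_threshold(pixels)
    return [1 if p > t else 0 for p in pixels]
```

A global threshold such as this one tends to struggle with uneven illumination or stained paper, which is one reason binarisation of historical documents is treated as a problem in its own right.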

Interoperability Framework


The technical and research partners in IMPACT have developed more than 20 different tools for various stages in the OCR process. The Interoperability Framework allows for a loose coupling of tools and the exchange of data between them.


Inventory extraction


The IMPACT Inventory Extraction tool is a prototype with a graphical user interface (GUI) that allows for the extraction of a complete list of characters from a document, without reference to a specific language dictionary or a library of fonts. The GUI allows users to assign properties to textual features within the tool itself, or to export the inventory to an OCR engine to allow for training on particular texts and for proper full-text recognition.

Language resources - General lexica


The various language institutes in IMPACT have worked on building lexica for historical languages. The aim has been to improve OCR results for historical text, and also to ensure that the user finds historical variants of a word when searching for the modern-day form. IMPACT has built lexica for nine historical languages, with an additional paper on Development and Use of Computational Lexica for OCR and IR on Historical Documents – A Cross-Language Perspective.
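The retrieval use case can be sketched as a lookup that expands a search term into its modern form plus all recorded historical variants. This is a minimal illustration only: the class `HistoricalLexicon`, its methods, and the sample entries are hypothetical and do not reflect the format or contents of the IMPACT lexica.

```python
from collections import defaultdict


class HistoricalLexicon:
    """Toy lexicon linking modern word forms to historical spelling variants."""

    def __init__(self):
        self._variants = defaultdict(set)  # modern form -> historical variants
        self._modern = {}                  # historical variant -> modern form

    def add(self, modern, variant):
        """Record that `variant` is a historical spelling of `modern`."""
        modern, variant = modern.lower(), variant.lower()
        self._variants[modern].add(variant)
        self._modern[variant] = modern

    def expand_query(self, term):
        """Return the modern form and all known variants, sorted.

        Unknown terms are returned unchanged, so the search still runs.
        """
        term = term.lower()
        modern = self._modern.get(term, term)
        return sorted({modern} | self._variants.get(modern, set()))


# Illustrative entries only
lexicon = HistoricalLexicon()
lexicon.add("unto", "vnto")
lexicon.add("old", "olde")
print(lexicon.expand_query("unto"))
```

A search front end could then query the index for every form in the expanded list, so that a user typing the modern spelling also retrieves pages containing the historical ones.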










Language resources - Named Entities lexica


IMPACT has also built special lexica for named entities (proper names of, for example, places and people) in three languages (Dutch, German and English).


Lexicon building tools


IMPACT provides guidelines and general tools for developing lexical data from historical source material, as well as tools to deploy the lexicon in enrichment (i.e. for retrieval).




Pilot reports


IMPACT tested its tools in production environments during the final six months of the project (January-June 2012). Each pilot is described in detail in a pilot report.



Segmentation toolkit


Segmentation is a major function in an OCR system. During this step, the main document components (text / graphic areas, text lines, words and characters or glyphs) are automatically extracted. IMPACT introduces novel hierarchical segmentation models that allow the discrete problems of text block, text line, word and character segmentation to be addressed separately while at the same time allowing for interplay between all levels.
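One way to picture such a hierarchy is as a tree of regions, where each level can be segmented on its own and a change at a lower level can feed back into the levels above. The sketch below is an assumption made for illustration and is not the IMPACT segmentation model: the `Region` class, the level names, and the `tighten` method are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

Box = Tuple[int, int, int, int]  # left, top, right, bottom
LEVELS = ("block", "line", "word", "glyph")


@dataclass
class Region:
    level: str
    box: Box
    children: List["Region"] = field(default_factory=list)

    def add(self, child: "Region") -> "Region":
        """Attach a child one level below this region."""
        depth = LEVELS.index(self.level)
        if depth + 1 >= len(LEVELS) or child.level != LEVELS[depth + 1]:
            raise ValueError(f"a {self.level} cannot contain a {child.level}")
        self.children.append(child)
        return child

    def at_level(self, level: str) -> Iterator["Region"]:
        """Yield every region of the given level in reading order."""
        if self.level == level:
            yield self
            return
        for child in self.children:
            yield from child.at_level(level)

    def tighten(self) -> Box:
        """Recompute this box bottom-up as the union of the child boxes.

        Regions without children keep their current box.
        """
        if self.children:
            boxes = [child.tighten() for child in self.children]
            self.box = (
                min(b[0] for b in boxes),
                min(b[1] for b in boxes),
                max(b[2] for b in boxes),
                max(b[3] for b in boxes),
            )
        return self.box
```

Here `at_level` lets a word segmenter or a character segmenter work on its own level in isolation, while `tighten` shows one simple form of interplay: refined glyph or word boxes propagate upwards and correct the line and block boundaries.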

Typewritten OCR


An OCR prototype, developed by the PRImA research group at the University of Salford, for recognising typewritten documents. It incorporates background knowledge about the specific features of this type of document.

Word Spotting


This tool, developed by the National Center for Scientific Research (NCSR) "Demokritos", provides an integrated GUI for indexing historical documents without an OCR engine. It allows searching the database for instances of a query keyword using three different methods.
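The general idea of searching without OCR can be sketched as matching word images directly by shape. The example below compares column ink profiles with dynamic time warping (DTW), a classic word spotting technique. It is not one of the three methods offered by the tool, which are not specified on this page; the functions `column_profile`, `dtw_distance` and `spot` are hypothetical, and word images are assumed to be small binary grids with 1 for ink.

```python
def column_profile(image):
    """Count ink pixels in each column of a binary word image (rows of 0/1)."""
    return [sum(column) for column in zip(*image)]


def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences.

    DTW tolerates horizontal stretching, so the same word written
    wider or narrower can still match closely.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # stretch b
                cost[i][j - 1],      # stretch a
                cost[i - 1][j - 1],  # advance both
            )
    return cost[n][m]


def spot(query_image, index, top_k=3):
    """Return the ids of the `top_k` indexed word images closest to the query.

    `index` maps a word id to its binary image.
    """
    query = column_profile(query_image)
    scored = sorted(
        (dtw_distance(query, column_profile(image)), word_id)
        for word_id, image in index.items()
    )
    return [word_id for _, word_id in scored[:top_k]]
```

In this sketch the query is itself a word image, for example one selected by the user on a page. A practical system would precompute and store the profiles at indexing time rather than recomputing them for every query.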