Koninklijke Bibliotheek (National Library) KB
The mission of the KB National Library of the Netherlands is to bring people and information together, not only by offering everyone everywhere access to everything published in and about the Netherlands but also by playing a central role in the (academic) information infrastructure of the Netherlands and promoting permanent access to digital information nationally and internationally. It has built up extensive experience in the digitisation of images and text since the mid-1990s, and its e-Depot, the world’s first digital archiving system for academic publications, now contains more than 15 million articles.
The KB also hosts the offices of The European Library (TEL) and Europeana. Over the years, the Research and Development (R&D) Section of the KB has taken part in many EC-funded projects, including TEL- and Europeana-related projects; the FP6 project PLANETS, which developed tools and services for digital preservation; PARSE.insight (FP7), which researched the preservation of digital information in science; and KEEP (FP7), which developed emulation software.
The KB coordinates the project, leads the work on the technical architecture and integration, and contributes to a number of strategic objectives in the areas of demonstration, guidelines and building the Centre of Competence.
The British Library (National Library) BL
The British Library is the national library of the United Kingdom and one of the world's greatest libraries. The BL receives a copy of every publication produced in the UK and Ireland; its collection includes 150 million items in most known languages and grows by approximately three million items every year. It has been digitising items from its collection since 1993 and has developed great expertise in the digitisation of texts and images created over the last 2000 years.
The Library has been a partner in many projects funded at national, European and international levels. One example is The European Library (TEL), which began as a British Library-led, European Commission-funded project (2000–2005) with the express aim of giving virtual users access to the digital and non-digital collections of the National Libraries of Europe. The Library continues to be represented on the Management Committee and steering group of TEL. Current projects include PLANETS (coordinated by the BL), EDL and UKWAC (web archiving).
The BL leads the Sub-Projects Operational Context and Capacity Building.
Österreichische Nationalbibliothek (National Library) ONB
The Austrian National Library (ONB) is the main scientific library of the Republic of Austria. In addition to its role as a deposit library, ONB acts as a research centre and has been involved in numerous national and international digital library initiatives. The library has been a partner in several EC-funded projects in FP4, FP5 and FP6, as well as in the eContent and eContentplus programmes. Recent projects include PLANETS, DELOS, BRICKS, EDL and TELplus. The Austrian National Library is involved in initiatives related to the European Commission's i2010 Digital Libraries strategy. Since 2005 the library has been a full member of The European Library (TEL). Within the eContentplus project EDL, the Austrian National Library leads the Work Package “Developing The European Digital Library” and is responsible for producing a digitisation roadmap for the European national libraries and for involving relevant stakeholders from the MLA community in the preparation of the European Digital Library.
The ONB leads the Sub-Project Enhancement & Enrichment.
Universität Innsbruck (University Library) UIBK
The main expertise of the University Library's Department for Digitisation and Digital Preservation (DEA) lies in project management and in the development of software applications in the fields of digital libraries and digitisation. The department has coordinated several R&D projects, all dealing with digital library issues, in the 4th and 5th Framework Programmes, as well as in the EU programmes eContent and eTEN. Central to DEA’s role in IMPACT is the EU-funded METADATA ENGINE R&D project, which focused on OCR engines and structural metadata and whose results contribute significantly to the baseline for IMPACT.
Within WP-TR4 Experimental OCR Engines, we are joined by our colleagues from the Infmath Imaging Group in the Department of Computer Science at the University of Innsbruck. The work of this group includes the segmentation and enhancement of 2D and 3D image data, basic research in pattern and shape recognition, and thermoacoustic tomography as a novel image acquisition technique. From a mathematical viewpoint, these applications are strongly linked with techniques from variational calculus, partial differential equations (PDEs) and differential geometry, which can be considered the group’s main areas of expertise.
UIBK is Sub-Project leader for SP Text Recognition and works on the development of the Functional Extension Parser in SP Enhancement & Enrichment.
Deutsche Nationalbibliothek (National Library) DNB
The German National Library is the central archival library and national bibliographic centre for the Federal Republic of Germany. Its task, unique in Germany, is to collect, permanently archive, comprehensively document and record bibliographically, without gaps, all German and German-language publications from 1913 onwards and to make them available to the public. The German National Library is the leading partner in developing and maintaining rules and standards in Germany and plays a significant role in the development of international standards.
Four projects with DNB participation are particularly relevant to IMPACT:
- VIAF: Matching and linking of the name authority files Personennamendatei (PND) and Library of Congress Name Authority File (LCNAF) to form a VIAF (Virtual International Authority File).
- MACS: Unmediated linkage between subject headings, providing a subject search interface in German, English and French that transcends linguistic barriers across three comprehensive subject heading authority files: the French Rameau, the US Library of Congress Subject Headings (LCSH) and the German Subject Heading Authority File (SWD).
- Crisscross: Extending the capabilities developed within MACS by linking the subject headings of the Subject Heading Authority Files (SWD) with the notations of the Dewey Decimal Classification (DDC) to provide a multilingual, user-friendly, thesaurus-based research vocabulary.
- Europeana v1.0: Following the launch of the user-designed and user-driven prototype of Europeana, the EDL Foundation intends to use Europeana v1.0 to develop an operational service and to solve key operational issues related to the implementation and functioning of Europeana.
DNB leads the work on the helpdesk.
Bayerische Staatsbibliothek (State Library) BSB
The Bayerische Staatsbibliothek (Bavarian State Library; BSB) is one of the most important European general libraries and ranks among the best research libraries internationally. It is also the central state and repository library of the Free State of Bavaria. With almost 10 million volumes, about 55,000 current periodicals in print or electronic form and 93,000 manuscripts, the Bayerische Staatsbibliothek is one of the most important knowledge centers in the world. Owing to Germany’s federal library system, it forms part of the “Virtual National Library” together with the Deutsche Nationalbibliothek and the Staatsbibliothek zu Berlin.
The Munich DigitiZation Center / Digital Library (MDZ), a department of the Bayerische Staatsbibliothek established in 1997 with financial support from the German Research Foundation, has since built up significant expertise and capacity in the digitization of library materials from the 8th to the 21st centuries. So far, more than 100 in-house or cooperative projects have produced about 355 million files (267,000 titles comprising 204 terabytes of data, stored long-term at the Leibniz Computing Center in Munich).
Continuing the VD 16 I project, MDZ started a highly innovative mass digitization project using scan robots (from Treventus, winner of the 2007 EU ICT Prize) for the automated scanning of 16th-century books (VD 16 II; approx. 7 million pages). The ongoing cooperation with Google Books to digitize copyright-free material will add a further 300 million pages to the digital holdings. As the Munich DigitiZation Center aims to steadily improve the availability of and access to digitized material, OCR (either structured or plain text) and automated recognition of textual structure are main concerns. Against this background, MDZ has participated in the European projects “eBooks on Demand” and “MICHAEL PLUS”, both funded by the EU under the eTEN programme.
BSB leads the work on the requirements forum.
Staats- und Universitätsbibliothek Goettingen (University Library) UGOE
The Goettingen State and University Library (UGOE) is one of the largest libraries in Germany. Building on its background and tradition as a research library, the UGOE has developed collections of national and international rank, which have been continually cared for and scarcely suffered any loss or damage during the Second World War. This base underpins its responsibilities as the state library for Lower Saxony, as a special-subject collection library (e.g. Pure Mathematics) and as the National Library for the 18th century. At the same time, the UGOE is one of the leading institutions in the development of the Digital Library in Germany. In spring 1997 the Centre for Retrospective Digitisation in Göttingen (Göttinger DigitalisierungsZentrum, GDZ) was established to support the programme and coordinate national efforts towards standardisation in various fields (e.g. digital conversion, online access, bibliographic description). The Centre in Göttingen evaluates tools and techniques for image capture and text conversion, bibliographic description, document management and the provision of remote access.
UGOE has 10 years of field experience in research and hands-on digitisation, is a founding member of several standardisation initiatives (e.g. METS), and has experience in automated text categorisation with the SWD (German subject heading authority file) and in automated text encoding from raw OCR full text to structured TEI (Project: Rise of Modern Constitutionalism). Since 1997, the GDZ has digitised more than 5 million pages spanning several centuries.
UGOE leads the work on the project website.
ABBYY Production LLC (Private) ABY
ABBYY is an international company with over 600 employees in 7 offices worldwide, including a Research & Development department of about 300 engineers and scientists in more than 15 groups based at ABBYY's headquarters in Moscow, Russia.
ABBYY's research area is artificial intelligence (AI), including linguistics and the full spectrum of document recognition, document conversion and data capture. ABBYY is an active participant in key scientific conferences on recognition, such as ICDAR (International Conference on Document Analysis and Recognition), DAS (Document Analysis Systems) and IWFHR (International Workshop on Frontiers in Handwriting Recognition). ABBYY scientists regularly publish papers in these areas.
ABBYY was part of a consortium of libraries and digitisation companies from across Europe that worked together in the METAe EC FP5 project, which developed a unique OCR technology and the METAe Engine, a software package specifically designed to organise the workflow of archiving and converting historical materials such as books, journals, magazines and newspapers.
ABBYY will provide the relevant OCR technology to enable research into improving image preprocessing and the layout (or page) analysis of documents. It will deliver OCR-XML output for recreating the logical document structure, as well as adaptive OCR and components for the full variety of historical printed materials in European languages.
IBM Israel – Science and Technology Ltd (Private) IBM
IBM, the largest IT company in the world, has a Research Division with 3000 employees in 8 labs around the world. The Haifa Research Lab (HRL) is the largest of the five labs outside the United States. Since it first opened as the IBM Scientific Centre in 1972, the IBM Research Lab in Haifa has conducted decades of research that have been vital to IBM's success. HRL currently carries out R&D projects for IBM labs in the USA, Canada, Europe and the Far East, in areas such as storage systems, verification technologies, multimedia, document processing, active management, information retrieval, programming environments, optimisation technologies, and life sciences. HRL staff members are actively involved in the academic community, publishing papers in leading conferences and journals, participating in programme committees, and organising conferences and workshops.
The IBM Haifa Research Lab is uniquely positioned to handle the research challenges in the area of document processing. The lab has proven contributions in the following related areas: automatic scan quality control, enhancement of scanned images, page layout analysis and segmentation (distinguishing text from graphics), adaptive OCR (adapting to the font and vocabulary of the document being analyzed), and automatic and manual validation of OCR results.
IBM leads the development of Adaptive OCR and Collaborative Correction.
Instituut voor Nederlandse Lexicologie (National Research) INL
The Institute for Dutch Lexicology (INL) is a research institute financially supported by the governments of the Netherlands and (Flemish) Belgium. Its mission is to document the vocabulary and grammar of present-day and historical Dutch by creating, maintaining and improving the accessibility of language resources such as dictionaries, corpora, computational lexica and thesauri.
There are approximately 50 employees at the INL, divided over five departments. The work on IMPACT will be carried out by the Language Database Department and the EDP department. The Language Database Department has long-standing experience in building corpora and lexica. Its major research project is the Integrated Language Database of the Dutch Language from the 6th to the 21st century, consisting of three major components: corpora, dictionaries and computational lexica. The EDP department has extensive experience in developing linguistic and lexicographical retrieval applications and in the linguistic processing and enrichment of language resources. The EDP staff consists of 7 computational linguists, 3 software engineers and 3 system administrators.
INL has a central position in the management of Dutch language resources and technology. It has participated in the EC-funded projects PAROLE, SIMPLE, TELRI, ENABLER and ELAN. The INL actively participates in the DAM-LR project on distributed access to language resources and is a member of the CLARIN (Common Language Resources and Technology Infrastructure) network.
INL leads the work on lexicon structure and lexicon content.
National Centre for Scientific Research "Demokritos" (National Research) NCSR
NCSR is a self-governing research organisation, under the supervision of the Greek Government. It is the largest research organisation in Greece and is internally subdivided into 8 research institutes. The Institute of Informatics & Telecommunications (IIT) aims at playing a pivotal role in research and development (R&D), as well as in the transfer and exploitation of R&D results in relevant sectors of the Greek and international industry and society. The Informatics Department of IIT conducts research and develops infrastructure to support content filtering and extraction, adaptive intelligent systems, multimedia analysis and processing for cultural heritage applications including processing and recognition of historical documents.
Relevant previous experience includes National Projects (D-SCRIBE and POLYTIMO) on processing and recognition of historical documents and FP6 projects KT-DIGICULT-BG (digitisation of cultural and scientific heritage), BOEMIE and SHARE on indexing and retrieval of multimedia information.
NCSR coordinates WP-TR2 Segmentation and is also involved in document image pre-processing (WP-TR1 Image Enhancement), experimental OCR (WP-TR4 Experimental OCR Engines) as well as evaluation and quality assurance (WP-OC3 Evaluation Tools and Resources).
Centrum für Informations- und Sprachverarbeitung, University of Munich (University) LMU
The CIS at LMU works on various areas of natural language processing, with many international contacts both in the academic and business world. It is well-known for its contributions to the fields of electronic lexicography, search engines, text correction, information retrieval and semantic search.
The current focus of the group of Prof. Klaus U. Schulz is on adaptive techniques for improving OCR results and text correction systems, fast approximate search in dictionaries, and the development and use of semantic knowledge bases for text enrichment and indexing. In all these areas, the group has published in many major international journals and conferences. Recently, the CIS started a close collaboration with the historical language group of the University of Munich and with the Bavarian State Library in the field of historical documents and texts.
LMU leads the development of language models and dictionaries.
UKOLN, University of Bath (University) UBAH
UKOLN is based at the University of Bath, UK and is jointly funded by MLA: the Museums, Libraries and Archives Council and the Joint Information Systems Committee (JISC) of the funding bodies for higher and further education in England, Scotland, Wales, and Northern Ireland. Project funding is also received from the Engineering and Physical Sciences Research Council (EPSRC), JISC and the European Commission. UKOLN also receives support from the University of Bath.
UKOLN aims to inform practice and influence policy in the areas of: digital libraries, metadata and resource discovery, distributed library and information systems, bibliographic management and web technologies. It carries out research and development work, provides network information services, including the Ariadne web magazine, and runs a variety of workshops and conferences. UKOLN carries out applied and technical research in key areas of interest to our core funders' stakeholder communities.
A number of themes underlie UKOLN project work; namely research into the development and use of emerging metadata standards, open access to data and e-print repositories, digital preservation and the management of metadata schemas. These themes, central to the development of digital libraries, also support our core-funded work programme. UKOLN is the UK Dublin Core Metadata Initiative (DCMI) Affiliate Managing Agent and we also host the JISC W3C representative (UK Web Focus).
UBAH leads the work on the learning resource toolbox and training. http://www.ukoln.ac.uk
University of Salford (University) USAL
The Pattern Recognition and Image Analysis (PRImA - www.primaresearch.org) Laboratory in the School of Computing, Science and Engineering at the University of Salford is an internationally distinguished centre specialising in research with real-world impact. PRImA was first founded at the University of Liverpool and in January 2005 moved to the University of Salford, where it expanded within the Informatics Research Institute (one of the highest ranked research institutes in the UK in Library and Information Management). For over 12 years, research has focused primarily on various aspects of Document Analysis and Recognition, where innovations have earned PRImA significant academic standing and have found applications in industry and other sectors.
Projects ranging from the analysis and recognition of historical documents to the analysis of web documents have been funded by public bodies and industry. Of particular relevance to IMPACT is the successful completion of the FP5 project MEMORIAL (IST-2001-33441), which produced a comprehensive toolkit environment for the digitisation and recognition of World War II typewritten documents. PRImA led the development of the Document Image Analysis tools. Research continues actively in the challenging field of analysing and recognising degraded historical documents. In addition, PRImA has developed (and continues to work on) methods and datasets for the in-depth evaluation of Layout Analysis methods. Since 2001, PRImA has jointly organised (with the National Centre for Scientific Research of Greece, NCSR) the first and longest-standing series of international competitions in Layout Analysis.
USAL leads the work on benchmark definitions and image enhancement.
Bibliothèque nationale de France (National Library) BNF
The Bibliothèque nationale de France (BnF) is one of the largest public and research libraries in the world. Its digital library Gallica (http://gallica.bnf.fr) contains 86,000 printed and 250,000 iconographic items, obtained through the library’s commitment to digitising selected items from its collections. The current 19th Century Newspaper Digitisation Project is making more than 3 million pages available in both image and text mode. In addition, 100,000 items per year will be added over 3 years.
The experience gained through Gallica led the BnF to develop Europeana, first as a mock-up and then as a prototype of a European digital library, which went online in March 2007. Although developed in only five months, this prototype provided valuable experience in full-text indexing and customised services, as well as in cooperative work with the national libraries of Hungary and Portugal. The BnF coordinates the International Internet Preservation Consortium, which aims to share experiments and developments for selecting, harvesting, collecting and preserving internet content, and for providing access to it now and in the future. In 2007, 25 national libraries as well as the American foundation Internet Archive were involved in this programme.
The BnF is a founding member of The European Library consortium. It is also involved in the TELplus project, in which it will explore high-quality OCR, full-text indexing and multilingual subject access issues, as well as in the network of excellence EDLnet.
BNF is involved in several Work Packages in Operational Context and provides demonstrators.
Bulgarian Academy of Sciences (National Research) BAS
"St. St. Cyril and Methodius" National Library (National Library) NLB
The St. St. Cyril and Methodius National Library is the National Library of the Republic of Bulgaria. It gathers and processes bibliographic information, stores it and provides the Bulgarian and foreign public with printed matter and audiovisual documents created in Bulgaria. It also gathers, preserves and processes Bulgarian, Slavic and Oriental manuscripts, archival documents and rare and valuable old printed books, making them available to researchers.
As a research institution, the Library carries out research in library science, bibliography, book publishing, archive studies, and the restoration and conservation of written material. The Library actively participates in expanding cooperation between libraries in Bulgaria and abroad and in building up a national library system.
NLB will demonstrate and disseminate project results and support building capacity in digitisation in Bulgaria.
Jožef Stefan Institute (National Research) JSI
The Jožef Stefan Institute is the leading Slovenian research organisation. It is responsible for a broad spectrum of basic and applied research in the fields of natural sciences and technology. The mission of the Department of Knowledge Technologies is to develop advanced information technologies for the acquisition, storage, management and discovery of knowledge, especially data mining, machine learning, decision support and language technologies. The aim is to contribute superior scientific results to the global treasury of knowledge and accelerate the application of these technologies in the development of e-science and the knowledge society.
JSI (Dept. of Knowledge Technologies) currently participates in 16 EU projects, 6 of which relate to natural language processing.
JSI will demonstrate the IMPACT tools for efficient lexicon building for Slovene and develop special lexica for historical language for the improvement of OCR and information retrieval on historical document collections.
Narodna in univerzitetna knjižnica (National and University Library) NUK
The National and University Library of Slovenia collects, documents, preserves and archives the written cultural heritage of the Slovenian nation and country. It provides ready access to its collection of the knowledge and culture of past and present Slovenian generations, making it available to the citizens of Slovenia and other countries. The overall digital collection consists of 500,000 digital objects, all of them accessible through the web (http://www.dlib.si/dlib_eng.asp).
NUK plans to offer its users full-text search across all its collections. The library expects that improved OCR will drastically reduce the number of manual corrections and make it possible to widen access to digital content from the 19th century and earlier.
NUK will demonstrate and disseminate project results and support building capacity in digitisation in Slovenia.
Institute of the Czech National Corpus, Charles University Prague (University) CUP
The Institute of the Czech National Corpus (CNC) is an academic project focused on building a large electronic corpus of mainly written Czech. The Faculty of Arts, Charles University in Prague, has been in charge of the CNC, its expansion, development and other related activities, particularly those associated with teaching and advancing the field of corpus linguistics.
The Institute will demonstrate the IMPACT tools for efficient lexicon building for Czech and develop special lexica for historical language for the improvement of OCR and information retrieval on historical document collections.
Národní knihovna České republiky (National Library) NKC
ATILF - CNRS & University of Nancy (National Research) ATILF
ATILF (Analyse et Traitement Informatique de la Langue Française, Computer Processing and Analysis of the French Language) is a joint research unit of CNRS and the University of Nancy. ATILF is a member of the CNRS Institut de Linguistique Française federation, of the international TEI consortium and of the CLARIN European network. ATILF’s scientific project is structured around four teams (Historical linguistics; Lexicon; Language learning; Macro-syntax of written and oral language) and a transverse axis: Resources and normalisation.
Computerized dictionaries and encyclopedias (Trésor de la Langue Française, Dictionnaire de l’Académie française, Encyclopédie de Diderot et d’Alembert and historical dictionaries of the French language), textual databases (Frantext and tagged Frantext) and linguistic databases (Historical database of French vocabulary) form the core of the resources distributed by the laboratory. ATILF is also the supporting laboratory of the CNRS National Center for Textual and Lexical Resources (http://www.cnrtl.fr/).
ATILF will demonstrate the IMPACT tools for efficient lexicon building for French by developing a lexicon for French as well as working on lemmatisation of variant forms in French.
Fundación Biblioteca Virtual Miguel de Cervantes / Universidad de Alicante (Library / University) BVC / UA
The Miguel de Cervantes digital library was founded in 1999 as the result of a cooperative effort between the Universidad de Alicante and Banco Santander. Its main goal is to disseminate Spanish and Latin American culture by digitizing and publishing over the Internet a vast collection of literary and scientific works. Today, the library is the most renowned digitization project in Spanish; it contains over 50,000 digital objects and over 9,000 full-text books and documents, and serves on average 345,000 pages a day. International cooperation includes participation in the METAe (5th FP) and Multimatch (6th FP) projects.
UA will use the tools developed in IMPACT to create a historical OCR lexicon for Spanish, and will test the improvements that the new lexicon and the IMPACT tools bring to the OCR of historical Spanish books. The Miguel de Cervantes digital library is also involved in the Centre of Competence work package.
Biblioteca Nacional de España (National Library) BNE
The National Library of Spain is the main information and documentation centre for Spanish and Latin American culture in written form, owing to its status as the depository body for material published in Spain since the beginning of the 18th century. The library's collection comprises over 30,000 manuscripts, nearly 3,000 incunabula, about 500,000 printed documents dating from before 1831, over 6,000,000 modern monographs, nearly 110,000 magazine titles and a newspaper collection estimated at nearly 20,000 newspapers.
Since 2005 the library has been a full member of The European Library (TEL). It has also participated in many projects under the eContentplus programme (EDLnet, TELplus, etc.), acting as group leader for the marketing plan of the ENRICH project at national and European level. Today the BNE continues library cooperation in other European projects such as Europeana v1.0 and EuroPressNet, which aim to improve access to digital objects, and ARROW, which is building a rights information infrastructure in Europe that will also help to clarify the rights status of orphan and out-of-print works so that they can be cleared for digitisation and inclusion in the Hispanic Digital Library. In 2009, the BNE joined the VIAF project (Virtual International Authority File) and the World Digital Library, both as a result of library agreements.
BNE will demonstrate and disseminate project results and support building capacity in digitisation in Spain.
Poznan Supercomputing and Networking Center (National Research and Library) PSNC
Poznan Supercomputing and Networking Center (PSNC), affiliated with the Institute of Bioorganic Chemistry of the Polish Academy of Sciences (Instytut Chemii Bioorganicznej PAN in Polish), is a European research and development centre in new-generation networking, grid computing, service architectures and digital content management. It operates the National Research and Education Network PIONIER (Polish Optical Internet), a very large and modern infrastructure for eScience in Poland. PSNC has participated in a number of ICT projects (more than 50 in total), cooperating with frontrunners from Europe, the USA and East Asia. Many of its results are exploited as innovative solutions in the networks operated by PSNC (mostly PIONIER) and in the Metropolitan Area Network in Poznan (POZMAN).
Cooperation with the public and private sectors has allowed PSNC to deploy numerous new ideas and technologies, such as interactive public television, virtual laboratories and remote steering, digital libraries and enterprise grid solutions, among many others.
In 1996 PSNC started research activities in the digital libraries domain. As a result of this work, the requirements for the digital library software dLibra were defined in 1999. In mid-2007 PSNC started the Digital Libraries Federation to enhance the visibility and usage of digital content from Polish digital libraries. Since 2004 PSNC has organised the “Digital libraries” workshop; in 2008 the workshop was organised for the first time together with the first “Polish Digital Libraries” conference. The full list of activities in the field of digital libraries can be found at http://dl.psnc.pl/.
PSNC will demonstrate and disseminate project results and support building capacity in digitisation in Poland. PSNC is also involved in the Centre of Competence work package.
Uniwersytet Warszawski (University) UWAR
The University of Warsaw (UWAR), established in 1816, is Poland's largest and finest university. From its beginning the University of Warsaw has played a major role in the intellectual, political and cultural life of Poland, and has been recognized throughout the world as a leading academic centre in this part of Europe.
The Department of Formal Linguistics, together with collaborators from other departments, forms a team of experienced language specialists in the digitisation of Polish historical texts, with access to a large corpus of tokens from historical dictionaries.
The University of Warsaw will demonstrate the IMPACT tools for efficient lexicon building for Polish and develop special lexica for historical language for the improvement of OCR and information retrieval on historical document collections.