The Luxembourg Centre for Contemporary and Digital History (C2DH) has announced a collaborative project with other institutions, “Impresso: Media monitoring of the past. Mining 200 years of historical newspapers”.

The aim of the project is to link digitised corpora of newspapers from Switzerland, Luxembourg, France and Germany and to develop new methods to analyse them.

Over the next three years, the C2DH at the University of Luxembourg will work on this project with the DHLAB at the École polytechnique fédérale de Lausanne (EPFL) and the Institute for Computational Linguistics at the University of Zurich. The project will receive 1.7 million Swiss francs (€1.55 million) in funding from the Swiss National Science Foundation (SNSF).

Historical newspapers represent a wealth of archival material, and many have already been digitised. However, conducting research using these sources raises various problems, including insufficient text searchability as a result of poor text recognition and missing metadata, the relative isolation of digitised newspapers within their respective archives, search functions that are difficult to use, and poorly designed user interfaces. Recent progress in text analysis has opened up new possibilities for conducting research on large collections of texts.

The project will develop “deep learning” methods (deep learning is a subfield of machine learning) to correct text-recognition errors, to improve the identification of people, institutions and places, and to enhance this entity recognition using external data repositories. The C2DH will be responsible for developing a user interface that will incorporate new search functions and facilitate the critical analysis of the newspaper corpora.

To boost the relevance of the project for history, the humanities and social sciences in general, the C2DH will coordinate a series of workshops that will provide a forum for users and developers to exchange their ideas.

The project will not only lead to academic publications; at the end of the project, the individual processing, analysis and storage systems will also be made available on an open source basis for others to reuse and develop.

Associated project partners include the Luxembourg National Library, the Swiss National Library, the Swiss newspapers Le Temps and Neue Zürcher Zeitung, Swiss archives, and researchers from the University of Lausanne.

In Luxembourg, the project will be coordinated by Dr Marten Düring, Dr Lars Wieneke and Prof Dr Andreas Fickers, in collaboration with Daniele Guido and Estelle Bunout.
