Crowdsourcing – Bringing together archives and their users

The international consortium of the European Digital Treasures project planned a number of events between 2020 and 2022. Because of the pandemic, however, these events were postponed, including the openings of the transmedia exhibitions, the international workshops, the national tenders and related camps, and the Crowdsourcing event. Like the exhibition openings, these community events are now due to be organized only from mid-2021, in line with the course of the COVID crisis.

The project’s Crowdsourcing activity (Activity 21) deserves special attention because it gives room to what is unique in each of the countries participating in the project.

What is Crowdsourcing?

Photo: Lantos Zsuzsanna, National Archives of Hungary

Crowdsourcing is the involvement of a community in a professional activity: very meticulous tasks that require a great deal of extra labour and working hours are divided into small units and distributed among many contributors. Crowdsourcing can be financial (crowdfunding) or related to software testing (crowdtesting). In the case of archives, however, community involvement helps to check and, where necessary, improve the results of the AI-based handwriting recognition and automated transcription used as part of the European Digital Treasures project.

Basic knowledge of archival research and of palaeography is essential for those involved in this community activity. The project specifically aims to encourage senior citizens to take part. Crowdsourcing is a pilot program within the European Digital Treasures project that aims to involve 20 participants per partner institution in the correction work.

In preparation for the Crowdsourcing event, each archival partner selected a collection of documents from its holdings that was created in a well-defined period. These documents are highly legible and attract strong research interest.

The Torre do Tombo National Archives of Portugal selected the General Register of Mercies, the National Archives of Norway chose the Oslo Register Cards written before the First World War, the National Archives of Malta selected an immigration register from 1905-1966, the Spanish State Archives picked the passport record books from the Spanish Consulate in Buenos Aires from the 1930s, and the National Archives of Hungary selected the National Census from 1828.

The records are transcribed by software from the Valencian company TranScriptorium, which uses machine handwriting recognition and automated text transcription.

The software is still being tested, and the archivists of the partner institutions, together with the Valencian software manufacturer, are working to maximize its efficiency so that as little human effort as possible is needed to improve the transcription results.

Because of their extent, the selected collections will still offer plenty of opportunities from the second half of the year for community work to improve their automated transcripts.

With this step, the archives will pave the way for a future which, on the one hand, brings the digital world and the paper-based analogue world closer to each other and, on the other hand, opens up a seemingly closed scientific sphere to archive users by involving them in a portion of archival work.

Written by Dorottya Szabó, Senior Archivist, and
Anna Palcsó, Public Education Officer,
National Archives of Hungary