The Czech Malach Cross-lingual Speech Retrieval Test Collection contains the Czech recordings from the Visual History Archive (VHA), which consists of interviews with Holocaust survivors. The collection comprises audio recordings, four types of automatic transcripts, manual annotations of selected topics, interview metadata, and a partially manually translated thesaurus. In total, it contains 353 recordings and 592 hours of interviews.

The audio recordings are part of the VHA. The transcripts were created within the MALACH (Multilingual Access to Large Spoken Archives) project: the 2003 transcripts by IBM, the 2004 and 2006 transcripts by Johns Hopkins University, and the 2013 transcripts by the University of West Bohemia. The annotations were created manually at Charles University in Prague and were used in the CLEF 2006 and 2007 evaluation campaigns.


The collection is freely available in the LINDAT/CLARIN repository.


See the annotations visualized as a graph.


This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.


Charles University

Petra Galuščáková
Petra Hoffmannová
Pavel Pecina

University of West Bohemia

Pavel Ircing
Jan Švec