The aim of the AMALACH project (ASR- and MT-based Access to a Large Archive of Cultural Heritage) is to design and implement software tools that facilitate access to a large collection of video interviews with Holocaust survivors. The archive, now hosted at the University of Southern California Shoah Foundation Institute, contains more than 110 thousand hours of recordings in 32 languages. About half of the interviews are in English; the Czech portion amounts to approximately one thousand hours.

Current access methods allow searching for keywords listed in a predefined dictionary (thesaurus), because segments of the recordings were manually tagged with these keywords. The coverage of this manual labelling is, however, insufficient, especially in the Czech part of the archive.

The project AMALACH thus aims to use advanced methods of automatic speech recognition (ASR) and machine translation (MT) to enable search in at least all the Czech and English recordings.



# | Result | Due | Delivered | Type | Documentation
1 | ASR for Czech | 31.12.2012 | 31.12.2012 | Software module | SEASR-CZE
2 | Machine translation (text) | 31.12.2013 | 31.12.2013 | Software module | see package (TMODS:ENG-CZE)
3 | ASR for English | 31.12.2014 | 31.12.2014 | Software module | SEASR-ENG
4 | Machine translation (thesaurus, queries) | 30.6.2015 | 30.6.2015 | Software module | see package (TMODS:ENG-CZE)
5 | Search module | 30.6.2015 | 31.12.2015 | Software module | WFBAS
6 | Integrated system MCLAAS | 31.12.2014 | 31.12.2014 | Software module | MCLAAS
7 | Integrated system deployed | 31.12.2015 | 31.12.2015 | Deployed at CVHM and ZM Praha, functional prototype | Deployment documentation

Documentation for the other results is included in the data packages referenced in the table above.

Preliminary and partial results delivered:

  • Thesaurus (part of result #4)
  • USC-SFI MALACH Interviews and Transcripts Czech (software), delivered 16. 3. 2014, documentation

The results were created as part of Ministry of Culture project number DF12P01OVV022 and are subject to the licensing terms of that type of project. The licence is provided free of charge to all interested parties; however, use of this result requires that the user has secured the right of access to the recordings being searched, wherever the licences for the individual parts of the system require it. All rights to these recordings are the property of the USC Shoah Foundation. Further information is available on request at


