Migrant Stories


Martin Hájek (martin.hajek@fsv.cuni.cz)
Jiří Mírovský (mirovsky@ufal.mff.cuni.cz)
Barbora Hladká (hladka@ufal.mff.cuni.cz)


Migrant Stories is a corpus of 1017 short biographical narratives of migrants originally published on https://iamamigrant.org/stories/. For the original site, the narratives were adapted by the people or organizations that submitted them and then selected for publication by the International Organization for Migration (IOM, the UN organization providing help for migrants). The collection is a very heterogeneous sample of migrants' stories and cannot be taken as a representative or unbiased sample of migrant experiences worldwide.

In the Migrant Stories corpus, the narratives have been supplemented with metadata such as the countries of origin and destination, the migrant's gender, and the GDP per capita of the respective countries; see below for details.

The Migrant Stories corpus was compiled as teaching material for data analysis for students of the course NPFL134 (Data Analytics for Students of Social Studies and Humanities) at the Institute of Formal and Applied Linguistics in the summer semester of 2022.

The corpus was published in October 2022 in the LINDAT/CLARIAH-CZ repository (http://hdl.handle.net/11234/1-4818) under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) licence.

Data Format

The data are distributed in a single TSV (tab-separated values) file. Each story is represented by a single row containing the following fields (columns):

  • id_story (a numerical id from 1 to 1017)
  • name (the name of the migrant)
  • country_or (the country of origin)
  • country_de (the destination country)
  • conti_or (the continent, or part of a continent, of origin: A for Africa, E for Europe, I for Asia, LA for Latin America, M for Middle East, NA for North America, O for other)
  • conti_de (the destination continent or part of a continent, using the same codes)
  • distance (classification of the distance from the origin to the destination into two classes: close, far)
  • country_or_gdp (GDP per capita of the country of origin)
  • country_de_gdp (GDP per capita of the destination country)
  • gdp_change (degree of the GDP change in three classes: E (approx. equal), L (low), H (high))
  • home_change (im for migrants who stayed in the destination country, hc for migrants who eventually returned to their country of origin)
  • gender (female, male, n (for stories about multiple persons), unisex (unknown gender in first-person stories))
  • story (the text of the story)
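A minimal sketch of loading the TSV with Python's standard csv module follows. It is not part of the corpus distribution, and it rests on assumptions not stated above: the first row of the file is a header with exactly these column names, fields are not quoted, and each story occupies a single line. The helper names load_stories and count_by are hypothetical.

```python
import csv
from collections import Counter

# Column names as listed above; assumes the TSV's first row is a header with these names.
EXPECTED_COLUMNS = [
    "id_story", "name", "country_or", "country_de", "conti_or", "conti_de",
    "distance", "country_or_gdp", "country_de_gdp", "gdp_change",
    "home_change", "gender", "story",
]


def _to_float(value):
    """Parse a GDP cell; return None if it is empty or not a plain number."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def load_stories(path):
    """Read the corpus TSV into a list of dicts, one dict per story (row)."""
    stories = []
    with open(path, encoding="utf-8", newline="") as f:
        # QUOTE_NONE: treat quote characters inside story texts as ordinary text
        # (assumption: fields in the file are not quoted).
        reader = csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        missing = set(EXPECTED_COLUMNS) - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"missing columns: {sorted(missing)}")
        for row in reader:
            # Convert the numeric columns; all other columns stay as strings.
            row["id_story"] = int(row["id_story"])
            row["country_or_gdp"] = _to_float(row["country_or_gdp"])
            row["country_de_gdp"] = _to_float(row["country_de_gdp"])
            stories.append(row)
    return stories


def count_by(stories, column):
    """Count stories per value of a categorical column such as gender or home_change."""
    return Counter(story[column] for story in stories)
```

For example, count_by(load_stories("migrant_stories.tsv"), "gender") would give the number of stories per gender class, where "migrant_stories.tsv" is a placeholder for the actual file name. If the file turns out to use quoted fields, dropping quoting=csv.QUOTE_NONE restores the csv module's default quote handling.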

Please note:

  •  The gender of the narrators was identified semi-automatically (gendernamefinder.com + manual search).
  •  GDP per capita for countries was downloaded from UN statistics.


The Migrant Stories corpus is distributed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) licence.


The work on the corpus was financed by the 4EU+ Alliance under grant agreement No 2021_F3_10.