Hausa Visual Genome

Hausa Visual Genome is a multimodal dataset of text and images, suitable for English-to-Hausa multimodal machine translation and for multimodal research. We selected short English segments (captions) from Visual Genome together with their associated images, translated them into Hausa automatically, and then manually post-edited the translations with the associated images taken into account. The training set contains 29K segments. A further 1K and 1.6K segments are provided in the development and test sets, respectively; these sets follow the same (random) sampling as the original Hindi Visual Genome.


Download Link


Hausa Visual Genome 1.0

How to cite

If you use this corpus, please cite the following paper:

  title={Hausa visual genome: A dataset for multi-modal English to Hausa machine translation},
  author={Abdulmumin, Idris and Dash, Satya Ranjan and Dawud, Musa Abdullahi and Parida, Shantipriya and Muhammad, Shamsuddeen Hassan and Ahmad, Ibrahim Sa'id and Panda, Subhadarshi and Bojar, Ond{\v{r}}ej and Galadanci, Bashir Shehu and Bello, Bello Shehu},
  journal={arXiv preprint arXiv:2205.01133},