WAT2023 English-Malayalam Multi-Modal Translation Task

Following the positive response from participants in the WAT2021 and WAT2022 English-Malayalam Multimodal Translation Tasks, the Workshop on Asian Translation 2023 (WAT2023) will continue the English-Malayalam multimodal task. The task relies on our “Malayalam Visual Genome” (MVG), a multimodal dataset of text and images suitable for English-Malayalam multimodal machine translation and multimodal research.


Timeline

  • July 7: Translations need to be submitted to the organizers
  • July 14: System description paper submission deadline
  • July 28: Review feedback for system description
  • Aug 4: Camera-ready
  • Sep 4: WAT2023 takes place

Task Description

The setup of the WAT2023 task is as follows:

  • Inputs:
    • An image,
    • A rectangular region in that image,
    • A short English caption of the rectangular region.
  • Output:
    • The caption translated to Malayalam.
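
As an illustration of the inputs and output listed above, one task item could be represented as follows. This is only a minimal sketch: the class name RegionCaption, the helper parse_line, and the tab-separated field order (image id, x, y, width, height, English caption, optional Malayalam caption) are assumptions made for this example and should be checked against the files actually distributed with the dataset.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RegionCaption:
    """One task item: an image, a rectangular region in it, and its caption."""
    image_id: str
    x: int           # left edge of the region, in pixels (assumed)
    y: int           # top edge of the region, in pixels (assumed)
    width: int
    height: int
    english: str     # short English caption of the region (input)
    malayalam: str = ""  # Malayalam caption (output); empty when unknown

    def crop_box(self) -> Tuple[int, int, int, int]:
        """Return the region as (left, upper, right, lower) corner coordinates."""
        return (self.x, self.y, self.x + self.width, self.y + self.height)


def parse_line(line: str) -> RegionCaption:
    """Parse one tab-separated line in the field order assumed above."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 6:
        raise ValueError(f"expected at least 6 fields, got {len(fields)}")
    image_id, x, y, width, height, english = fields[:6]
    malayalam = fields[6] if len(fields) > 6 else ""
    return RegionCaption(image_id, int(x), int(y), int(width), int(height),
                         english, malayalam)
```

A text-only system would read only the english field, an image captioning system only the image and crop_box(), and a multi-modal system both.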

Types of Submissions Expected

Participants of the WAT2023 task are welcome to submit their outputs for any subset of these submission types:

  • Text-only translation (only the text used as input)
  • Malayalam-only image captioning (only the image used as input)
  • Multi-modal translation (uses both the image and the text)

Training Data

The Malayalam Visual Genome consists of:

  • 29k training examples
  • 1k dev set
  • 1.6k evaluation set


The WAT2023 English-Malayalam Multi-Modal Task will be evaluated on:

  • 1.6k evaluation set of Malayalam Visual Genome
  • 1.4k challenge set of Malayalam Visual Genome

Means of evaluation:

  • Automatic metrics: BLEU, CHRF3, and others
  • Manual evaluation, subject to the availability of Malayalam speakers
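
To give a rough idea of what a character n-gram F-score such as CHRF3 measures, here is a simplified sentence-level sketch. It is not the official scoring script: the function names, the maximum n-gram order of 6, the removal of whitespace, and the plain averaging of precision and recall over n-gram orders are assumptions made for this example. The "3" corresponds to beta = 3, which weights recall more heavily than precision. Scores reported for the task should come from the evaluation tools the organizers use.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams after removing all whitespace (a simplification)."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 3.0) -> float:
    """Simplified chrF: F-beta over averaged character n-gram precision and recall."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than either string
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p == 0.0 and r == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * p * r / (b2 * p + r)
```

Because the score works on characters rather than whole words, partially correct word forms still earn credit, which is one reason such metrics are often reported for morphologically rich languages.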

Participants of the task need to indicate which track their translations belong to:

  • Text-only / Image-only / Multi-modal
    • see above
  • Domain-Aware / Domain-Unaware
    • Whether or not the full (English) Visual Genome was used in training.
  • Constrained / Non-Constrained
    • Constrained submissions may use only:
      • the 29k training segments from the Malayalam Visual Genome
      • the (English-only) Visual Genome [making the submission a domain-aware run]
    • Non-constrained submissions may use other data but need to specify what data was used.
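
The three track labels above follow from two facts about a run: which inputs the system reads and which data it was trained on. The sketch below shows one way a team might derive the labels for its own bookkeeping. The class TrackInfo and the identifiers MVG_TRAIN and ENGLISH_VG are hypothetical names chosen for this example; they are not part of any official submission format.

```python
from dataclasses import dataclass, field
from typing import Set

MODALITIES = {"text-only", "image-only", "multi-modal"}
MVG_TRAIN = "malayalam-visual-genome-train"  # hypothetical label for the 29k training segments
ENGLISH_VG = "english-visual-genome"         # hypothetical label for the full English Visual Genome


@dataclass
class TrackInfo:
    modality: str
    data_used: Set[str] = field(default_factory=set)

    def __post_init__(self) -> None:
        if self.modality not in MODALITIES:
            raise ValueError(f"unknown modality: {self.modality!r}")
        self.data_used = set(self.data_used)

    @property
    def domain_aware(self) -> bool:
        # Using the full English Visual Genome makes a run domain-aware.
        return ENGLISH_VG in self.data_used

    @property
    def constrained(self) -> bool:
        # Constrained runs use nothing beyond the two permitted resources.
        return self.data_used <= {MVG_TRAIN, ENGLISH_VG}

    def other_data(self) -> Set[str]:
        """Extra resources that a non-constrained run needs to report."""
        return self.data_used - {MVG_TRAIN, ENGLISH_VG}
```

For example, TrackInfo("multi-modal", {MVG_TRAIN, ENGLISH_VG}) is both constrained and domain-aware, while adding any further corpus makes the run non-constrained and lists that corpus in other_data().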

Download Link

Submission Requirement

The system description should be a short report (4 to 6 pages) submitted to WAT 2023 describing the method(s).

Each participating team can submit at most 2 systems for each of the tasks (e.g., text-only translation, Malayalam-only image captioning, multimodal translation using text and image). Please submit through the submission link available on the WAT2023 website and select the task for submission.

Paper and References

Please refer to the papers below:

[paper] : https://cys.cic.ipn.mx/ojs/index.php/CyS/article/view/3294/2735

[arxiv] : https://arxiv.org/abs/1907.08948



Organizers

  • Shantipriya Parida (Silo AI, Finland)
  • Ondřej Bojar (Charles University, Czech Republic)


email: wat-multimodal-task@ufal.mff.cuni.cz


The data is licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International.


This shared task is supported by the following projects/grants from Charles University (Czech Republic).

  • Grantová agentura České republiky, Project code: 19-26934X, Project name: Neural Representations in Multi-modal and Multi-lingual Modelling