WAT2023 English-Hindi Multi-Modal Translation Task

After four successive editions of the English-Hindi Multimodal Translation Task at WAT2019, WAT2020, WAT2021, and WAT2022, the Workshop on Asian Translation 2023 (WAT2023) continues this multimodal English-to-Hindi translation task, which was the first multimodal translation task for any Indian language. The task relies on our “Hindi Visual Genome,” a multimodal dataset of text and images suitable for English-Hindi machine translation and multimodal research.

Timeline

  • July 07: Translations need to be submitted to the organizers
  • July 14: System description paper submission deadline
  • July 28: Review feedback for system description
  • Aug 4: Camera-ready
  • Sep 4: WAT2023 takes place

Task Description

The setup of the WAT2023 task is as follows:

  • Inputs:
    • An image,
    • A rectangular region in that image,
    • A short English caption of the rectangular region.
  • Output:
    • The caption translated to Hindi.
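To make the input and output concrete, here is a minimal sketch of one task instance in Python. It assumes the rectangular region is given by pixel coordinates and size, as in the Hindi Visual Genome distribution; the field names and all values are illustrative, not an official schema.

    from dataclasses import dataclass

    @dataclass
    class MultimodalExample:
        image_id: str         # identifier of the underlying Visual Genome image
        x: int                # left edge of the rectangular region (pixels)
        y: int                # top edge of the rectangular region (pixels)
        width: int            # region width (pixels)
        height: int           # region height (pixels)
        english_caption: str  # input: short English caption of the region
        hindi_caption: str    # output: the caption translated to Hindi

    # Illustrative values only, not taken from the dataset.
    example = MultimodalExample(
        image_id="2405722",
        x=148, y=12, width=312, height=205,
        english_caption="a man riding a bicycle",
        hindi_caption="साइकिल चलाता हुआ एक आदमी",
    )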

Types of Submissions Expected

The following types of submissions are expected:

  • Text-only translation
  • Hindi-only image captioning
  • Multi-modal translation (uses both the image and the text)

Training Data

The Hindi Visual Genome consists of:

  • 29k training examples
  • 1k dev set
  • 1.6k evaluation set
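For orientation, the snippet below is a minimal loading sketch. It assumes the splits are distributed as tab-separated text files with one segment per line in the order image id, X, Y, width, height, English text, Hindi text; the file name is a placeholder for the one in the actual download package.

    import csv

    def load_split(path):
        """Read one Hindi Visual Genome split into a list of dicts."""
        fields = ["image_id", "x", "y", "width", "height", "english", "hindi"]
        with open(path, encoding="utf-8") as f:
            return [dict(zip(fields, row)) for row in csv.reader(f, delimiter="\t")]

    train = load_split("hindi-visual-genome-train.txt")  # placeholder file name, ~29k segments
    print(len(train))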

Evaluation

The WAT2023 Multi-Modal Task will be evaluated on:

  • 1.6k evaluation set of Hindi Visual Genome
  • 1.4k challenge set of Hindi Visual Genome

Means of evaluation:

  • Automatic metrics: BLEU, CHRF3, and others
  • Manual evaluation, subject to the availability of Hindi speakers
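Automatic scores can be approximated with the sacrebleu library; the snippet below is a minimal sketch, not the official evaluation script, and the example sentences are illustrative only.

    from sacrebleu.metrics import BLEU, CHRF

    hypotheses = ["साइकिल चलाता हुआ एक आदमी"]   # system outputs, one string per test segment
    references = [["साइकिल चला रहा एक आदमी"]]   # one reference stream, aligned with hypotheses

    print(BLEU().corpus_score(hypotheses, references))        # corpus-level BLEU
    print(CHRF(beta=3).corpus_score(hypotheses, references))  # chrF with beta = 3 (CHRF3)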

Participants of the task need to indicate which track their translations belong to:

  • Text-only / Image-only / Multi-modal
    • see above
  • Domain-Aware / Domain-Unaware
    • Whether or not the full (English) Visual Genome was used in training.
  • Constrained / Non-Constrained
    • Constrained submissions may use only the following data: the 29k training segments of the Hindi Visual Genome, HindEnCorp 0.5, and the (English-only) Visual Genome (using it makes the run Domain-Aware).
    • Non-constrained submissions may use other data but need to specify what data was used.
       

Download Link

Submission Requirement

The system description should be a short report (4 to 6 pages) submitted to WAT 2023 describing the method(s).

Each participating team can submit at most two systems for each task (e.g., Text-only, Hindi-only image captioning, multimodal translation using text and image). Please submit through the submission link available on the WAT2023 website and select the task for submission.

Paper and References

Please refer to the following papers:

[Hindi Visual Genome paper] : https://www.cys.cic.ipn.mx/ojs/index.php/CyS/article/view/3294

[Hindi Visual Genome, arXiv preprint] : https://arxiv.org/abs/1907.08948

[WAT 2022 Proceedings] : https://www.aclweb.org/anthology/2022.wat-1.0/

[WAT 2021 Proceedings] : https://www.aclweb.org/anthology/2021.wat-1.0/

[WAT 2020 Proceedings] : https://www.aclweb.org/anthology/2020.wat-1.0/

[WAT 2019 Proceedings] : https://www.aclweb.org/anthology/D19-5200/

 

Reference Papers

Silo NLP's Participation at WAT2022

Improved English to Hindi Multimodal Neural Machine Translation

IITP at WAT 2021: System description for English-Hindi Multimodal Translation Task

ViTA: Visual-Linguistic Translation by Aligning Object Tags

NLPHut’s Participation at WAT2021

ODIANLP’s Participation in WAT2020

Multimodal Neural Machine Translation for English to Hindi

Idiap NMT System for WAT 2019 Multimodal Translation Task

English to Hindi Multi-modal Neural Machine Translation and Hindi Image Captioning

WAT2019: English-Hindi Translation on Hindi Visual Genome Dataset

 

Organizers

  • Shantipriya Parida (Silo AI, Finland)
  • Ondřej Bojar (Charles University, Czech Republic)

Contact

email: wat-multimodal-task@ufal.mff.cuni.cz

License

The data is licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International.

Acknowledgment

This shared task is supported by the following grant held at Charles University (Czech Republic).

  • Grantová agentura České republiky (Czech Science Foundation), project code: 19-26934X, project name: Neural Representations in Multi-modal and Multi-lingual Modelling