WAT2023 English-Bengali Multi-Modal Translation Task

After four successive editions of the English-Hindi Multimodal Translation Task (WAT2019, WAT2020, WAT2021, and WAT2022), the Workshop on Asian Translation 2023 (WAT2023) continues the multimodal translation task with Bengali. The task relies on our “Bengali Visual Genome”, a multimodal dataset of text and images suitable for English-Bengali multimodal machine translation and multimodal research.

Timeline

July 7: Translations need to be submitted to the organizers
July 14: System description paper submission deadline
July 28: Review feedback for system description
Aug 4: Camera-ready
Sep 4: WAT2023 takes place

Task Description

The setup of the WAT2023 task is as follows:

Inputs:
- An image.
- A rectangular region in that image.
- A short English caption of the rectangular region.
Output:
- The caption translated to Bengali.
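
The inputs and output above can be pictured as one record per image region. The following Python sketch is only an illustration, not the organizers' loader. The tab-separated field order (image id, x, y, width, height, English caption, optional Bengali caption) and the names RegionExample and parse_line are assumptions made for this example; check the documentation shipped with the dataset for the actual layout.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RegionExample:
    """One task instance: an image region with its caption(s)."""
    image_id: str
    x: int          # left edge of the region, in pixels
    y: int          # top edge of the region, in pixels
    width: int
    height: int
    english: str
    bengali: str = ""  # empty when only the source side is available

    def box(self) -> Tuple[int, int, int, int]:
        """Return the region as (left, top, right, bottom)."""
        return (self.x, self.y, self.x + self.width, self.y + self.height)


def parse_line(line: str) -> RegionExample:
    # ASSUMED layout: image_id, x, y, width, height, English[, Bengali],
    # separated by tabs. This is a guess for illustration only.
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 6:
        raise ValueError(f"expected at least 6 tab-separated fields, got {len(fields)}")
    image_id, x, y, w, h, english = fields[:6]
    bengali = fields[6] if len(fields) > 6 else ""
    return RegionExample(image_id, int(x), int(y), int(w), int(h), english, bengali)
```

A text-only system would read just the english field, while a multi-modal system could also crop the image to the rectangle returned by box() before encoding it.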

Types of Submissions Expected

Three types of submissions are expected:

Text-only translation
Bengali-only image captioning
Multi-modal translation (uses both the image and the text)

Training Data

The Bengali Visual Genome consists of:

29k training examples
1k dev set
1.6k evaluation set

Evaluation

The WAT2023 Multi-Modal Task will be evaluated on:

1.6k evaluation set of Bengali Visual Genome
1.4k challenge set of Bengali Visual Genome

Means of evaluation:

Automatic metrics: BLEU, CHRF3, and others
Manual evaluation, subject to the availability of Bengali speakers
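
To give a feel for what CHRF3 measures, here is a simplified sentence-level sketch in Python. It is not the official scorer: it assumes character n-grams of order 1 to 6 with spaces removed, averages precision and recall over the orders that exist in both strings, and combines them with beta = 3 so that recall counts more than precision. The function names are invented for this example, and scores from an established evaluation tool may differ in details such as smoothing and corpus-level aggregation.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams of order n, ignoring spaces."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 3.0) -> float:
    """Simplified sentence-level chrF in the range [0, 1]."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # one side is too short for this n-gram order
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    b2 = beta ** 2
    # F-beta: with beta = 3, recall is weighted more heavily than precision
    return (1 + b2) * p * r / (b2 * p + r)
```

Because recall dominates, a hypothesis that covers the whole reference but adds extra material is penalized less than one that omits part of the reference.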

Participants of the task need to indicate which track their translations belong to:

Text-only / Image-only / Multi-modal
- See the submission types listed above.
Domain-Aware / Domain-Unaware
- Whether or not the full (English) Visual Genome was used in training.
Constrained / Non-Constrained
- Constrained submissions may use only the following data:
  - 29k training segments from the Bengali Visual Genome
  - HindEnCorp 0.5
  - (English-only) Visual Genome [making the submission a domain-aware run]
- Non-constrained submissions may use other data, but need to specify what data was used.
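
The three track labels can be derived mechanically from the modality and the data a system was trained on. The Python sketch below is one possible way to do this, not a format required by the organizers. The class names, the dataset identifiers, and the describe method are hypothetical and were chosen for this example only.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class Modality(Enum):
    TEXT_ONLY = "text-only"
    IMAGE_ONLY = "image-only"
    MULTI_MODAL = "multi-modal"


# Hypothetical identifiers for the data listed under the constrained track.
CONSTRAINED_DATA = frozenset({
    "bengali-visual-genome",
    "hindencorp-0.5",
    "english-visual-genome",
})


@dataclass(frozen=True)
class TrackLabel:
    modality: Modality
    data_used: FrozenSet[str]

    @property
    def domain_aware(self) -> bool:
        # Using the full (English) Visual Genome makes a run domain-aware.
        return "english-visual-genome" in self.data_used

    @property
    def constrained(self) -> bool:
        # Constrained runs use nothing outside the listed data.
        return self.data_used <= CONSTRAINED_DATA

    def describe(self) -> str:
        domain = "domain-aware" if self.domain_aware else "domain-unaware"
        data = "constrained" if self.constrained else "non-constrained"
        return f"{self.modality.value} / {domain} / {data}"
```

For instance, TrackLabel(Modality.TEXT_ONLY, frozenset({"bengali-visual-genome"})).describe() yields "text-only / domain-unaware / constrained".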

Download Link

http://hdl.handle.net/11234/1-3722

Submission Requirement

The system description should be a short report (4 to 6 pages) submitted to WAT 2023 describing the method(s).

Each participating team can submit at most 2 systems for each task type (e.g. text-only translation, Bengali-only image captioning, or multimodal translation using text and image). Please submit through the submission link available on the WAT2023 website and select the task for submission.