WAT2024 English-Bengali Multi-Modal Translation Task

Following the “WAT 2019-2023 English-Bengali Multimodal Translation Task” editions, the Workshop on Asian Translation 2024 (WAT2024) will continue the multimodal translation task with Bengali. The task relies on our “Bengali Visual Genome”, a multimodal dataset of images and text suitable for English-Bengali multimodal machine translation and related multimodal research.

Timeline

  • XXX: Translations need to be submitted to the organizers
  • XXX: System description paper submission deadline
  • XXX: Review feedback for system description
  • XXX: Camera-ready
  • XXX: WAT2024 takes place

Task Description

The setup of the WAT2024 task is as follows:

  • Inputs:
    • An image
    • A rectangular region in that image
    • A short English caption of the rectangular region
  • Output:
    • The caption translated into Bengali (a data-format sketch follows this list).
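
For concreteness, here is a minimal sketch of how one segment of the Bengali Visual Genome could be parsed. It assumes the same tab-separated layout as the related Hindi Visual Genome (image id, rectangle coordinates, English caption, Bengali caption); the exact column order should be checked against the README distributed with the data.

    from dataclasses import dataclass

    @dataclass
    class Segment:
        """One segment: an image region and its bilingual caption (assumed layout)."""
        image_id: str
        x: int          # left edge of the rectangle, in pixels
        y: int          # top edge of the rectangle, in pixels
        width: int      # rectangle width, in pixels
        height: int     # rectangle height, in pixels
        english: str    # source caption
        bengali: str    # target caption

    def read_segments(path):
        """Yield segments from a tab-separated Bengali Visual Genome file."""
        with open(path, encoding="utf-8") as f:
            for line in f:
                image_id, x, y, w, h, en, bn = line.rstrip("\n").split("\t")
                yield Segment(image_id, int(x), int(y), int(w), int(h), en, bn)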

Types of Submissions Expected

We expect submissions of the following types:

  • Text-only translation
  • Bengali-only image captioning
  • Multi-modal translation (uses both the image and the text)

Training Data

The Bengali Visual Genome consists of:

  • 29k training examples
  • 1k dev set
  • 1.6k evaluation set

Evaluation

The WAT2024 Multi-Modal Task will be evaluated on:

  • the 1.6k evaluation set of the Bengali Visual Genome
  • the 1.4k challenge set of the Bengali Visual Genome

Means of evaluation:

  • Automatic metrics: BLEU, chrF3, and others (a scoring sketch follows this list)
  • Manual evaluation, subject to the availability of Bengali speakers
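
To illustrate how the automatic scores could be reproduced locally, the sketch below uses the sacrebleu package; this choice and its configuration are assumptions for illustration, and the organizers' exact scoring setup may differ.

    # pip install sacrebleu
    from sacrebleu.metrics import BLEU, CHRF

    def score(hypotheses, references):
        """Corpus-level BLEU and chrF3 (CHRF with beta=3) for a single reference set."""
        print(BLEU().corpus_score(hypotheses, [references]))
        print(CHRF(beta=3).corpus_score(hypotheses, [references]))

    # Toy example: one Bengali hypothesis scored against one reference.
    score(["একটি লাল বাস"], ["একটা লাল বাস"])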

Participants need to indicate which track their submissions belong to:

  • Text-only / Image-only / Multi-modal
    • see above
  • Domain-Aware / Domain-Unaware
    • Whether the full (English) Visual Genome was used in training or not.
  • Constrained / Non-Constrained
    • Constrained submissions may use only the 29k training segments from the Bengali Visual Genome and, optionally, the (English-only) Visual Genome [making the submission a domain-aware run].
    • Non-constrained submissions may use other data but need to specify what data was used.

Download Link

http://hdl.handle.net/11234/1-3722

Submission Requirement

The system description should be a short report (4 to 6 pages), submitted to WAT 2024, describing the method(s) used.

Each participating team can submit at most 2 systems for each of the tasks (i.e. text-only translation, Bengali-only image captioning, and multimodal translation using both text and image). Please submit through the submission link available on the WAT2024 website and select the task for submission.

Paper and References

Please refer to the following papers:

[Paper]

Bengali Visual Genome: A Multimodal Dataset for Machine Translation and Image Captioning

[Reference Papers]

Silo NLP's Participation at WAT2022

Multimodal Neural Machine Translation System for English to Bengali

Organizers

  • Shantipriya Parida (Silo AI, Finland)
  • Ondřej Bojar (Charles University, Czech Republic)

Contact

email: wat-multimodal-task@ufal.mff.cuni.cz

License

The data is licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International.

Acknowledgment

This shared task is supported by the following project/grant at Charles University (Czech Republic).

  • Grantová agentura České republiky (Czech Science Foundation), project no. 19-26934X: Neural Representations in Multi-modal and Multi-lingual Modelling