WAT2019 Multi-Modal Translation Task

In 2019, the Workshop on Asian Translation (WAT2019) included a multimodal English-to-Hindi translation task for the first time in its history. The task relies on our “Hindi Visual Genome”, a multimodal dataset of text and images suitable for English-Hindi multimodal machine translation and multimodal research.

Timeline

May 30, 2019: Release of Training Data
August 3, 2019: Test week starts, release of source side of the test set
August 10, 2019: Test week ends, translations need to be submitted to the organizers
September 13, 2019: System description paper submission deadline
September 20, 2019: Review feedback for system description
September 30, 2019: Camera-ready
November 3-4, 2019: WAT2019 takes place

Task Description

The setup of the WAT2019 task is as follows:

Inputs:
- An image
- A rectangular region in that image
- A short English caption of the rectangular region
Output:
- The caption translated to Hindi.
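
As an illustration only, the three inputs listed above can be held in a small record type. This is a minimal sketch: the class names, field names, the pixel-coordinate convention, and the example values are assumptions made here and do not describe the file layout of the released dataset.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Region:
    """Axis-aligned rectangle in pixel coordinates (hypothetical layout)."""
    x: int        # left edge
    y: int        # top edge
    width: int
    height: int

    def box(self) -> Tuple[int, int, int, int]:
        """Return the rectangle as (left, top, right, bottom)."""
        return (self.x, self.y, self.x + self.width, self.y + self.height)

    def fits_in(self, image_width: int, image_height: int) -> bool:
        """Check that the rectangle is non-empty and lies inside the image."""
        left, top, right, bottom = self.box()
        return 0 <= left < right <= image_width and 0 <= top < bottom <= image_height


@dataclass(frozen=True)
class TaskInstance:
    image_path: str        # input 1: the image
    region: Region         # input 2: rectangular region in that image
    english_caption: str   # input 3: short English caption of the region


# Made-up example values, not taken from the dataset.
example = TaskInstance("images/example.jpg", Region(10, 20, 30, 40), "a made-up caption")
```

A text-only system would read only `english_caption`, a captioning system only the image and region, and a multi-modal system all three fields.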

Types of Submissions Expected

The following types of submissions are expected:

Text-only translation
Hindi-only image captioning
Multi-modal translation (uses both the image and the text)

Training Data

The Hindi Visual Genome consists of:

29k training examples
1k dev set
1.6k evaluation set

The evaluation test set will be released together with Hindi Visual Genome, but it will serve as the first part of the WAT2019 official test set. Participants therefore must not use it for training or model selection.

Evaluation

The WAT2019 Multi-Modal Task will be evaluated on:

1.6k evaluation set of Hindi Visual Genome (mentioned above)
- This test set is being released in the HVG package, so remember not to use it in any way.
1.4k challenge set of Hindi Visual Genome
- This second part of the WAT2019 official test set will be released only during the WAT2019 evaluation week.
- It nevertheless comes from the original Visual Genome dataset, so participants are requested to indicate whether they used the original (English-only) Visual Genome dataset in their training.

Means of evaluation:

Automatic metrics: BLEU, CHRF3, and others
Manual evaluation, subject to the availability of Hindi speakers
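
For orientation, the character n-gram F-score behind CHRF3 can be sketched in a few lines. This is a simplified, sentence-level illustration and not the scoring script used by the organizers: the function names, the choice of n-gram orders 1 to 6, and the removal of spaces before counting are assumptions made here. The "3" refers to beta = 3, which weights recall more heavily than precision.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams, ignoring spaces (a simplifying assumption)."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 3.0) -> float:
    """Simplified sentence-level chrF in [0, 1]; beta=3 gives chrF3."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than one of the strings
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * p * r / (b2 * p + r)
```

With beta = 3, a hypothesis that covers the whole reference but adds extra material scores higher than one that is correct but covers only part of the reference.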

Participants of the task need to indicate which track their translations belong to:

Text-only / Image-only / Multi-modal
- see above
Domain-Aware / Domain-Unaware
- Whether the full (English) Visual Genome was used in the training or not.
Constrained / Non-Constrained
- Constrained submissions may use only the following data:
  - 29k training segments from the Hindi Visual Genome
  - HindEnCorp 0.5
  - (English-only) Visual Genome [making the submission a domain-aware run]
- Non-constrained submissions may use other data, but must specify what data was used.
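
The three track labels follow mechanically from the submission type and the data used, which the sketch below illustrates. The dataset identifiers and the function name are hypothetical labels chosen for this example; they are not part of any official submission format.

```python
from typing import Iterable, Tuple

# Hypothetical short labels for the three resources allowed in constrained runs.
HVG_TRAIN = "hindi-visual-genome-train"
HINDENCORP = "hindencorp-0.5"
VISUAL_GENOME_EN = "visual-genome-english"
CONSTRAINED_DATA = frozenset({HVG_TRAIN, HINDENCORP, VISUAL_GENOME_EN})

MODALITIES = ("text-only", "image-only", "multi-modal")


def track_labels(modality: str, datasets: Iterable[str]) -> Tuple[str, str, str]:
    """Derive the (modality, domain, data) track labels for one submission."""
    if modality not in MODALITIES:
        raise ValueError(f"unknown modality: {modality!r}")
    used = set(datasets)
    # Using the English-only Visual Genome makes a run domain-aware.
    domain = "domain-aware" if VISUAL_GENOME_EN in used else "domain-unaware"
    # A run is constrained only if it uses nothing outside the allowed list.
    data = "constrained" if used <= CONSTRAINED_DATA else "non-constrained"
    return (modality, domain, data)
```

For example, `track_labels("text-only", [HVG_TRAIN, HINDENCORP])` yields a text-only, domain-unaware, constrained run, while adding any resource outside the list makes the run non-constrained.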

Download Link

HindiVisualGenome 1.0
- Remember NOT to use the test set included in the package. It forms one half of our evaluation data.
HindiVisualGenome Challenge TestSet 1.0

Submission Requirement

The system description should be a short report (4 to 6 pages) submitted to WAT 2019 describing the method(s).

Each participating team can submit at most 2 systems for each of the tasks (e.g. text-only translation, Hindi-only image captioning, multi-modal translation using text and image).

Preprint

Please refer to the preprint version of the paper:

Hindi Visual Genome: A Dataset for Multimodal English-to-Hindi Machine Translation

Human Evaluation Result

Note: Score is the average score on the original 0-100 scale, and *ZScore is the average of the scores after they have been standardized for each annotator across all his/her annotations. Details are available in the "Overview of the 6th Workshop on Asian Translation" in the proceedings. Human evaluation interface used by the evaluators.

Please report any errors you spot to us.

Multimodal Subtask	Team	DataID	Score	*ZScore
EVTEXT	IDIAP	2956	72.84	0.70
	683	3285	68.89	0.57
	683	3286	61.63	0.36
	NITSNLP	3299	52.53	0.00
CHTEXT	IDIAP	3277	59.81	0.22
	IDIAP	3267	59.36	0.22
	683	3287	45.38	-0.24
	683	3284	45.95	-0.26
	NITSNLP	3300	27.91	-0.82
EVHI	NITSNLP	3289	51.77	-0.04
CHHI	NITSNLP	3297	44.45	-0.34
	683	3304	26.54	-0.94
EVMM	683	3271	69.17	0.60
	NITSNLP	3288	58.98	0.25
	PUP-IND	3296	62.42	0.34
	PUP-IND	3295	60.22	0.27
CHMM	683	3270	54.5	0.08
	NITSNLP	3298	48.45	-0.19
	PUP-IND	3281	48.06	-0.12
	PUP-IND	3280	47.06	-0.16
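
The per-annotator standardization mentioned in the note above the table can be sketched as follows. This is an illustration under stated assumptions, not the organizers' evaluation code: the input layout (tuples of annotator, system, score), the function name, and the use of the population standard deviation are choices made here, and the ratings in the usage example are made up.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, Iterable, Tuple


def mean_z_scores(ratings: Iterable[Tuple[str, str, float]]) -> Dict[str, float]:
    """Average per-system z-scores, standardizing each annotator separately.

    `ratings` holds (annotator, system, score) tuples with scores in 0-100.
    """
    ratings = list(ratings)

    # Collect every score each annotator gave, across all systems.
    by_annotator = defaultdict(list)
    for annotator, _system, score in ratings:
        by_annotator[annotator].append(score)

    # Mean and (population) standard deviation per annotator.
    stats = {a: (mean(s), pstdev(s)) for a, s in by_annotator.items()}

    # Standardize each rating with its own annotator's statistics.
    z_by_system = defaultdict(list)
    for annotator, system, score in ratings:
        mu, sigma = stats[annotator]
        z = (score - mu) / sigma if sigma > 0 else 0.0
        z_by_system[system].append(z)

    return {system: mean(zs) for system, zs in z_by_system.items()}


# Made-up ratings: annotator "a2" is stricter overall but ranks the systems the same way.
demo = [("a1", "A", 80), ("a1", "B", 60), ("a2", "A", 50), ("a2", "B", 30)]
result = mean_z_scores(demo)
```

In this made-up example the raw averages are 65 for system A and 45 for system B, while the standardized scores remove the difference in strictness between the two annotators and keep only their relative preferences.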

Organizers Presentation

The overview presentation covers the task, participants, results, and analysis. It was presented at EMNLP 2019 in Hong Kong (WAT Workshop) on November 4, 2019. [ppt]

Organizers

Ondřej Bojar (Charles University, Czech Republic)
Shantipriya Parida (Idiap Research Institute, Switzerland)

Contact

email: wat-multimodal-task@ufal.mff.cuni.cz

License

The data is licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International.

Acknowledgement

This shared task is supported by grant no. 19-26934X (NEUREM3) of the Czech Science Foundation.
