Odia Visual Genome (OVG) is a multimodal dataset of text and images suitable for English-to-Odia multimodal machine translation tasks and multimodal research. We selected short English segments (captions) from Visual Genome along with their associated images and automatically translated them into Odia, followed by manual post-editing that took the images into account.

The training set contains 29K segments. A further 1K and 1.6K segments are provided as development and test sets, respectively, following the same (random) sampling from the original Visual Genome. Additionally, a challenge test set of 1,400 segments was prepared for the WAT multimodal task. This challenge test set was created by searching for particularly ambiguous English words based on embedding similarity and manually selecting those where the image helps to resolve the ambiguity. Note, however, that the surrounding words in the sentence often also provide sufficient cues to identify the correct meaning of the ambiguous word.
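For orientation, a minimal sketch of parsing one segment line is shown below. The column layout here is an assumption (modeled on related Visual Genome translation releases: an image identifier, a bounding box, the English caption, and the Odia translation, tab-separated); the actual release should be checked for its exact format. The sample line and its values are hypothetical, not taken from the dataset.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    image_id: str
    x: int       # bounding-box left (assumed column)
    y: int       # bounding-box top (assumed column)
    w: int       # bounding-box width (assumed column)
    h: int       # bounding-box height (assumed column)
    english: str
    odia: str


def parse_line(line: str) -> Segment:
    # Assumed layout: image_id, x, y, w, h, English caption,
    # Odia translation, separated by tab characters.
    image_id, x, y, w, h, en, od = line.rstrip("\n").split("\t")
    return Segment(image_id, int(x), int(y), int(w), int(h), en, od)


# Hypothetical example line (illustrative values only):
sample = "2412620\t422\t65\t61\t263\ta man in a suit\tଓଡ଼ିଆ ଅନୁବାଦ"
seg = parse_line(sample)
```

Reading a whole split is then a matter of applying `parse_line` to each line of the corresponding file.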


Dataset Link: http://hdl.handle.net/11234/1-5979