This project was initiated to enrich Odia-language NLP resources, particularly for machine translation. OdiEnCorp is a collection of Odia-English parallel sentences and Odia monolingual sentences gathered from sources such as Odia Wikipedia, websites, books, and dictionaries, using both manual and machine learning techniques, including web scraping and optical character recognition. We describe the need for such a corpus, its development process, and its benefits [here].
Two releases of the English-Odia corpus were created:
The latter (OdiEnCorp 2.0) serves in the WAT 2020 English-Odia Indic Task. To use additional resources, please refer to the Odia NLP Resource Catalog for English-Odia parallel and Odia monolingual data, and mention them in your system description paper. Ask the organizers before using any corpora other than those listed in the Odia NLP Resource Catalog.
Please refer to the WAT2020 webpage for registration/timeline/submission details.
[OdiEnCorp 1.0] : OdiEnCorp: Odia–English and Odia-Only Corpus for Machine Translation
[OdiEnCorp 2.0] : OdiEnCorp 2.0: Odia-English Parallel Corpus for Machine Translation
email: wat-multimodal-task@ufal.mff.cuni.cz
The data is licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International.
This shared task is supported by the following projects/grants from Idiap Research Institute (Switzerland) and Charles University (Czech Republic).
If you use OdiEnCorp 1.0 or OdiEnCorp 2.0, please cite the respective paper:
@incollection{parida2020odiencorp,
  title={OdiEnCorp: Odia--English and Odia-Only Corpus for Machine Translation},
  author={Parida, Shantipriya and Bojar, Ond{\v{r}}ej and Dash, Satya Ranjan},
  booktitle={Smart Intelligent Computing and Applications},
  pages={495--504},
  year={2020},
  publisher={Springer}
}

@inproceedings{parida2020odiencorp,
  title={OdiEnCorp 2.0: Odia-English Parallel Corpus for Machine Translation},
  author={Parida, Shantipriya and Dash, Satya Ranjan and Bojar, Ond{\v{r}}ej and Motlicek, Petr and Pattnaik, Priyanka and Mallick, Debasish Kumar},
  booktitle={Proceedings of the WILDRE5--5th Workshop on Indian Language Data: Resources and Evaluation},
  pages={14--19},
  year={2020}
}