Shantipriya Parida

office hours
9AM - 6PM
Malostranské náměstí 25
118 00 Praha 1
Czech Republic

Main Research Interests

NLP, Deep Learning, Neural Machine Translation, Machine Learning, Computational Neuroscience


Hindi Visual Genome (Hindi-English Multimodal Dataset)

Hindi Visual Genome is a multimodal dataset consisting of text and images, suitable for the English-to-Hindi multimodal machine translation task and for multimodal research. We selected short English segments (captions) from Visual Genome along with their associated images, translated them automatically to Hindi, and then post-edited the translations manually, taking the associated images into account. The training set contains 32K segments, accompanied by a challenge test set of 1400 segments. This test set was created by searching for (particularly) ambiguous English words based on embedding similarity and manually selecting those where the image helps to resolve the ambiguity.

Final verification and release are in progress. More info can be found below.

OdiEnCorp 1.0 (Odia-English Parallel and Odia Monolingual Corpus)

We have collected English-Odia parallel and Odia monolingual data from publicly available websites for NLP research in Odia. The parallel corpus is drawn from the English-Odia parallel Bible, Odia digital library, and Odisha Government websites. It covers the Bible, literature, and the Government of Odisha and its policies. We processed the raw data collected from the websites, performed alignments (a mix of manual and automatic), and released the corpus in a form ready for various NLP tasks.

The Odia monolingual data consists of text from Odia Wikipedia and Odia e-magazine websites. Because the major portion of the data is extracted from Odia Wikipedia, it covers a wide range of domains. The e-magazine data mostly covers the literature domain. We have preprocessed the monolingual data, including de-duplication, text normalization, and sentence segmentation, to make it ready for various NLP tasks.
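As an illustration only, the three monolingual preprocessing steps named above could be sketched in Python as follows. This is not the actual OdiEnCorp pipeline: the function names, the choice of Unicode NFC normalization, and the rule of splitting after the danda (।), "?" or "!" are all assumptions made for this sketch.

```python
import re
import unicodedata
from typing import Iterable, List

# Assumption: a sentence ends at the danda (U+0964), '?' or '!',
# followed by whitespace. Real Odia text may need more careful rules.
_SENTENCE_END = re.compile(r"(?<=[\u0964?!])\s+")


def normalize(text: str) -> str:
    """Apply Unicode NFC normalization and collapse runs of whitespace."""
    text = unicodedata.normalize("NFC", text)
    return re.sub(r"\s+", " ", text).strip()


def split_sentences(text: str) -> List[str]:
    """Split on whitespace that follows a sentence-final mark."""
    return [s for s in _SENTENCE_END.split(text) if s]


def preprocess(lines: Iterable[str]) -> List[str]:
    """Normalize, segment, and de-duplicate, keeping first occurrences in order."""
    seen = set()
    result = []
    for line in lines:
        for sentence in split_sentences(normalize(line)):
            if sentence not in seen:
                seen.add(sentence)
                result.append(sentence)
    return result
```

Calling `preprocess` on an iterable of raw lines returns a list of unique sentences in their original order. Exact-match de-duplication is the simplest option; near-duplicate detection would need a different technique.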

The released corpus is freely available for non-commercial research purposes at the link below:


Curriculum Vitae


Postdoc in Neural Machine Translation under the guidance of Dr. Ondřej Bojar at the Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University, Prague (ongoing).

Ph.D. in Computer Science, Utkal University, INDIA, 2016. THESIS TITLE: “CLASSIFYING INSTANTANEOUS COGNITIVE STATES BASED ON MACHINE LEARNING APPROACH”, under the guidance of Dr. Satchidananda Dehuri, Reader, F. M. University, Balasore, Odisha, INDIA.

Master of Technology (First Class with Honors) in Computer Science, School of Mathematics, Statistics & Computer Science, Utkal University, INDIA, 2004. DISSERTATION TITLE: “COMBINATION OF CLASSIFIERS”, completed at the Machine Intelligence Unit, Indian Statistical Institute, under the guidance of Prof. Ashish Ghosh.

Master of Computer Application (First Class), Utkal University, 2001.

Bachelor of Science, Utkal University, 1998.


Worked as System Architect in Huawei Technologies India Pvt. Ltd, Bangalore, INDIA from July 2007 to Jan 2018.

  • Understanding customer requirements and designing mobile broadband and IPTV/OTT solutions.
  • Participating in the bidding and post-bid phases, conducting customer workshops and CTO-level presentations.
  • Industry trend analysis, competitor analysis, and white paper preparation.

Worked as Senior Software Engineer in Torry Harris Business Solutions, Bangalore, INDIA from May 2005 to July 2007.

  • Team leader for the development and L3 support team for a Telecom Fraud Management Product owned by a UK-based Telecom Operator.
  • Development using UNIX, C++, Shell Scripting, AWK/SED.

Worked as Software Engineer in ANZ Information Technology, Bangalore, INDIA from Oct 2004 to Apr 2005.

  • Developing banking solutions using UNIX and C++.


Selected Bibliography


  • T. Kocmi, S. Parida & O. Bojar. “CUNI NMT System for WAT 2018 Translation Tasks”. In Proceedings of the 5th Workshop on Asian Translation (WAT2018), Hong Kong, China, December 2018. Demo English-to-Hindi Translation URL Based on WAT 2018 Model:
  • S. Parida & O. Bojar. “Translating Short Segments with NMT: A Case Study in English-to-Hindi”. In Proceedings of the 21st Annual Conference of the European Association for Machine Translation, pp. 229–238, Alacant, Spain, May 2018.
  • S. Parida, S. Dehuri & S.-B. Cho. “Neuro-Fuzzy Ensembler for Cognitive States Classification”, Advance Computing Conference (IACC 2014), pp. 1243-1247, IEEE, 2014.
  • S. Parida, S. Dehuri & S.-B. Cho. “Application of Genetic Algorithms and Gaussian Bayesian Approach in Pipeline for Cognitive State Classification”, Advance Computing Conference (IACC 2014), pp. 1237-1242, IEEE, 2014.
  • S. Parida & S. Dehuri. “A Review of Hybrid Techniques based on Machine Learning Approach in Cognitive Classification”, Soft Computing for Problem Solving (SocPros 2012), Springer AISC, vol. 236, pp. 659-666, 2014.
  • S. Parida, S. Dehuri & G.-N. Wang. “Genetic Algorithms Based Feature Selection for Cognitive State Classification Using Ensemble of Decision Trees”, In Proceedings of the International Conference on Vibration Problems (ICOVP 2013), pp. 1-10, 2013.
  • S. Parida & S. Dehuri. “A Study of Feature Selection Techniques for fMRI based State Classification”, Advancements in the Era of Multi-Disciplinary Systems (AEMDS 2013), pp. 500-507, Elsevier, 2013.


  • L. A. Cacha, S. Parida, S. Dehuri, S.-B. Cho & R. R. Poznanski. “A fuzzy integral method based on the ensemble of neural networks to analyze fMRI data for cognitive state classification across multiple subjects”, Journal of Integrative Neuroscience, vol. 15, no. 4, pp. 593-606, 2016.
  • S. Parida, S. Dehuri & S.-B. Cho. “Machine Learning Approaches for Cognitive State Classification and Brain Activity Prediction: A Survey”, Current Bioinformatics, Bentham Science Publishers, vol. 10, pp. 344-359, 2015.
  • S. Parida, S. Dehuri, S.-B. Cho, L. A. Cacha, & R. R. Poznanski. “A Hybrid Method for Classifying Cognitive States from fMRI data”, Journal of Integrative Neuroscience, World Scientific, vol. 14, pp. 355-368, 2015.
  • S. Parida & S. Dehuri. “Review of fMRI Data Analysis: A Special Focus on Classification”, International Journal of E-Health and Medical Communications (IJEHMC), IGI Global, vol. 5, pp. 1-26, 2014.
  • S. Parida & S. Dehuri. “Applying Machine Learning Techniques for Cognitive State Classification”, International Journal of Computer Applications (IJCA), pp. 40-45, 2013.
  • R. Gupta & S. Parida. “Challenges and Opportunities: Mobile Broadband”, International Journal of Future Computer and Communication, vol. 2, no. 6, p. 660, IACSIT Press, 2013.


  • EAMT
  • IEEE
  • OITS


  • IEEE Access
  • Journal of Integrative Neuroscience
  • Journal of Central South University
  • Artificial Intelligence in Medicine
  • The World Journal of Biological Psychiatry

Personal home page