Abstract:
Word sense disambiguation (WSD) is the task of assigning the correct sense to a polysemous word given the context in which it appears. In recent years, word embeddings have been applied successfully to many NLP tasks. Because embeddings capture distributional semantics, recent research has increasingly focused on using them to disambiguate words.
In the first part of the talk, we review a novel unsupervised method that disambiguates words in one language using a word embedding model trained on a second language, requiring only a bilingual dictionary. The main idea is to translate the words surrounding an ambiguous Persian word into English and to use these translations, together with a trained English word2vec model, to disambiguate the Persian word.
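A minimal Python sketch of the cross-lingual idea, under stated assumptions: the bilingual dictionary and the 3-dimensional English vectors below are hypothetical toy data standing in for a real dictionary and a trained word2vec model, Persian words are transliterated (`shir` can mean "lion", "milk", or "faucet"), and the scoring rule (cosine similarity between each candidate English translation of the target and the centroid of the translated context words) is one plausible reading of the method, not necessarily the authors' exact algorithm. The names `BI_DICT`, `EN_VECTORS`, and `disambiguate` are hypothetical.

```python
import math

# Hypothetical toy bilingual dictionary (transliterated Persian -> English translations).
BI_DICT = {
    "shir": ["lion", "milk", "faucet"],  # ambiguous target word
    "gav": ["cow"],
    "nushidan": ["drink"],
    "jangal": ["forest"],
}

# Hypothetical 3-d vectors standing in for a trained English word2vec model.
EN_VECTORS = {
    "lion": [1.0, 0.0, 0.0],
    "milk": [0.0, 1.0, 0.0],
    "faucet": [0.0, 0.0, 1.0],
    "cow": [0.1, 0.9, 0.0],
    "drink": [0.0, 0.8, 0.3],
    "forest": [0.9, 0.0, 0.1],
}


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def disambiguate(target, context, bi_dict, en_vectors):
    """Pick the English translation of `target` closest to the translated context.

    Returns None if no context word or no candidate sense has an English vector.
    """
    # Translate every context word and keep the translations the English model knows.
    context_vecs = [
        en_vectors[t]
        for word in context
        for t in bi_dict.get(word, [])
        if t in en_vectors
    ]
    candidates = [t for t in bi_dict.get(target, []) if t in en_vectors]
    if not context_vecs or not candidates:
        return None
    context_centroid = centroid(context_vecs)
    # Each candidate translation stands for one sense of the Persian target word.
    return max(candidates, key=lambda t: cosine(en_vectors[t], context_centroid))


if __name__ == "__main__":
    print(disambiguate("shir", ["gav", "nushidan"], BI_DICT, EN_VECTORS))  # milk
    print(disambiguate("shir", ["jangal"], BI_DICT, EN_VECTORS))  # lion
```

With a real model, `EN_VECTORS` would be replaced by lookups into the trained English word2vec vectors; apart from that model, the sketch needs only the bilingual dictionary and no sense-annotated data.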
In the second part, we introduce four improvements to existing state-of-the-art supervised WSD approaches. First, we propose a new model for assigning vector coefficients, which yields a more precise context representation. Second, we apply PCA dimensionality reduction to find a better transformation of the feature matrices. Third, we suggest a new weighting scheme. Finally, we present a voting strategy that combines word embedding features extracted from different independent corpora.
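The abstract does not specify the new coefficient model, weighting scheme, or PCA setup, so the Python sketch below only illustrates the general shape of two ingredients under explicit assumptions. The context representation is a coefficient-weighted sum of the vectors of surrounding words, using a generic exponential-decay coefficient `(1 - alpha) ** (distance - 1)` as a hypothetical stand-in (not the model proposed in the talk). The voting step is a simple majority vote over sense predictions, one per system trained on embedding features from a different corpus. The function names and the `window` and `alpha` parameters are hypothetical.

```python
from collections import Counter


def decay_coefficient(distance, alpha):
    """Exponential-decay weight for a context word `distance` positions away (distance >= 1)."""
    return (1.0 - alpha) ** (distance - 1)


def context_vector(tokens, target_index, vectors, window=3, alpha=0.5):
    """Weighted sum of the embeddings of words within `window` of the target.

    Words missing from `vectors` are skipped; `vectors` must be non-empty.
    """
    dim = len(next(iter(vectors.values())))
    acc = [0.0] * dim
    start = max(0, target_index - window)
    stop = min(len(tokens), target_index + window + 1)
    for i in range(start, stop):
        if i == target_index:
            continue  # the ambiguous word itself is not part of its context
        vec = vectors.get(tokens[i])
        if vec is None:
            continue
        weight = decay_coefficient(abs(i - target_index), alpha)
        for d in range(dim):
            acc[d] += weight * vec[d]
    return acc


def majority_vote(predictions):
    """Most frequent label in a non-empty list; ties go to the label seen first."""
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


if __name__ == "__main__":
    toy = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    print(context_vector(["a", "b", "X", "c"], 2, toy))  # [1.5, 2.0]
    # e.g. one sense prediction per system, each using a different corpus
    print(majority_vote(["milk", "lion", "milk"]))  # milk
```

In a full supervised pipeline, vectors like these would serve as classifier features, possibly after a PCA projection; the vote then merges the per-corpus predictions at test time.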