zelligharris

Reviving Zellig S. Harris: More linguistic information for distributional lexical analysis of English and Czech

It is a popular truism nowadays that the distributional similarity of two words implies their semantic relatedness. This idea goes back to the American linguist Zellig S. Harris, who formulated it as the Distributional Hypothesis in the 1950s but lacked the computational capacity to verify it empirically. Although a number of working distributional semantic models exist, many interesting problems remain unsolved. Based on our lexicographical experience as well as on Harris' studies of co-occurrences and transformations, we hypothesize that there is still room for improvement in describing the syntactic structure of a word's immediate context, and we intend to attempt such an improvement. We have developed a preliminary version of a rule-based tagger that explicitly records the syntactic phenomena that are, to our intuition, the most relevant. We are ready to complete it and to evaluate experimentally its effect on the automatic assessment of semantic relatedness between words. We focus on English first and will then proceed to Czech.

The goal of the project is to identify the syntactic clues in context that determine the usage pattern, and hence the meaning, of a word, and to verify them experimentally on large text corpora by automatic tagging with a new set of labels.

Institute of Formal and Applied Linguistics

Charles University, Czech Republic
Faculty of Mathematics and Physics
