Monday, 31 October, 2016 - 13:30 to 15:00

Multilingual parallel corpora and linguistic theory: How to compare constraints cross-linguistically

Parallel corpora have been used very successfully in applied and contrastive linguistics. In this talk, I want to demonstrate how they can help the linguist to answer important theoretical questions, presenting two case studies in morphosyntax and intercultural pragmatics. These studies are based on multilingual parallel corpora, i.e. those that contain translations in a large number of languages. The main advantage of parallel corpora is that they provide comparative micro-concepts (Haspelmath 2010), i.e. the contents of aligned translations in different languages. This enables us to compare cross-linguistically not only the verbalizations of these contents, as is often done, but also the semantic, pragmatic and other factors that constrain the formal variation. For this purpose, I employ relatively novel multivariate methods, namely conditional inference trees and random forests. The data come from ParTy (Parallel corpus for Typology), my own parallel corpus of film subtitles and TED talks.
Causative constructions: iconicity or economy?
The first case study is based on film subtitles in ten diverse languages from different language families. I test the well-known universal correlation between form and function of causative constructions (e.g. Comrie 1981; Shibatani & Pardeshi 2002) and investigate diverse semantic and syntactic factors that determine the choice between lexical, morphological and analytic causatives across the languages, such as involvement of the Causer in the caused event or willingness of the Causee (cf. Dixon 2000). The results are interpreted in terms of the universal principles of iconicity and economy (e.g. Haiman 1983). I will also present corroborating evidence for the corpus-based conclusions, obtained from an artificial language learning experiment and from comparable corpora of spoken language.
T/V forms in European languages
The second case study investigates the cross-linguistic differences in the constraints on the use of so-called T/V forms (e.g. French tu and vous, German du and Sie, Russian ty and vy) in ten European languages from different language families and genera. These constraints represent an elusive object of investigation because they depend on a large number of subtle contextual features and social distinctions, which should be cross-linguistically matched. I select more than two hundred contexts that contain the pronouns you and yourself in the original English versions of film subtitles. These contexts are then coded for fifteen contextual variables that describe the Speaker and the Hearer, their relationship and various situational properties. The variables operationalize parameters that have been mentioned in the literature, such as the dimensions of power and solidarity (Brown & Gilman 1960). On the basis of the translations of these situations in the film subtitles in ten languages, I identify the most relevant contextual variables that constrain the T/V variation in each language and compare these constraints across the languages.

Brown, R., & Gilman, A. (1960). The pronouns of power and solidarity. In Sebeok, T. A. (Ed.), Style in Language (pp. 253–276). Cambridge, MA: MIT Press.
Comrie, B. (1981). Language universals and linguistic typology: Syntax and morphology. Chicago: University of Chicago Press.
Dixon, R. M. W. (2000). A typology of causatives: Form, syntax and meaning. In Dixon, R. M. W. & Aikhenvald, A. Y. (Eds.), Changing valency: Case studies in transitivity (pp. 30–83). Cambridge: Cambridge University Press.
Haiman, J. (1983). Iconic and economic motivation. Language, 59(4), 781–819.
Haspelmath, M. (2010). Comparative concepts and descriptive categories in crosslinguistic studies. Language, 86(3), 663–687.
Shibatani, M., & Pardeshi, P. (2002). The causative continuum. In Shibatani, M. (Ed.), The grammar of causation and interpersonal manipulation (pp. 85–126). Amsterdam: John Benjamins.


Natalia Levshina obtained a doctoral degree in linguistics at the University of Leuven (Belgium) in 2011. Since the defence, her academic and personal life has fluctuated between Belgium and Germany. After a short but enjoyable stay in Jena, she went to Marburg University to participate in quantitative linguistic projects in Michael Cysouw’s team. After that, she obtained funding from the Belgian research foundation F.R.S.-FNRS, which enabled her to work on her own project on causative constructions at the Catholic University of Louvain. At the moment, she is working in Martin Haspelmath’s Nikolai-Lab in Leipzig, investigating the role of frequency and economy in grammatical structures.
She defines herself as a quantitative general linguist with a fascination for corpora, statistical methods and the ‘human factor’ in language. She has been compiling her own parallel corpus for typological research, ParTy, which is based on subtitles of films and TED talks. She has written an R manual for linguists, How to Do Linguistics with R: Data exploration and statistical analysis (2015, John Benjamins), and has published papers in international journals such as Linguistics, Cognitive Linguistics and the Journal of Pragmatics.