Words and Classes. Branches and Links. Interlinking (Latin) Resources in the Linguistic Linked Open Data World through the LiLa Knowledge Base

Monday, 26 April, 2021 - 14:00



Marco Passarotti (Università Cattolica del Sacro Cuore, Milan, Italy)

The talk presents the LiLa Knowledge Base (https://lila-erc.eu), a collection of multifarious linguistic resources for Latin that are described using a common vocabulary for knowledge description and interlinked according to the principles of the so-called Linked Data paradigm.

Given its strongly lexical nature, the core of the LiLa Knowledge Base is a large collection of Latin lemmas. This collection serves as the backbone for interoperability between resources: all entries in lexical resources and all tokens in corpora that refer to the same lemma are linked to it.
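The lemma-as-backbone idea can be illustrated with a toy, in-memory sketch. It assumes the graph is stored as plain (subject, predicate, object) triples; all identifiers (`lemma:amor`, `treebank:tok1`, `etymdict:entry42`, the predicate `ex:hasLemma`, and so on) are hypothetical placeholders, not LiLa's actual URIs or ontology terms.

```python
from collections import defaultdict

# Hypothetical triples: every resource points at a shared lemma node.
# None of these identifiers are LiLa's real URIs or properties.
TRIPLES = [
    ("treebank:tok1", "ex:hasLemma", "lemma:amor"),
    ("treebank:tok2", "ex:hasLemma", "lemma:amo"),
    ("treebank:tok3", "ex:hasLemma", "lemma:amor"),
    ("etymdict:entry42", "ex:hasLemma", "lemma:amor"),
    ("polarity:entry7", "ex:hasLemma", "lemma:amor"),
    ("polarity:entry8", "ex:hasLemma", "lemma:amo"),
]


def group_by_lemma(triples, predicate="ex:hasLemma"):
    """Collect every corpus token or lexicon entry linked to the same lemma node."""
    hub = defaultdict(list)
    for subj, pred, obj in triples:
        if pred == predicate:
            hub[obj].append(subj)
    return dict(hub)


if __name__ == "__main__":
    for lemma, linked in sorted(group_by_lemma(TRIPLES).items()):
        print(lemma, "->", linked)
```

In this sketch, resources never link to each other directly; a token in the treebank reaches the etymological or polarity entry only by passing through the shared lemma node.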

After detailing the architecture supporting LiLa, the talk:
a) describes the LiLa collection of lemmas, focusing in particular on how the Knowledge Base addresses the challenges of harmonizing the different lemmatization strategies found in linguistic resources for Latin;
b) details the modeling and linking of a number of textual and lexical resources for Latin, including a dependency treebank, an etymological dictionary, a polarity lexicon, a derivational lexicon and a valency lexicon;
c) shows a prototype tool that automatically links a raw Latin text to the Knowledge Base, and presents SPARQL queries that extract information from the interoperable resources currently linked to LiLa.
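The workflow in point c) can be sketched roughly as follows: tokenize a raw Latin text, map each word form to a lemma through a lookup table, and then follow the lemma into another resource, much as a SPARQL query joins two patterns on a shared lemma variable. Everything here is a hypothetical stand-in: the tiny form-to-lemma table replaces a real lemmatizer, the polarity values are toy placeholders, and no actual LiLa tool, endpoint, or ontology term is used.

```python
import re

# Hypothetical form-to-lemma index; a real linker would need a full
# lemmatizer and would have to resolve ambiguous forms.
FORM_TO_LEMMA = {
    "amor": "lemma:amor",
    "amorem": "lemma:amor",
    "amat": "lemma:amo",
    "vincit": "lemma:vinco",
    "omnia": "lemma:omnis",
}

# Hypothetical polarity lexicon keyed by lemma node (toy values only).
POLARITY = {
    "lemma:amor": 1.0,
    "lemma:amo": 1.0,
}


def link_text(text):
    """Return (form, lemma) pairs; lemma is None when the form is unknown."""
    # Letters only (Unicode-aware), so forms with diacritics are kept whole.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return [(tok, FORM_TO_LEMMA.get(tok)) for tok in tokens]


def polarity_of_text(text):
    """Join linked tokens with the polarity lexicon via the shared lemma.

    Conceptually similar to a SPARQL pattern matching a token and a lexicon
    entry that point at the same lemma (predicates here are hypothetical).
    """
    return [
        (form, POLARITY[lemma])
        for form, lemma in link_text(text)
        if lemma in POLARITY
    ]


if __name__ == "__main__":
    sentence = "Omnia vincit amor"
    print(link_text(sentence))
    print(polarity_of_text(sentence))
```

Forms missing from the index simply stay unlinked in this sketch; how the actual prototype handles ambiguous or unknown forms is outside its scope.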

***The talk will be streamed via Zoom. For details on how to join the Zoom meeting, please write to sevcikova et ufal.mff.cuni.cz***

CV:

Marco Passarotti is Associate Professor at Università Cattolica del Sacro Cuore (Milan). His main research interests focus on building, using and disseminating linguistic resources and natural language processing tools for Latin.

A pupil of Father Roberto Busa SJ, one of the pioneers of humanities computing, he has headed the "Index Thomisticus" Treebank project since 2006. In 2009, he founded the CIRCSE research centre of computational linguistics at Università Cattolica.

Currently, he is Principal Investigator of an ERC Consolidator Grant (2018-2023) aimed at building a Linked Data-based Knowledge Base of resources and tools for Latin.

He is the author of more than 100 publications and has organized and chaired several international scientific events.

Institute of Formal and Applied Linguistics

Charles University, Czech Republic
Faculty of Mathematics and Physics