Monday, April 20, 2015 - 13:30

CLARIN: Requirements, Examples & Experiences

Abstract: This talk will give a general introduction to the distributed research infrastructure CLARIN and its German branch CLARIN-D. Important pillars of CLARIN's technical infrastructure will be presented. Among them are the authentication and authorization infrastructure for single sign-on and core elements such as persistent identifiers, which enable researchers to cite resources in a persistent way. Requirements concerning metadata and its harvesting will be presented, as they make resources searchable by their metadata in CLARIN's Virtual Language Observatory. In addition, CLARIN's Federated Content Search for searching within the content of textual resources will be introduced. The talk will also focus on the functional interaction of the different infrastructure components.
Finally, the integration of existing resources such as corpora or tools into CLARIN's infrastructure will be addressed. Current and former examples of such curation processes will be introduced.


Gerhard Heyer studied Mathematical Logic and Philosophy at Cambridge University (Philosophy Tripos, Christ’s College 1973-1976, Robert-Birley Scholarship), and General Linguistics at the University of the Ruhr, where he received his Ph.D. in 1983. After research on AI-based natural language processing at the University of Michigan, Ann Arbor, with support from the Alexander-von-Humboldt Foundation (Feodor-Lynen Scholarship), he worked as a systems specialist and manager within the Olivetti Group, establishing TA Triumph Adler's research and development activities in electronic publishing and natural language processing. He was also responsible for the definition and performance of, among others, the ESPRIT projects Translator's Workbench, Translator's Workbench II, and MultiLex.
Since April 1994, Gerhard Heyer has held the chair of Automatic Language Processing at the computer science department of the University of Leipzig. His research focuses on automatic semantic processing of natural language text, with applications in information retrieval and search as well as knowledge management.
Gerhard Heyer has been a member of numerous programme, review and recruitment committees, and served as a member of the scientific advisory council of the GESIS Institute IZ from 1997 until 2006. He was also a member of the GESIS Kuratorium from 2006 to 2007.
At the Faculty of Mathematics and Computer Science of the University of Leipzig he served as dean of studies from 1999 until 2002 and as dean of the faculty from 2002 to 2005, and was re-elected for the period 2005-2008.

Dirk Goldhahn studied Computer Science with a minor in Linguistics at the University of Leipzig. Since 2011 he has been a research associate in the Natural Language Processing Group of the Department of Computer Science, where he obtained his doctoral degree in 2013. His main research area is electronic language resources, especially their creation, processing and utilization. He is involved in the "Projekt Deutscher Wortschatz" and the "Leipzig Corpora Collection" of the University of Leipzig, and has also been active in projects such as the Library of the Billion Words. Currently Dirk Goldhahn is head of technical infrastructure of CLARIN-D, a project that aims to provide a research infrastructure for the humanities and social sciences.