22nd Vilem Mathesius Lecture Series/Companions Fall School
December 8 – December 12, 2008
Where: Faculty of Mathematics and Physics, Malostranske nam. 25, Prague, Czech Republic (travel info)
How to participate
- Participants from the non-EU countries of Central and Eastern Europe: Thanks to the support of the Open Society Institute – Higher Education Support Programme Budapest, the attendance of 20 people will be fully funded.
The OSI funding will cover:
- accommodation costs for lodging in a double room at the Krystal Hotel, and breakfasts, from Dec 7 to Dec 13, 2008
- lunches and coffee breaks in Profesní dům (House of Professionals, Malostranske nam. 25) from Dec 8 till 12, 2008
- a contribution to travel costs of up to 300 USD. If your travel ticket costs less than 300 USD, the exact price of the ticket will be reimbursed; if it costs more, you will receive 300 USD.
01/09/2008: The Call for Applications has been CLOSED. Please do not send any further application forms.
- All other participants: Your attendance at the Series will be at your own expense. The organizers can help you arrange accommodation and lunches. Please contact Mrs. Kotesovcova if you wish to book a room at the Krystal Hotel (approx. 40 EUR/night) or to have lunches served in Profesní dům (House of Professionals, Malostranske nam. 25).
- Bjorn Gamback (SICS, Stockholm, Sweden): Knowledge Bases and Reasoning
- Jan Hajic (Charles University, Prague, Czech Republic): Semantic Representation and the Family of Prague Dependency Treebank
- Fred Jelinek (Charles University, Prague, Czech Republic): Speech Recognition and Reconstruction
- Roger Moore (University of Sheffield, GB): Individuality & Emotion in Speech and Language: Implications for Artificial Conversational Agents
Abstract It is often asserted that, because emotion plays such a crucial role in everyday exchanges between human beings in both social and work-related settings, it must also offer practical benefits to human-computer interaction. Indeed over the past few years a new field has emerged known as 'affective computing', within which the processing of individuality and emotion in speech and language is seen as an important underpinning technology. This series of tutorials will (i) review the relevant theoretical background in the area of individuality and emotion, (ii) present the state-of-the-art in affective speech and language processing, and (iii) address the issues raised by the introduction of affect into the behaviour of artificial conversational agents. Opportunities will be given for questions and answers throughout the lecture series.
- LECTURE #1: Individuality & Emotion in Speech and Language
- affective behaviour in animals and humans
- mental, physical and social aspects of behaviour
- affective science
- affective states (personality, emotion, stances, moods, attitudes)
- descriptive theories
- speech and language under stress
- LECTURE #2: Individuality & Emotion in Speech and Language Processing
- affective computing
- artificial systems
- computational models
- speaker identification
- voice stress analysis
- emotion recognition & generation
- LECTURE #3: Individuality & Emotion in Conversational Agents
- Dennett's three stances
- believability, acceptability and the 'uncanny valley'
- studies using 'Wizard of Oz' (WoZ) techniques
- attributes of assistantship, servitude, companionship
- R&D challenges: creating and exploiting communicative affordances
- Markku Turunen (University of Tampere, Finland): Dialog System Architecture
Abstract The lecture covers the most important design and development issues related to multimodal and spoken dialogue systems. The goal is to learn the characteristics of human speech and communication, how computers process speech, and how speech and supporting modalities can be used successfully in human-computer interaction to construct conversational spoken dialogue systems. After this course the students should have a good idea of the various tasks that such systems comprise. More materials can be found at http://www.cs.uta.fi/~mturunen/.
- Enrico Zovato (Loquendo S.p.A., Italy), Jan Romportl (University of West Bohemia, Czech Republic): Text-to-Speech Systems
Abstract Talking machines have been studied by researchers and scientists for centuries. However, only in recent years, with the advent of automatic information processing, have speech synthesis technologies reached the maturity needed for human-computer interaction. Their role is the conversion of any kind of written text into synthetic speech, hence the name Text-to-Speech (TTS) systems. Speech synthesis systems have undergone many technological and paradigm changes. At the beginning, a pure generation approach was adopted, in which the waveform is produced by means of parametric models and contextual rules. With the increase in computing and storage capabilities, a new approach was tried: storing a considerable amount of speech data from a single speaker and then recombining segments of that speech according to certain selection strategies. In the last decade, this technology has further improved, and it now provides highly intelligible synthetic speech while maintaining adequate acoustic quality and naturalness. Regardless of the underlying technology, synthesis systems basically consist of linguistic modules and waveform generation modules. The former convert the graphemic input string into a phonetic representation that accurately describes the sequence of sounds to be produced in a given language. Moreover, information on "how" a given text is to be spoken is provided. This description concerns the supra-segmental level, in which prosody plays an important role. The prediction of the prosodic target can be realized by means of rules or statistically based mechanisms. In this lecture we will illustrate concatenative Text-to-Speech technology, starting from text processing, including normalization, phonetic transcription and prosodic target generation. Selection mechanisms will be described, as well as signal processing techniques applied to the selected units.
We will also discuss some considerations related to the design and production of speech databases. The drawback of this technology is its lack of flexibility, due to the fact that it relies on the concatenation of unmodified speech segments. The challenge for next-generation systems will be to devise strategies for overcoming this limitation. In particular, expressive and emotional synthetic speech has become a key requirement for systems based on conversational interfaces such as embodied conversational agents and tutoring applications. In this lecture two TTS systems will be presented: the first is a commercial multilingual system, while the second is a Czech system developed by the University of West Bohemia. Many synthesis examples will be provided, as well as examples of applications that exploit this technology.
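The selection mechanism mentioned above is typically formulated as a search for the candidate-unit sequence minimizing a target cost (mismatch with the desired specification) plus a concatenation cost (discontinuity at each join), solved by dynamic programming. The following is a minimal illustrative sketch under that assumption; the unit representation and cost functions are hypothetical simplifications, not the design of any actual TTS system discussed in the lecture:

```python
# Sketch of unit selection via dynamic programming (Viterbi-style search).
# Units and targets are toy dicts with "pitch" and "dur" features;
# real systems use much richer linguistic and acoustic features.

def target_cost(unit, target):
    # How badly a candidate unit matches the desired target specification.
    return abs(unit["pitch"] - target["pitch"]) + abs(unit["dur"] - target["dur"])

def join_cost(prev_unit, unit):
    # Prosodic/spectral discontinuity at the concatenation point.
    return abs(prev_unit["pitch"] - unit["pitch"])

def select_units(targets, candidates):
    """targets: list of target specs; candidates: per-target candidate lists.
    Returns the unit sequence minimizing total target + join cost."""
    n = len(targets)
    cost = [[0.0] * len(candidates[i]) for i in range(n)]
    back = [[0] * len(candidates[i]) for i in range(n)]
    for j, u in enumerate(candidates[0]):
        cost[0][j] = target_cost(u, targets[0])
    for i in range(1, n):
        for j, u in enumerate(candidates[i]):
            # Best predecessor = cheapest accumulated cost + join cost to u.
            k = min(range(len(candidates[i - 1])),
                    key=lambda k: cost[i - 1][k] + join_cost(candidates[i - 1][k], u))
            cost[i][j] = (cost[i - 1][k] + join_cost(candidates[i - 1][k], u)
                          + target_cost(u, targets[i]))
            back[i][j] = k
    # Backtrack from the cheapest final unit.
    j = min(range(len(candidates[-1])), key=lambda k: cost[-1][k])
    path = [j]
    for i in range(n - 1, 0, -1):
        j = back[i][j]
        path.append(j)
    path.reverse()
    return [candidates[i][path[i]] for i in range(n)]
```

The chosen segments would then be concatenated (with signal-processing smoothing at the joins) to produce the output waveform.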
- History of speech synthesis
- Overview of existing technologies (articulatory, formant, HMM, concatenative)
- The Unit selection technique
- Text Normalization
- Phonetic Transcription
- Prosody processing and generation
- Selection strategies
- Signal generation/processing
- Speech Databases
- Emotional/expressive speech synthesis
- Two TTS systems:
- Loquendo multilingual TTS
- Czech TTS, ARTIC
- TTS based applications
- Talking heads
- Automatic dubbing
- Conversational agents
Contact: Mrs. Anna Kotesovcova
UFAL MFF UK
Malostranske nam. 25
118 00 Prague 1
tel.: +420-221 914 226
fax: +420-221 914 309