Monday, 17 October, 2022 - 14:00

Multimodal Machines from the Perspective of Humans

Deep neural networks have rapidly become the dominant approach to many complex learning problems. Although initially inspired by biological neural networks, current deep learning systems are driven far more by practical engineering needs and performance requirements. And yet, some of these networks exhibit striking similarities to the human brain. This talk explores how recent work has established similarities between the representations learnt by deep learning systems and cognitive data collected from the human brain. The talk specifically addresses two questions concerning the comparison of humans and machines. First, given the same multimodal tasks, how can the performance of humans and current AI systems be compared? Second, how can multimodal deep learning systems be used to make predictions about human cognitive data? The talk introduces our Eyetracked Multi-Modal Translation (EMMT) corpus, a dataset containing monocular eye-movement recordings, audio, and 4-electrode electroencephalogram (EEG) data from 43 participants, as a tool for answering the questions posed above. The dataset comprises reading, sight translation, and multimodal sight translation tasks involving different text and image stimulus settings when translating from English to Czech. The talk will also present some analyses of the data, in particular the observation of a variant of the Stroop effect in the experimental data.


*** The talk will be delivered in person (MFF UK, Malostranské nám. 25, 4th floor, room S1) and will be streamed via Zoom. For details on how to join the Zoom meeting, please write to sevcikova et ***