Monday, 8 January, 2018 - 13:30 to 15:00

Towards Effective Retrieval of Spontaneous Conversational Spoken Content

Spoken content retrieval (SCR) has been the focus of various research initiatives for more than 20 years. Early research focused on retrieval of clearly defined spoken documents, principally from the broadcast news domain. The main focus of this work was the spoken document retrieval (SDR) task at TREC 6-9. The end of this work saw SDR declared a largely solved problem. However, this conclusion soon proved premature: it applied to controlled recordings of professional news content and overlooked many of the potential challenges of searching more complex spoken content. Subsequent research has focused on more challenging tasks such as search of interview recordings and semi-professional internet content. This talk will begin by briefly reviewing early work in SDR, explaining its successes and limitations. It will then outline work exploring SCR for more challenging tasks, such as identifying relevant elements in long spoken recordings like meetings and presentations, provide an analysis of the characteristics and challenges of retrieval for spoken content elements, and finally introduce the latest work in our research group exploring potential methods to improve retrieval effectiveness for this data.

This is joint work with David Racca and Maria Eskevich, Dublin City University.


Gareth J.F. Jones is a Professor in the School of Computing at Dublin City University (Ireland). He holds a Ph.D. degree from the University of Bristol (UK), and he has previously served as a postdoctoral researcher at the University of Cambridge (UK), a Toshiba Fellow at the Toshiba Research and Development Center in Kawasaki (Japan), and a Lecturer at the University of Exeter (UK). His research focuses on information retrieval and access technologies, including multilingual, multimedia and mobile applications. Professor Jones was General Co-Chair for SIGIR 2013 and CLEF 2017, is a co-organizer of the MediaEval benchmark, and has contributed to a number of individual benchmark tasks in spoken and multimedia information retrieval.