Spoken content retrieval (SCR) has been the focus of various research initiatives for more than 20 years. Early research focused on retrieval of clearly defined spoken documents, principally from the broadcast news domain. The main focus of this work was the spoken document retrieval (SDR) task at TREC 6-9. By the end of this work, SDR had been declared a largely solved problem. However, this conclusion soon proved premature: it related to controlled recordings of professional news content and overlooked many of the potential challenges of searching more complex spoken content. Subsequent research has focused on more challenging tasks, such as search of interview recordings and semi-professional internet content. This talk will begin by briefly reviewing early work in SDR, explaining its successes and limitations. It will then outline work exploring SCR for more challenging tasks, such as identifying relevant elements in long spoken recordings, for example meetings and presentations; provide an analysis of the characteristics and challenges of retrieval for spoken content elements; and introduce the latest work in our research group exploring potential methods to improve retrieval effectiveness for this data.
This is joint work with David Racca and Maria Eskevich, Dublin City University.