With the explosion of new automatic metrics, it is difficult for researchers to develop and retain the kind of heuristic intuitions about metric differences that drove earlier research and deployment decisions, such as treating a +1 BLEU gain as a meaningful improvement. In this talk, we investigate the varying “dynamic score range” of many modern metrics to clarify what score differences mean, both within a single metric and across metrics. In other words, we ask: how large a difference X in metric Y must there be between two systems for humans to notice it?
*** The talk will be delivered in person (MFF UK, Malostranské nám. 25, 4th floor, room S1) and will be streamed via Zoom. For details on how to join the Zoom meeting, please write to sevcikova et ufal.mff.cuni.cz ***