Monday, 26 February, 2024 - 14:00

Navigating the metrics maze: How reliable are metric differences?

Tom Kocmi (Microsoft)

With the explosion of new automatic metrics, it is difficult for researchers to develop and retain the kinds of heuristic intuitions about metric differences that drove earlier research and deployment decisions, such as +1 BLEU. In this talk, we investigate the varying “dynamic score range” of many modern metrics to clarify what differences in scores mean, both within and among metrics. In other words, we ask: what point difference X in metric Y is required between two systems for humans to notice?



