From Court Verdicts to Structured Data: An LLM-Assisted Pipeline for Annotation and Clustering

Monday, 9 March 2026, 14:00

Jakub Drápal (PF UK)
Ivana Kvapilíková (ÚFAL MFF UK)

Court verdicts contain rich information about criminal behavior, but their unstructured format limits systematic analysis. We present an LLM-assisted pipeline that constructs structured datasets from the factual statements in verdict texts by combining attribute exploration, span annotation, and value clustering. Some attributes are defined by legal experts and others are induced from the data, which allows a comparison between expert-defined and data-driven schema design. We validate our results on a Slovak dataset through careful manual evaluation and highlight both the strengths and the current limitations of the proposed pipeline, as well as their impact on downstream legal analytics.

*** The talk will be delivered in person (MFF UK, Malostranské nám. 25, 4th floor, room S1) and streamed via Zoom. For details on how to join the Zoom meeting, please write to sevcikova et ufal.mff.cuni.cz ***

Institute of Formal and Applied Linguistics

Charles University, Czech Republic
Faculty of Mathematics and Physics