Vojtěch Lanz
Main Research Interests
- Clinical NLP
  - Question-Answering and Information Extraction from long, multilingual clinical documents
  - Domain-specific tokenization and pretraining for clinical language models
- Efficiency and Optimization of Large Language Models
  - Post-training alignment of hybrid models (Transformers + Mamba)
  - KV cache compression
- Computational Musicology
  - Gregorian chant analysis using Bayesian nonparametrics and bioinformatic methods
Projects
- PhD Thesis
  Topic: Document-level information extraction
  Supervisor: doc. RNDr. Pavel Pecina, Ph.D.
- RES-Q+: Comprehensive solutions of healthcare improvement based on the global Registry of Stroke Care Quality.
- GAUK: Empowering Healthcare with Large Language Models: Reducing Clinicians' Workload and Improving Stroke Patient Care
- DACT: Digital Analysis of Chant Transmission, advancing the global study of plainchant transmission through digital analysis and computational resources.
- GI-Insight: New methods for stomach examination using artificial intelligence: Utilization of deep learning for assisted gastroscopy.
Curriculum Vitae
Selected Bibliography
- Google Scholar
- ORCID: 0009-0001-5742-0984
- Researcher ID: JGE-0053-2023
Papers
Vojtěch Lanz and Pavel Pecina (2025): When Multilingual Models Compete with Monolingual Domain-Specific Models in Clinical Question Answering. In Proceedings of the Second Workshop on Patient-Oriented Language Processing (CL4Health), pages 69–82, Albuquerque, New Mexico. Association for Computational Linguistics. (url)
Vojtěch Lanz and Jan Hajič jr. (2025): Gregorian melody, modality, and memory: Segmenting chant with Bayesian nonparametrics. In Proceedings of the 26th International Society for Music Information Retrieval Conference (ISMIR 2025), Daejeon, Korea.
Vojtěch Lanz and Pavel Pecina (2025): CUNI-a at ArchEHR-QA 2025: Do We Need Giant LLMs for Clinical QA? In Proceedings of the 24th Workshop on Biomedical Language Processing (Shared Tasks), pages 27–40, Vienna, Austria. Association for Computational Linguistics. (url)
Vojtěch Lanz, Kristýna Szabová, and Jan Hajič jr. (2025): Making computational study of Gregorian melody accessible with ChantLab. In Proceedings of the Music Encoding Conference 2025 (MEC 2025), London. (https://works.hcommons.org/records/z50gm-qf714)
Vojtěch Lanz and Pavel Pecina (2024): Paragraph Retrieval for Enhanced Question Answering in Clinical Documents. In Proceedings of the 23rd Workshop on Biomedical Natural Language Processing, pages 580–590, Bangkok, Thailand. Association for Computational Linguistics. (url)
Vojtěch Lanz and Jan Hajič jr. (2023): Text boundaries do not provide a better segmentation of Gregorian antiphons. In Proceedings of the 10th International Conference on Digital Libraries for Musicology (DLfM '23), pages 72–76, New York, NY, USA. Association for Computing Machinery. (url)
Theses
- Master Thesis: Unsupervised segmentation of Gregorian chant melodies for exploring chant modality
  Supervisor: MgA. et Mgr. Jan Hajič, jr., Ph.D.
- Bachelor Thesis: Automatic Chord Recognition in Audio Recording
  Supervisor: prof. Ing. Zdeněk Žabokrtský, Ph.D.