The following topics are directly relevant to the SPRINT and PONK projects and suitable as student theses in Computational Linguistics, Natural Language Processing (NLP), or a related program. Each topic produces artifacts (models, datasets, evaluation results) that would be integrated into the running system.
Fine-Tuning Czech Language Models for Legal Rule Detection
Summary
Train a local classifier to detect specific stylistic/linguistic rule violations in Czech legal text, replacing or complementing the current LLM-based approach.
Motivation
The current system sends each text unit to a general-purpose LLM (e.g., Llama 3.3, latest GPT models) with a detailed prompt per rule. This is slow (~seconds per rule × unit), non-deterministic, and expensive. A fine-tuned local model could provide faster, more consistent, and cheaper detection.
Approach
- Frame each rule as a binary classification task (violation / no violation) or a multi-label task across all rules
- Fine-tune a Czech encoder model (e.g., RobeCzech, Czert, or a multilingual model like XLM-RoBERTa) on annotated examples
- Create training data from: (a) existing manual annotations, (b) LLM-generated synthetic examples, (c) rule-based heuristics
- Evaluate precision, recall, F1 per rule; compare with the LLM baseline
- Investigate few-shot and data augmentation strategies for rules with sparse examples
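The per-rule evaluation step can be sketched in a few lines of plain Python. This is a minimal illustration, assuming violations are recorded as (text-unit id, rule id) pairs; the rule names and annotations below are hypothetical.

```python
def per_rule_scores(gold, predicted):
    """Compute precision, recall, and F1 per rule from sets of
    (unit_id, rule_id) violation pairs."""
    rules = {r for _, r in gold} | {r for _, r in predicted}
    scores = {}
    for rule in rules:
        g = {u for u, r in gold if r == rule}       # gold violations of this rule
        p = {u for u, r in predicted if r == rule}  # predicted violations
        tp = len(g & p)
        prec = tp / len(p) if p else 0.0
        rec = tp / len(g) if g else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores[rule] = {"precision": prec, "recall": rec, "f1": f1}
    return scores

# Hypothetical annotations: (text-unit id, violated rule id)
gold = {(1, "passive"), (2, "nominalization"), (3, "passive")}
pred = {(1, "passive"), (2, "passive"), (3, "passive")}
scores = per_rule_scores(gold, pred)
print(scores)
```

The same function scores the fine-tuned model and the LLM baseline on an identical test set, which makes the per-rule comparison direct.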
Expected outcomes
A fine-tuned model (or ensemble) deployable as a local service; a benchmark dataset; a comparative analysis of local vs. LLM-based detection.
Automatic Discovery of Stylistic Rules from Legal Corpora
Summary
Use unsupervised or semi-supervised NLP methods to discover new candidate stylistic rules from large collections of Czech legal text.
Motivation
The current rule set was defined manually by legal linguists. Additional rules already exist in related projects (e.g., PONK) and will be integrated as the application matures, but there may be many more recurring stylistic issues that could be systematically identified and proposed as new rules.
Approach
- Work with existing corpora of Czech legal/administrative texts available within the project
- Use anomaly detection, clustering, or contrastive analysis (legal text vs. standard Czech) to identify recurring unusual constructions
- Apply dependency parsing and morphological analysis (via UDPipe/MorphoDiTa) to extract syntactic patterns
- Rank candidate rules by frequency, severity, and distinctiveness
- Validate with domain experts (legal linguists)
- Compare discovered candidates with rules already formalized in PONK and SPRINT
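The contrastive-analysis step can be illustrated with a simple smoothed log-odds ranking of patterns that are distinctively frequent in legal text. This is a sketch on surface tokens with made-up toy data; in the actual pipeline the items would be syntactic patterns extracted from UDPipe parses.

```python
import math
from collections import Counter

def distinctive_patterns(legal_items, standard_items, min_count=2):
    """Rank items by smoothed log-odds of occurring in legal vs. standard text.
    Positive scores mark candidates over-represented in legal language."""
    legal, std = Counter(legal_items), Counter(standard_items)
    n_legal, n_std = sum(legal.values()), sum(std.values())
    vocab = set(legal) | set(std)
    scored = []
    for w in vocab:
        if legal[w] < min_count:  # skip rare items; too noisy to rank
            continue
        # add-one smoothing so items unseen in one corpus don't break the ratio
        p_legal = (legal[w] + 1) / (n_legal + len(vocab))
        p_std = (std[w] + 1) / (n_std + len(vocab))
        scored.append((w, math.log(p_legal / p_std)))
    return sorted(scored, key=lambda x: -x[1])

# Toy corpora for illustration only
legal = ["smlouva", "smlouva", "smlouva", "nájemce", "nájemce", "a", "a", "a"]
standard = ["a", "a", "a", "a", "dům", "pes"]
ranked = distinctive_patterns(legal, standard)
print(ranked[0])
```

The ranking by frequency and distinctiveness feeds directly into the expert validation step; severity would be judged by the legal linguists.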
Expected outcomes
A pipeline for rule discovery; a ranked list of candidate rules with examples; analysis of Czech legal writing patterns.
Integrating PONK with LLM-Based Detection
Summary
Design and evaluate a hybrid detection architecture that combines the existing classical rule-based NLP system (PONK) with LLM-based and fine-tuned model approaches, routing each rule to the optimal method.
Motivation
PONK is an existing rule-based NLP system that already implements many of the same stylistic rules studied in SPRINT using classical methods (morphological analysis, dependency parsing). Some rules have deterministic linguistic signatures well-suited to PONK, while others require the semantic understanding of an LLM. Understanding where each approach excels is key to building an optimal production system.
Approach
- Benchmark PONK’s existing rule implementations against the LLM-based evaluator (and optionally a fine-tuned classifier from Topic 1) on the same test set
- Evaluate per-rule: precision, recall, F1, latency, cost
- Analyze error patterns: where does the LLM succeed and PONK fail, and vice versa?
- Propose a hybrid architecture that routes each rule to the optimal method (PONK, fine-tuned model, or LLM)
- Implement a routing/orchestration layer and evaluate end-to-end performance
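The routing layer can be reduced to a small decision function over the per-rule benchmark results. A minimal sketch, assuming each rule has F1 and latency measured per method; the rule names and numbers below are hypothetical.

```python
def build_router(benchmarks, latency_budget_ms=None):
    """Choose, per rule, the method with the best F1 among those that
    fit the latency budget.

    benchmarks: {rule_id: {method: {"f1": float, "latency_ms": float}}}
    """
    routing = {}
    for rule, methods in benchmarks.items():
        candidates = {
            m: s for m, s in methods.items()
            if latency_budget_ms is None or s["latency_ms"] <= latency_budget_ms
        }
        if not candidates:
            # nothing fits the budget; fall back to the fastest method
            routing[rule] = min(methods, key=lambda m: methods[m]["latency_ms"])
        else:
            routing[rule] = max(candidates, key=lambda m: candidates[m]["f1"])
    return routing

# Hypothetical benchmark numbers, for illustration only
benchmarks = {
    "passive_voice": {"ponk": {"f1": 0.92, "latency_ms": 5},
                      "llm":  {"f1": 0.85, "latency_ms": 2000}},
    "vague_reference": {"ponk": {"f1": 0.40, "latency_ms": 5},
                        "llm":  {"f1": 0.88, "latency_ms": 2000}},
}
unconstrained = build_router(benchmarks)
tight = build_router(benchmarks, latency_budget_ms=100)
print(unconstrained, tight)
```

In the unconstrained case the LLM is chosen only where it clearly wins; a tight latency budget forces everything onto PONK. The same scheme extends to a fine-tuned classifier as a third method, or to a cost dimension alongside latency.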
Expected outcomes
A comparative benchmark across methods; a hybrid detection architecture with rule-level routing; practical recommendations for Czech legal NLP.
Prompt Optimization and Structured Output for Legal Text Evaluation
Summary
Systematically evaluate and optimize prompt strategies for rule-based text evaluation using LLMs, with a focus on structured and reliable output.
Motivation
Prompt design significantly affects LLM accuracy, consistency, and output format compliance. The current system uses a fixed prompt template per rule. There is room to improve detection quality through better prompting without changing the model.
Approach
- Compare prompt strategies: zero-shot, few-shot, chain-of-thought, rule decomposition
- Evaluate the effect of example selection, ordering, and negative examples
- Investigate constrained decoding / structured generation (e.g., JSON mode, grammar-constrained sampling) for reliable output parsing
- Measure per-rule accuracy, false positive rate, and output format compliance across prompt variants
- Explore batching strategies (multiple sentences per prompt, multiple rules per prompt)
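Output format compliance can itself be measured with a small validator. A sketch, assuming a hypothetical response schema with keys rule_id, violation, and evidence; the example strings are made up.

```python
import json

# Assumed response schema for illustration; the real schema is per rule
REQUIRED_KEYS = {"rule_id", "violation", "evidence"}

def parse_llm_response(raw):
    """Parse a model response into a validated dict, or None on format violation."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict) or not REQUIRED_KEYS <= obj.keys():
        return None
    if not isinstance(obj["violation"], bool):
        return None
    return obj

def format_compliance(responses):
    """Fraction of responses that parse and validate -- one axis along
    which prompt variants (and JSON mode vs. free text) are compared."""
    ok = sum(parse_llm_response(r) is not None for r in responses)
    return ok / len(responses) if responses else 0.0

good = '{"rule_id": "passive", "violation": true, "evidence": "byl schválen soudem"}'
bad = 'The sentence seems fine to me!'
compliance = format_compliance([good, bad])
print(compliance)  # 0.5
```

With constrained decoding or JSON mode the validator should approach 100% compliance, which turns the remaining comparison into a pure accuracy question.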
Expected outcomes
An optimized prompt library per rule; guidelines for prompt design in legal NLP; quantitative comparison of strategies.


