Large Language Models


Goals of the course:

  1. Explain how the models work
  2. Teach basic usage of the models
  3. Help students critically assess what they read about the models
  4. Encourage thinking about the broader context of using the models

Syllabus from SIS:

  • Basics of neural networks for language modeling
  • Language model typology
  • Data acquisition and curation, downstream tasks
  • Training (self-supervised learning, reinforcement learning with human feedback)
  • Finetuning & Inference
  • Multilinguality and cross-lingual transfer
  • Large Language Model Applications (e.g., conversational systems, robotics, code generation)
  • Multimodality (CLIP, diffusion models)
  • Societal impacts
  • Interpretability

The course is part of the inter-university programme prg.ai Minor.

About

SIS code: NPFL140
Semester: summer
E-credits: 3
Examination: 0/2 C
Guarantors: Jindřich Helcl, Jindřich Libovický

Timespace Coordinates

The course is held on Thursdays at 12:20 in S5.

Lectures

1. Introductory notes and discussion on large language models Slides

2. The Transformer architecture Slides Notes Recording

3. Data & Evaluation Slides

4. LLM Inference Slides Code Recording

5. Hands-on session

6. LLM Training

License

Unless otherwise stated, teaching materials for this course are available under CC BY-SA 4.0.

1. Introductory notes and discussion on large language models

 Feb 19 Slides

Instructor: Zdeněk Kasner

Covered topics: aims of the course, passing requirements. We informally discussed what (large) language models are, what they are used for, and what their benefits and downsides are. We also gathered ideas on how to train the models, how to use them, and how to evaluate them.

2. The Transformer architecture

 Feb 26 Slides Notes Recording

Instructor: Jindřich Libovický

Learning objectives. After the lecture you should be able to...

  • Explain the building blocks of the Transformer architecture to a non-technical person.

  • Describe the Transformer architecture using equations, especially the self-attention block.

  • Implement the Transformer architecture (in PyTorch or another framework that does automated differentiation).
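The self-attention block from the objectives above can be sketched in a few lines. This is a toy single-head version with random weights in NumPy; the full architecture adds multiple heads, masking, residual connections, and layer normalization:

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention (single head, no masking)."""
    q = x @ w_q  # queries
    k = x @ w_k  # keys
    v = x @ w_v  # values
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)  # (seq, seq) attention logits
    # Softmax over keys, shifted by the row max for numerical stability.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v  # each position is a weighted mix of values

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
x = rng.standard_normal((seq_len, d_model))
w_q, w_k, w_v = (rng.standard_normal((d_model, d_model)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (4, 8)
```

A PyTorch version looks nearly identical; the framework's job is mainly to differentiate through these operations during training.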

Additional materials.

3. Data & Evaluation

 Mar 5 Slides

Instructors: Ondřej Dušek & Patrícia Schmidtová

Learning objectives:

  • Understand what kinds of data LLMs need for training
  • Know about viable tasks for evaluating LLMs
  • Know where to find data for evaluation tasks
  • Have an idea of how to evaluate: what approaches there are and what their pros and cons are
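As a minimal, concrete example of one evaluation approach: for tasks with a single correct answer, exact-match accuracy simply counts how often the model's output equals the reference. The function below is an illustrative sketch (not a standard library routine), with lowercasing and whitespace stripping as an assumed normalization:

```python
def exact_match_accuracy(predictions, references):
    """Fraction of predictions that exactly match the reference
    answer after lowercasing and stripping whitespace."""
    hits = sum(
        p.strip().lower() == r.strip().lower()
        for p, r in zip(predictions, references)
    )
    return hits / len(references)

preds = ["Paris", "42 ", "blue"]
refs = ["paris", "42", "red"]
print(exact_match_accuracy(preds, refs))  # 2 of 3 match
```

Exact match is cheap and unambiguous, but it penalizes paraphrases, which is one reason softer metrics and human or LLM-based evaluation exist.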

Additional materials:

4. LLM Inference

 Mar 12 Slides Code Recording

Instructor: Zdeněk Kasner

Learning objectives. After the class you should be able to...

  • Understand how to generate text with a Transformer-based language model.
  • Explain differences between decoding algorithms and the role of decoding parameters.
  • Choose a suitable LLM for your task.
  • Run an LLM locally on your computer or on a computing cluster.
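The difference between decoding algorithms can be illustrated with a small, framework-free sketch. `sample_next` is a hypothetical helper (not part of any library) that turns one vector of next-token logits into a token id; temperature approaching 0 recovers greedy decoding, and `top_k` truncates the distribution before sampling:

```python
import math
import random

def sample_next(logits, temperature=1.0, top_k=None, seed=None):
    """Choose a next-token id from raw logits (toy illustration)."""
    if top_k is not None:
        # Keep only the top_k highest-scoring tokens.
        cutoff = sorted(logits, reverse=True)[top_k - 1]
        logits = [l if l >= cutoff else float("-inf") for l in logits]
    if temperature == 0.0:
        # Greedy decoding: always pick the argmax.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Temperature-scaled softmax (shifted by the max for stability).
    m = max(logits)
    weights = [math.exp((l - m) / temperature) for l in logits]
    r = random.Random(seed).random() * sum(weights)
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if acc >= r:
            return i
    return len(weights) - 1

logits = [2.0, 1.0, 0.1]
print(sample_next(logits, temperature=0.0))                   # 0: greedy argmax
print(sample_next(logits, temperature=1.0, top_k=1, seed=7))  # 0: only token left
```

A real LLM repeats this step token by token, feeding each chosen token back into the model; libraries expose the same knobs (temperature, top-k, and others) in their generation APIs.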

Additional materials.

5. Hands-on session

 Mar 19

Instructor: Zdeněk Kasner

6. LLM Training

 Mar 26

Instructor: Ondřej Dušek

Project work

You will work on a team project during the semester. Teams of 4-6 students will work on the following topics.

Project Timeline

  • 2 March: project assignment
  • Week of 16–20 March: kick-off meetings with your supervisor
  • 31 March: Experiment plan submitted
  • 21 April: Self-assessment form
  • Project presentations
    • Early bird: 14 May
    • Standard: 21 May
  • Project reports by the end of the semester (Hard deadline: 5 working days before you need the credit)

In addition, write a weekly log from the start of the project until you submit (the earlier you submit, the fewer logs you need to write).

Project Reports

Each team will submit a report, consisting of:

  • Brief method overview
  • Summary of related work
  • Experimental design
  • Results
  • Conclusions
  • Overview of individual contributions of team members
  • References

The report should be at most 4 pages long, plus references and the contributions overview. You might want to use the ACL paper template.

Reading assignments

You will be asked at least once to read a paper before the class.

Final written test

You are required to take a final written test; it will not be graded.