Large Language Models

There are no elephants in this picture

Goals of the course:

  1. Explain how the models work
  2. Teach basic usage of the models
  3. Help students critically assess what you read about them
  4. Encourage thinking about the broader context of using the models

Syllabus from SIS:

  • Basics of neural networks for language modeling
  • Language model typology
  • Data acquisition and curation, downstream tasks
  • Training (self-supervised learning, reinforcement learning with human feedback)
  • Finetuning & Inference
  • Multilinguality and cross-lingual transfer
  • Large Language Model Applications (e.g., conversational systems, robotics, code generation)
  • Multimodality (CLIP, diffusion models)
  • Societal impacts
  • Interpretability

About

SIS code: NPFL140
Semester: summer
E-credits: 3
Examination: 0/2 C
Guarantors: Jindřich Helcl, Jindřich Libovický

Timespace Coordinates

The course is held on Thrusdays at 15:40 in S3. The first lecture took place on 22 February.

Lectures

1. Introductory notes and discussion on large language models Slides

2. The Transformer Architecture Lecture notes Slides

3. LLM Training Slides Recording

4. LLM Inference Slides Code Recording

5. Generating Weather Reports Assignment

6. Data and Evaluation Reading

License

Unless otherwise stated, teaching materials for this course are available under CC BY-SA 4.0.

1. Introductory notes and discussion on large language models

 Feb 22 Slides

Covered topics: aims of the course, passing requirements. We discussed what are (large) language models, what are they for, what are their benefits and downsides. We concluded with a rough analysis of ChatGPT performance in different languages.

2. The Transformer Architecture

 Mar 7 Lecture notes Slides

After the class, you should be able to:

  • Explain the building blocks of the Transformer architecture to a non-technical person
  • Describe the Transformer architecture using equations, especially the self-attention block
  • Implement the Transformer architecture (in PyTorch or another framework with automated differentiation)

Class outline:

Additional materials:

3. LLM Training

 Mar 14 Slides Recording

After the class, you should be able to:

  • Give a high-level description of how neural networks are trained
  • Read and understand a neural training library documentation
  • Explain the differences between various training techniques used in LLMs today

Class outline:

  • Rest of the discussion on Transformers, see above
  • General introduction into neural network & transformer model training, pretrained models, RLHF, DPO

Additional materials:

4. LLM Inference

 Mar 21 Slides Code Recording

After the class, you should be able to:

  • Give a high-level description of how a transformer predicts a probability distribution for the next token in the sequence
  • Select the appropriate decoding algorithm for your use-case and understand its parameters
  • Write a Python code snippet for generating text with an open language model using the transformers library

Class outline:

  • Discussion, LLM zoo
  • 3D visualization of transformer inference
  • Decoding algorithms - exact inference (MAP), greedy search, beam search, top-k, top-p, Mirostat, locally typical sampling
  • Hands-on demonstration of text generation with the transformers library
  • Bonus: non-autoregressive decoding, reverse-engineering decoding algorithms

Additional materials:

5. Generating Weather Reports

 Mar 28 Assignment

Assignment #1

After the class, you should be able to:

  • Write a basic Python code querying a LLM through an OpenAI-like API.
  • Set up a suitable prompt and parameters to get the expected output.
  • Describe what are the opportunities and limits of recent open LLMs.

Class outline:

  • Introduction
  • Working on the assignment

Additional materials:

6. Data and Evaluation

 Apr 4 Reading

After the class, you should be able to:

  • Look for a dataset for a specified NLP task and find one (given the task is reasonably common)
  • Roughly assess the usefulness of the dataset based on its statistics
  • Pick an evaluation method that suits the task
  • Have a sense of what a "reasonable" score in that task might look like

Class outline:

  • Data for language modeling
  • NLP tasks and data (introduction + team work)
  • Evaluation (introduction + team work)

Additional materials:

Active participation

There will be two or three tasks during the semester; we will work on them mainly during classes but they might turn into a (small) homework.

Reading assignments

You will be asked at least once to read a paper before the class.

Final written test

You need to take part in a final written test that will not be graded.