Large Language Models

There are no elephants in this picture

Goals of the course:

  1. Explain how the models work
  2. Teach basic usage of the models
  3. Help students critically assess what they read about them
  4. Encourage thinking about the broader context of using the models

Syllabus from SIS:

  • Basics of neural networks for language modeling
  • Language model typology
  • Data acquisition and curation, downstream tasks
  • Training (self-supervised learning, reinforcement learning with human feedback)
  • Finetuning & Inference
  • Multilinguality and cross-lingual transfer
  • Large Language Model Applications (e.g., conversational systems, robotics, code generation)
  • Multimodality (CLIP, diffusion models)
  • Societal impacts
  • Interpretability

About

SIS code: NPFL140
Semester: summer
E-credits: 3
Examination: 0/2 C
Guarantors: Jindřich Helcl, Jindřich Libovický

Timespace Coordinates

The course is held on Thursdays at 15:40 in S3. The first lecture took place on 22 February.

Lectures

1. Introductory notes and discussion on large language models Slides

2. The Transformer Architecture Lecture notes Slides

3. LLM Training Slides Recording

4. LLM Inference Slides Code Recording

5. Generating Weather Reports Assignment

6. Data and Evaluation Lecture notes

7. Evaluation, Working with the Models MCQA Evaluation · Speech Translation · LLMs for Machine Translation · Chain-of-thought Prompting; RAG · Generation; Evaluation; Web navigation · Experience with LLMs · Recording

8. LLM Efficiency Assignment review Efficiency Recording

License

Unless otherwise stated, teaching materials for this course are available under CC BY-SA 4.0.

1. Introductory notes and discussion on large language models

 Feb 22 Slides

Covered topics: aims of the course, passing requirements. We discussed what (large) language models are, what they are for, and what their benefits and downsides are. We concluded with a rough analysis of ChatGPT performance in different languages.

2. The Transformer Architecture

 Mar 7 Lecture notes Slides

After the class, you should be able to:

  • Explain the building blocks of the Transformer architecture to a non-technical person
  • Describe the Transformer architecture using equations, especially the self-attention block
  • Implement the Transformer architecture (in PyTorch or another framework with automatic differentiation); a minimal self-attention sketch follows below
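
As a rough illustration of the self-attention block mentioned above, here is a minimal single-head sketch in PyTorch. It is our own simplified example, not code from the course materials: it omits multi-head splitting, masking, residual connections, and layer normalization.

```python
import torch
import torch.nn as nn


class SelfAttention(nn.Module):
    """Single-head self-attention: softmax(QK^T / sqrt(d)) V."""

    def __init__(self, d_model: int):
        super().__init__()
        # Learned projections producing queries, keys and values from the input.
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.scale = d_model ** 0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        scores = q @ k.transpose(-2, -1) / self.scale  # pairwise similarities
        weights = torch.softmax(scores, dim=-1)        # attention distribution
        return weights @ v                             # weighted sum of values
```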

Class outline:

Additional materials:

3. LLM Training

 Mar 14 Slides Recording

After the class, you should be able to:

  • Give a high-level description of how neural networks are trained
  • Read and understand the documentation of a neural network training library
  • Explain the differences between various training techniques used in LLMs today

Class outline:

  • Rest of the discussion on Transformers, see above
  • General introduction to neural network & Transformer model training, pretrained models, RLHF, DPO (a minimal training-step sketch follows below)
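
To make the self-supervised (next-token prediction) objective concrete, here is a minimal sketch of a single training step in PyTorch. The `model` and `optimizer` below are placeholders for any causal language model returning logits and any optimizer; the actual training setups discussed in class (and the RLHF/DPO stages) are more involved.

```python
import torch
import torch.nn.functional as F


def training_step(model, optimizer, token_ids: torch.Tensor) -> float:
    # token_ids: (batch, seq_len) integer tensor holding a batch of text.
    inputs, targets = token_ids[:, :-1], token_ids[:, 1:]
    logits = model(inputs)  # (batch, seq_len - 1, vocab_size)
    # Cross-entropy between predicted distributions and the actual next tokens.
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
    )
    loss.backward()        # compute gradients
    optimizer.step()       # update parameters
    optimizer.zero_grad()  # reset gradients for the next step
    return loss.item()
```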

Additional materials:

4. LLM Inference

 Mar 21 Slides Code Recording

After the class, you should be able to:

  • Give a high-level description of how a transformer predicts a probability distribution for the next token in the sequence
  • Select the appropriate decoding algorithm for your use-case and understand its parameters
  • Write a Python code snippet for generating text with an open language model using the transformers library (a sketch follows below)
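
For illustration, a minimal generation snippet with the transformers library might look as follows; the model name and sampling parameters are arbitrary examples, not the ones used in class.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder: any open causal LM works the same way
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Large language models are", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=30,
    do_sample=True,   # sample instead of greedy decoding
    top_p=0.9,        # nucleus (top-p) sampling
    temperature=0.8,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```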

Class outline:

  • Discussion, LLM zoo
  • 3D visualization of transformer inference
  • Decoding algorithms: exact inference (MAP), greedy search, beam search, top-k, top-p, Mirostat, locally typical sampling (a simplified top-p sketch follows after this list)
  • Hands-on demonstration of text generation with the transformers library
  • Bonus: non-autoregressive decoding, reverse-engineering decoding algorithms
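
As a companion to the decoding algorithms listed above, here is a simplified sketch of nucleus (top-p) sampling over a next-token distribution; library implementations handle batching and edge cases that are omitted here.

```python
import torch


def sample_top_p(logits: torch.Tensor, p: float = 0.9) -> int:
    """Sample a token id from the smallest set of tokens whose probability mass exceeds p."""
    probs = torch.softmax(logits, dim=-1)
    sorted_probs, sorted_ids = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    keep = cumulative - sorted_probs < p   # the top token is always kept
    sorted_probs[~keep] = 0.0
    sorted_probs /= sorted_probs.sum()     # renormalize over the kept tokens
    choice = torch.multinomial(sorted_probs, num_samples=1)
    return sorted_ids[choice].item()
```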

Additional materials:

5. Generating Weather Reports

 Mar 28 Assignment

Assignment #1

After the class, you should be able to:

  • Write basic Python code querying an LLM through an OpenAI-like API (see the sketch after this list).
  • Set up a suitable prompt and parameters to get the expected output.
  • Describe the opportunities and limits of recent open LLMs.
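
A minimal sketch of such a query using the openai Python client is below; the base URL, API key, and model name are placeholders for whatever endpoint the assignment actually uses.

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # placeholder endpoint
    api_key="dummy-key",                  # placeholder credential
)

response = client.chat.completions.create(
    model="some-open-model",              # placeholder model name
    messages=[
        {"role": "system", "content": "You write short weather reports."},
        {"role": "user", "content": "Temperature 3 °C, light rain, wind 20 km/h."},
    ],
    temperature=0.7,
    max_tokens=100,
)
print(response.choices[0].message.content)
```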

Class outline:

  • Introduction
  • Working on the assignment

Additional materials:

6. Data and Evaluation

 Apr 4 Lecture notes

After the class, you should be able to:

  • Look for a dataset for a specified NLP task and find one (given the task is reasonably common)
  • Roughly assess the usefulness of the dataset based on its statistics
  • Pick an evaluation method that suits the task (a small sketch follows after this list)
  • Have a sense of what a "reasonable" score in that task might look like
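
As a small illustration of picking an evaluation method, the sketch below computes BLEU with the sacrebleu library (common for generation tasks such as translation) and exact-match accuracy (common for tasks with a single correct answer); the data is made up.

```python
import sacrebleu

# BLEU for a generation task (here: one hypothesis, one reference).
hypotheses = ["It rains in Prague today."]
references = ["It is raining in Prague today."]
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU: {bleu.score:.1f}")

# Exact-match accuracy for a task with a single correct answer.
gold = ["Paris", "4", "yes"]
predicted = ["Paris", "5", "yes"]
accuracy = sum(g == p for g, p in zip(gold, predicted)) / len(gold)
print(f"Exact match: {accuracy:.2f}")
```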

Class outline:

  • Data for language modeling
  • NLP tasks and data (introduction + team work)
  • Evaluation (introduction + team work)

Additional materials:

7. Evaluation, Working with the Models

 Apr 11 MCQA Evaluation · Speech Translation · LLMs for Machine Translation · Chain-of-thought Prompting; RAG · Generation; Evaluation; Web navigation · Experience with LLMs · Recording

Class outline:

  • Remarks on LLM evaluation on the multiple-choice question answering (MCQA) task (a scoring sketch follows after this list)
  • Speech translation challenges
  • Using LLMs for machine translation
  • Chain-of-thought prompting, retrieval-augmented generation
  • Generation, evaluation and Web navigation using LLMs
  • Experience with using LLMs within the EDU-AI project, Task-oriented Dialogue
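
One common way to evaluate an LLM on multiple-choice QA is to score each option by the model's log-likelihood and pick the best one. The sketch below is a simplified illustration (it scores the whole question-plus-option string; in practice one usually scores only the option tokens), with a placeholder model and example.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")

question = "What is the capital of the Czech Republic? Answer:"
options = [" Prague", " Brno", " Ostrava"]

scores = []
for option in options:
    ids = tokenizer(question + option, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels=ids, the model returns the mean next-token loss,
        # i.e. a negated, length-normalized log-likelihood of the text.
        loss = model(ids, labels=ids).loss
    scores.append(-loss.item())

print(options[scores.index(max(scores))])  # option the model finds most likely
```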

8. LLM Efficiency

 Apr 18 Assignment review Efficiency Recording

After the class, you should be able to:

  • Identify technical bottlenecks constraining inference and training with LLMs
  • Know methods enabling the usage of LLMs under computational restrictions:
    • parameter efficient fine-tuning,
    • quantization,
    • picking the right model scale for your data.

Class outline:

  • Assignment 1 review
  • Time and space requirements of LLMs
  • Low-rank adaptation (a minimal LoRA sketch follows after this list)
  • Quantization
  • Scaling
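
To make low-rank adaptation concrete, here is a minimal sketch of a LoRA-wrapped linear layer; real implementations (e.g. the peft library) add dropout, weight merging, and per-module configuration, so treat this as an illustration only.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen pretrained linear layer plus a trainable low-rank update."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for param in self.base.parameters():
            param.requires_grad_(False)  # the pretrained weights stay frozen
        # Trainable factors: the effective update is (alpha / rank) * B @ A
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * (x @ self.lora_a.T @ self.lora_b.T)
```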

Active participation

There will be two or three tasks during the semester; we will work on them mainly during classes, but they might turn into (small) homework assignments.

Reading assignments

You will be asked at least once to read a paper before the class.

Final written test

You need to take part in a final written test that will not be graded.