Monday, 13 October, 2025 - 14:00
Room: 

Representational Capabilities of Transformers and Recurrent Architectures

Transformers have been the dominant architecture for building LLMs, but recently, there has been renewed interest in recurrent or state-space architectures to circumvent the quadratic inference complexity of Transformers. Transformers process tokens in parallel using attention mechanisms without maintaining a persistent hidden state, whereas recurrent models store information in a fixed-size hidden state that is updated sequentially as the input is processed. In this talk, we will explore the differences in the computational capabilities of these two classes of architectures and how our understanding of their strengths and limitations has evolved over the years. Specifically, we will discuss formal language and algorithmic reasoning problems that are provably easier for one architecture to express but difficult for the other. We will demonstrate how tools from theoretical computer science, such as communication complexity and formal language theory, can be adapted to prove limitations of neural sequence models. We will also present differences in their empirical ability to learn various algorithmic tasks and discuss how certain synthetic tasks relate to real-world performance. The talk will primarily cover results from [1] while also discussing relevant findings from [2, 3] and related works.
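
As a rough illustration of the contrast described above, the following minimal NumPy sketch (not taken from the referenced papers; function names and dimensions are purely illustrative) shows a single-head attention update, which compares all pairs of tokens in parallel, next to an Elman-style recurrent update, which compresses the prefix into a fixed-size hidden state processed one token at a time.

import numpy as np

def attention_layer(X, Wq, Wk, Wv):
    """Single-head self-attention: every token attends to every token in parallel.
    No persistent state; the score matrix grows quadratically in sequence length."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])           # (n, n) pairwise scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over all positions
    return weights @ V                                # each output mixes all tokens

def recurrent_layer(X, Wh, Wx):
    """Elman-style recurrence: a fixed-size hidden state updated token by token.
    Memory is constant in the sequence length, but processing is sequential."""
    h = np.zeros(Wh.shape[0])
    outputs = []
    for x_t in X:                                     # one step per token
        h = np.tanh(Wh @ h + Wx @ x_t)                # all past information lives in h
        outputs.append(h)
    return np.stack(outputs)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 8, 4                                       # toy sequence length and width
    X = rng.normal(size=(n, d))
    print(attention_layer(X, *(rng.normal(size=(d, d)) for _ in range(3))).shape)
    print(recurrent_layer(X, rng.normal(size=(d, d)), rng.normal(size=(d, d))).shape)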

[1] Separations in the Representational Capabilities of Transformers and Recurrent Architectures. Satwik Bhattamishra, Michael Hahn, Phil Blunsom, Varun Kanade. NeurIPS 2024
[2] Understanding In-Context Learning in Transformers and LLMs by Learning to Learn Discrete Functions. Satwik Bhattamishra, Arkil Patel, Phil Blunsom, Varun Kanade. ICLR 2024
[3] On the Ability and Limitations of Transformers to Recognize Formal Languages. Satwik Bhattamishra, Kabir Ahuja, Navin Goyal. EMNLP 2020


*** The talk will be delivered in person (MFF UK, Malostranské nám. 25, 4th floor, room S1) and will be streamed via Zoom. For details on how to join the Zoom meeting, please write to sevcikova et ufal.mff.cuni.cz ***


CV: 

Satwik Bhattamishra is a fourth-year PhD student at the University of Oxford, where his PhD is supported by Google DeepMind. His research primarily focuses on understanding the representational and algorithmic learning capabilities of neural sequence models such as Transformers and RNNs. Prior to joining Oxford, he spent two years at Microsoft Research India as a Research Fellow, and more recently, he worked on improving various aspects of LLMs at Google and Cohere.