Readings | ÚFAL

NLP

Word Embeddings

Julien Tissier, Amaury Habrard, Christophe Gravier: Near-lossless Binarization of Word Embeddings. https://arxiv.org/abs/1803.09065
Johannes Bjerva, Isabelle Augenstein: From Phonology to Syntax: Unsupervised Linguistic Typology at Different Levels with Language Embeddings. https://arxiv.org/abs/1802.09375
Alexis Conneau, Guillaume Lample, Marc'Aurelio Ranzato, Ludovic Denoyer, Hervé Jégou: Word Translation Without Parallel Data. https://arxiv.org/abs/1710.04087
Mikel Artetxe, Gorka Labaka, Eneko Agirre: Learning bilingual word embeddings with (almost) no bilingual data. https://aclweb.org/anthology/P17-1042
Piotr Bojanowski, Edouard Grave, Armand Joulin, Tomas Mikolov: Enriching Word Vectors with Subword Information. https://arxiv.org/abs/1607.04606
John Wieting, Mohit Bansal, Kevin Gimpel, Karen Livescu: Charagram: Embedding Words and Sentences via Character n-grams. https://arxiv.org/abs/1607.02789
Tolga Bolukbasi, Kai-Wei Chang, James Zou, Venkatesh Saligrama, Adam Kalai: Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings. https://arxiv.org/abs/1607.06520
Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, Jeffrey Dean: Distributed Representations of Words and Phrases and their Compositionality. https://arxiv.org/abs/1310.4546
Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean: Efficient Estimation of Word Representations in Vector Space. https://arxiv.org/abs/1301.3781

POS Tagging

Kazuma Hashimoto, Caiming Xiong, Yoshimasa Tsuruoka, Richard Socher: A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks. https://arxiv.org/abs/1611.01587
Barbara Plank, Anders Søgaard, Yoav Goldberg: Multilingual Part-of-Speech Tagging with Bidirectional Long Short-Term Memory Models and Auxiliary Loss. https://arxiv.org/abs/1604.05529
Wang Ling, Tiago Luís, Luís Marujo, Ramón Fernandez Astudillo, Silvio Amir, Chris Dyer, Alan W. Black, Isabel Trancoso: Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation. http://arxiv.org/abs/1508.02096

Parsing

Adam Fisch, Jiang Guo, Regina Barzilay: Working Hard or Hardly Working: Challenges of Integrating Typology into Neural Dependency Parsers. https://arxiv.org/abs/1909.09279
Yuxuan Wang, Wanxiang Che, Jiang Guo, Yijia Liu, Ting Liu: Cross-Lingual BERT Transformation for Zero-Shot Dependency Parsing. https://arxiv.org/abs/1909.06775
Michael Ringgaard, Rahul Gupta, Fernando C. N. Pereira: SLING: A framework for frame semantic parsing. https://arxiv.org/abs/1710.07032
Lingpeng Kong, Chris Alberti, Daniel Andor, Ivan Bogatyy, David Weiss: DRAGNN: A Transition-Based Framework for Dynamically Connected Neural Networks. https://arxiv.org/abs/1703.04474
Dani Yogatama, Phil Blunsom, Chris Dyer, Edward Grefenstette, Wang Ling: Learning to Compose Words into Sentences with Reinforcement Learning. https://arxiv.org/abs/1611.09100
Timothy Dozat, Christopher D. Manning: Deep Biaffine Attention for Neural Dependency Parsing. https://arxiv.org/abs/1611.01734
Jan Chorowski, Michał Zapotoczny, Paweł Rychlikowski: Read, Tag, and Parse All at Once, or Fully-neural Dependency Parsing. https://arxiv.org/abs/1609.03441
Bernd Bohnet, Ryan McDonald, Emily Pitler, Ji Ma: Generalized Transition-based Dependency Parsing via Control Parameters. https://www.aclweb.org/anthology/P/P16/P16-1015.pdf
Yuan Zhang, David Weiss: Stack-propagation: Improved Representation Learning for Syntax. https://arxiv.org/abs/1603.06598
Eliyahu Kiperwasser, Yoav Goldberg: Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations. https://arxiv.org/abs/1603.04351
Miguel Ballesteros, Yoav Goldberg, Chris Dyer, Noah A. Smith: Training with Exploration Improves a Greedy Stack-LSTM Parser. https://arxiv.org/abs/1603.03793
Waleed Ammar, George Mulcaire, Miguel Ballesteros, Chris Dyer, Noah A. Smith: Many Languages, One Parser. https://arxiv.org/abs/1602.01595
Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, Noah A. Smith: Transition-Based Dependency Parsing with Stack Long Short-Term Memory. https://arxiv.org/abs/1505.08075

Coreference

Ali Emami, Paul Trichelair, Adam Trischler, Kaheer Suleman, Hannes Schulz, Jackie Chi Kit Cheung: The Knowref Coreference Corpus: Removing Gender and Number Cues for Difficult Pronominal Anaphora Resolution. https://arxiv.org/abs/1811.01747

NER, NEL

Shuyan Zhou, Shruti Rijhwani, Graham Neubig: Towards Zero-resource Cross-lingual Entity Linking. https://arxiv.org/abs/1909.13180
David Wadden, Ulme Wennberg, Yi Luan, Hannaneh Hajishirzi: Entity, Relation, and Event Extraction with Contextualized Span Representations. https://arxiv.org/abs/1909.03546
Ilya Shnayderman, Liat Ein-Dor, Yosi Mass, Alon Halfon, Benjamin Sznajder, Artem Spector, Yoav Katz, Dafna Sheinwald, Ranit Aharonov, Noam Slonim: Fast End-to-End Wikification. https://arxiv.org/abs/1908.06785
Victor Sanh, Thomas Wolf, Sebastian Ruder: A Hierarchical Multi-task Approach for Learning Embeddings from Semantic Tasks. https://arxiv.org/abs/1811.06031
Gregor Wiedemann, Raghav Jindal, Chris Biemann: microNER: A Micro-Service for German Named Entity Recognition based on BiLSTM-CRF. https://arxiv.org/abs/1811.02902
Kai Hu, Zhijian Ou, Min Hu, Junlan Feng: Neural CRF transducers for sequence labeling. https://arxiv.org/abs/1811.01382
Jiateng Xie, Zhilin Yang, Graham Neubig, Noah A. Smith, Jaime Carbonell: Neural Cross-Lingual Named Entity Recognition with Minimal Resources. https://arxiv.org/abs/1808.09861
Jonathan Raiman, Olivier Raiman: DeepType: Multilingual Entity Linking by Neural Type System Evolution. https://arxiv.org/abs/1802.01021
Nikolaos Kolitsas, Octavian-Eugen Ganea, Thomas Hofmann: End-to-End Neural Entity Linking. https://www.aclweb.org/anthology/K18-1050.pdf
Zhilin Yang, Ruslan Salakhutdinov, William Cohen: Multi-Task Cross-Lingual Sequence Tagging from Scratch. https://arxiv.org/abs/1603.06270
Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami, Chris Dyer: Neural Architectures for Named Entity Recognition. https://arxiv.org/abs/1603.01360

Knowledge Graphs

Rajarshi Das, Shehzaad Dhuliawala, Manzil Zaheer, Luke Vilnis, Ishan Durugkar, Akshay Krishnamurthy, Alex Smola, Andrew McCallum: Go for a Walk and Arrive at the Answer: Reasoning Over Paths in Knowledge Bases using Reinforcement Learning. https://arxiv.org/abs/1711.05851
Xi Victoria Lin, Richard Socher, Caiming Xiong: Multi-Hop Knowledge Graph Reasoning with Reward Shaping. https://www.aclweb.org/anthology/D18-1362.pdf

Q&A

Dayiheng Liu, Yeyun Gong, Jie Fu, Yu Yan, Jiusheng Chen, Daxin Jiang, Jiancheng Lv, Nan Duan: RikiNet: Reading Wikipedia Pages for Natural Question Answering. https://arxiv.org/abs/2004.14560
Adam Roberts, Colin Raffel, Noam Shazeer: How Much Knowledge Can You Pack Into the Parameters of a Language Model? https://arxiv.org/abs/2002.08910
Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, Ming-Wei Chang: REALM: Retrieval-Augmented Language Model Pre-Training. https://arxiv.org/abs/2002.08909
Patrick Lewis, Barlas Oğuz, Ruty Rinott, Sebastian Riedel, Holger Schwenk: MLQA: Evaluating Cross-lingual Extractive Question Answering. https://arxiv.org/abs/1910.07475
Tsung-yuan Hsu, Chi-liang Liu, Hung-yi Lee: Zero-shot Reading Comprehension by Cross-lingual Transfer Learning with Multi-lingual Language Representation Model. https://arxiv.org/abs/1909.09587
Lin Pan, Rishav Chakravarti, Anthony Ferritto, Michael Glass, Alfio Gliozzo, Salim Roukos, Radu Florian, Avirup Sil: Frustratingly Easy Natural Question Answering. https://arxiv.org/abs/1909.05286
Zhuosheng Zhang, Yuwei Wu, Junru Zhou, Sufeng Duan, Hai Zhao, Rui Wang: SG-Net: Syntax-Guided Machine Reading Comprehension. https://arxiv.org/abs/1908.05147
Chris Alberti, Kenton Lee, Michael Collins: A BERT Baseline for the Natural Questions. https://arxiv.org/abs/1901.08634

Contextualized Embeddings, BERT

Prakhar Ganesh, Yao Chen, Xin Lou, Mohammad Ali Khan, Yin Yang, Deming Chen, Marianne Winslett, Hassan Sajjad, Preslav Nakov: Compressing Large-Scale Transformer-Based Models: A Case Study on BERT. https://arxiv.org/abs/2002.11985
Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning: ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators. https://openreview.net/pdf?id=r1xMH1BtvB
Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut: ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. https://arxiv.org/abs/1909.11942
Matthew E. Peters, Mark Neumann, Robert L. Logan IV, Roy Schwartz, Vidur Joshi, Sameer Singh, Noah A. Smith: Knowledge Enhanced Contextual Word Representations. https://arxiv.org/abs/1909.04164
Zhuosheng Zhang, Yuwei Wu, Hai Zhao, Zuchao Li, Shuailiang Zhang, Xi Zhou, Xiang Zhou: Semantics-aware BERT for Language Understanding. https://arxiv.org/abs/1909.02209
Kawin Ethayarajh: How Contextual are Contextualized Word Representations? Comparing the Geometry of BERT, ELMo, and GPT-2 Embeddings. https://arxiv.org/abs/1909.00512
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov: RoBERTa: A Robustly Optimized BERT Pretraining Approach. https://arxiv.org/abs/1907.11692
Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. https://arxiv.org/abs/1810.04805

Cross-lingual Embeddings

Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, Veselin Stoyanov: Unsupervised Cross-lingual Representation Learning at Scale. https://arxiv.org/abs/1911.02116
Shijie Wu, Mark Dredze: Beto, Bentz, Becas: The Surprising Cross-Lingual Effectiveness of BERT. https://arxiv.org/abs/1904.09077
Mikel Artetxe, Holger Schwenk: Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond. https://arxiv.org/abs/1812.10464
Omer Levy, Anders Søgaard, Yoav Goldberg: Reconsidering Cross-lingual Word Embeddings. https://arxiv.org/abs/1608.05426

Transformers

Alessandro Raganato, Yves Scherrer, Jörg Tiedemann: Fixed Encoder Self-Attention Patterns in Transformer-Based Machine Translation. https://arxiv.org/abs/2002.10260
Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya: Reformer: The Efficient Transformer. https://arxiv.org/abs/2001.04451
Stephen Merity: Single Headed Attention RNN: Stop Thinking With Your Head. https://arxiv.org/abs/1911.11423
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. https://arxiv.org/abs/1910.10683
Mitchell Stern, William Chan, Jamie Kiros, Jakob Uszkoreit: Insertion Transformer: Flexible Sequence Generation via Insertion Operations. https://arxiv.org/abs/1902.03249
Mostafa Dehghani, Stephan Gouws, Oriol Vinyals, Jakob Uszkoreit, Łukasz Kaiser: Universal Transformers. https://arxiv.org/abs/1807.03819
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin: Attention Is All You Need. https://arxiv.org/abs/1706.03762

NMT

Mark Collier, Joeran Beel: Memory-Augmented Neural Networks for Machine Translation. https://www.aclweb.org/anthology/W19-6617.pdf
Marjan Ghazvininejad, Omer Levy, Yinhan Liu, Luke Zettlemoyer: Mask-Predict: Parallel Decoding of Conditional Masked Language Models. https://arxiv.org/abs/1904.09324
Julian Richard Medina, Jugal Kalita: Parallel Attention Mechanisms in Neural Machine Translation. https://arxiv.org/abs/1810.12427
Jiatao Gu, James Bradbury, Caiming Xiong, Victor O. K. Li, Richard Socher: Non-Autoregressive Neural Machine Translation. https://arxiv.org/abs/1711.02281
Guillaume Lample, Ludovic Denoyer, Marc'Aurelio Ranzato: Unsupervised Machine Translation Using Monolingual Corpora Only. https://arxiv.org/abs/1711.00043
Mikel Artetxe, Gorka Labaka, Eneko Agirre, Kyunghyun Cho: Unsupervised Neural Machine Translation. https://arxiv.org/abs/1710.11041
Thanh-Le Ha, Jan Niehues, Alexander Waibel: Toward Multilingual Neural Machine Translation with Universal Encoder and Decoder. https://arxiv.org/abs/1611.04798
Melvin Johnson et al.: Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation. https://arxiv.org/abs/1611.04558
Nal Kalchbrenner, Lasse Espeholt, Karen Simonyan, Aaron van den Oord, Alex Graves, Koray Kavukcuoglu: Neural Machine Translation in Linear Time. https://arxiv.org/abs/1610.10099
Jason Lee, Kyunghyun Cho, Thomas Hofmann: Fully Character-Level Neural Machine Translation without Explicit Segmentation. https://arxiv.org/abs/1610.03017
Yonghui Wu et al.: Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation. https://arxiv.org/abs/1609.08144
Jie Zhou, Ying Cao, Xuguang Wang, Peng Li, Wei Xu: Deep Recurrent Models with Fast-Forward Connections for Neural Machine Translation. https://arxiv.org/abs/1606.04199
Orhan Firat, Baskaran Sankaran, Yaser Al-Onaizan, Fatos T. Yarman Vural, Kyunghyun Cho: Zero-Resource Translation with Multi-Lingual Neural Machine Translation. https://arxiv.org/abs/1606.04164
Rico Sennrich, Barry Haddow, Alexandra Birch: Edinburgh Neural Machine Translation Systems for WMT 16. https://arxiv.org/abs/1606.02891
Orhan Firat, Kyunghyun Cho, Yoshua Bengio: Multi-Way, Multilingual Neural Machine Translation with a Shared Attention Mechanism. https://arxiv.org/abs/1601.01073
Rico Sennrich, Barry Haddow, Alexandra Birch: Neural Machine Translation of Rare Words with Subword Units. https://arxiv.org/abs/1508.07909
Ilya Sutskever, Oriol Vinyals, Quoc V. Le: Sequence to Sequence Learning with Neural Networks. https://arxiv.org/abs/1409.3215
Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio: Neural Machine Translation by Jointly Learning to Align and Translate. https://arxiv.org/abs/1409.0473

LM

Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei: Language Models are Few-Shot Learners. https://arxiv.org/abs/2005.14165 (GPT-3)
Nitish Shirish Keskar, Bryan McCann, Lav R. Varshney, Caiming Xiong, Richard Socher: CTRL: A Conditional Transformer Language Model for Controllable Generation. https://arxiv.org/abs/1909.05858
Julian Eisenschlos, Sebastian Ruder, Piotr Czapla, Marcin Kardas, Sylvain Gugger, Jeremy Howard: MultiFiT: Efficient Multi-lingual Language Model Fine-tuning. https://arxiv.org/abs/1909.04761
Alec Radford et al.: Language Models are Unsupervised Multitask Learners. https://d4mucfpksywv.cloudfront.net/better-language-models/language-models.pdf
Guillaume Lample, Alexis Conneau: Cross-lingual Language Model Pretraining. https://arxiv.org/abs/1901.07291
Jeremy Howard, Sebastian Ruder: Universal Language Model Fine-tuning for Text Classification. https://arxiv.org/abs/1801.06146
Anirudh Goyal, Nan Rosemary Ke, Alex Lamb, R Devon Hjelm, Chris Pal, Joelle Pineau, Yoshua Bengio: ACtuAL: Actor-Critic Under Adversarial Learning. https://arxiv.org/abs/1711.04755
Gábor Melis, Chris Dyer, Phil Blunsom: On the State of the Art of Evaluation in Neural Language Models. https://arxiv.org/abs/1707.05589
Rafal Jozefowicz, Oriol Vinyals, Mike Schuster, Noam Shazeer, Yonghui Wu: Exploring the Limits of Language Modeling. https://arxiv.org/abs/1602.02410

GEC

Eric Malmi, Sebastian Krause, Sascha Rothe, Daniil Mirylenka, Aliaksei Severyn: Encode, Tag, Realize: High-Precision Text Editing. https://arxiv.org/abs/1909.01187
Iroro Orife: Attentive Sequence-to-Sequence Learning for Diacritic Restoration of Yorùbá Language Text. https://arxiv.org/abs/1804.00832
Ziang Xie, Anand Avati, Naveen Arivazhagan, Dan Jurafsky, Andrew Y. Ng: Neural Language Correction with Character-Based Attention. https://arxiv.org/abs/1603.09727

Summarization

Sandeep Subramanian, Raymond Li, Jonathan Pilault, Christopher Pal: On Extractive and Abstractive Neural Document Summarization with Transformer Language Models. https://arxiv.org/abs/1909.03186
Abigail See, Peter J. Liu, Christopher D. Manning: Get To The Point: Summarization with Pointer-Generator Networks. https://arxiv.org/abs/1704.04368
Lei Xu, Ziyun Wang, Ayana, Zhiyuan Liu, Maosong Sun: Topic Sensitive Neural Headline Generation. https://arxiv.org/abs/1608.05777
Lu Wang, Wang Ling: Neural Network-Based Abstract Generation for Opinions and Arguments. https://arxiv.org/abs/1606.02785
Ayana, Shiqi Shen, Yu Zhao, Zhiyuan Liu, Maosong Sun: Neural Headline Generation with Sentence-wise Optimization. https://arxiv.org/abs/1604.01904
Ramesh Nallapati, Bowen Zhou, Mingbo Ma: Classify or Select: Neural Architectures for Extractive Document Summarization. https://arxiv.org/abs/1611.04244
Ramesh Nallapati, Feifei Zhai, Bowen Zhou: SummaRuNNer: A Recurrent Neural Network based Sequence Model for Extractive Summarization of Documents. https://arxiv.org/abs/1611.04230
Ramesh Nallapati, Bowen Zhou, Cicero Nogueira dos Santos, Caglar Gulcehre, Bing Xiang: Abstractive Text Summarization Using Sequence-to-Sequence RNNs and Beyond. https://arxiv.org/abs/1602.06023

Paraphrasing

Zichao Li, Xin Jiang, Lifeng Shang, Hang Li: Paraphrase Generation with Deep Reinforcement Learning. https://arxiv.org/abs/1711.00279

NLG

Sai Rajeswar, Sandeep Subramanian, Francis Dutil, Christopher Pal, Aaron Courville: Adversarial Generation of Natural Language. https://arxiv.org/abs/1705.10929
Zhiting Hu, Zichao Yang, Xiaodan Liang, Ruslan Salakhutdinov, Eric P. Xing: Toward Controlled Generation of Text. https://arxiv.org/abs/1703.00955

Speech Recognition

Yanzhang He, Tara N. Sainath, Rohit Prabhavalkar, Ian McGraw, Raziel Alvarez, Ding Zhao, David Rybach, Anjuli Kannan, Yonghui Wu, Ruoming Pang, Qiao Liang, Deepti Bhatia, Yuan Shangguan, Bo Li, Golan Pundak, Khe Chai Sim, Tom Bagby, Shuo-yiin Chang, Kanishka Rao, Alexander Gruenstein: Streaming End-to-end Speech Recognition For Mobile Devices. https://arxiv.org/abs/1811.06621

Speech Synthesis

Jonathan Shen, Ruoming Pang, Ron J. Weiss, Mike Schuster, Navdeep Jaitly, Zongheng Yang, Zhifeng Chen, Yu Zhang, Yuxuan Wang, RJ Skerry-Ryan, Rif A. Saurous, Yannis Agiomyrgiannakis, Yonghui Wu: Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions. https://arxiv.org/abs/1712.05884
Aaron van den Oord, Yazhe Li, Igor Babuschkin, Karen Simonyan, Oriol Vinyals, Koray Kavukcuoglu, George van den Driessche, Edward Lockhart, Luis C. Cobo, Florian Stimberg, Norman Casagrande, Dominik Grewe, Seb Noury, Sander Dieleman, Erich Elsen, Nal Kalchbrenner, Heiga Zen, Alex Graves, Helen King, Tom Walters, Dan Belov, Demis Hassabis: Parallel WaveNet: Fast High-Fidelity Speech Synthesis. https://arxiv.org/abs/1711.10433
Yuxuan Wang, RJ Skerry-Ryan, Daisy Stanton, Yonghui Wu, Ron J. Weiss, Navdeep Jaitly, Zongheng Yang, Ying Xiao, Zhifeng Chen, Samy Bengio, Quoc Le, Yannis Agiomyrgiannakis, Rob Clark, Rif A. Saurous: Tacotron: Towards End-to-End Speech Synthesis. https://arxiv.org/abs/1703.10135
Aaron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, Koray Kavukcuoglu: WaveNet: A Generative Model for Raw Audio. https://arxiv.org/abs/1609.03499

Differential Privacy

Nicholas Carlini, Chang Liu, Jernej Kos, Úlfar Erlingsson, Dawn Song: The Secret Sharer: Measuring Unintended Neural Network Memorization & Extracting Secrets. https://arxiv.org/abs/1802.08232
H. Brendan McMahan, Daniel Ramage, Kunal Talwar, Li Zhang: Learning Differentially Private Recurrent Language Models. https://arxiv.org/abs/1710.06963
Keith Bonawitz, Vladimir Ivanov, Ben Kreuter, Antonio Marcedone, H. Brendan McMahan, Sarvar Patel, Daniel Ramage, Aaron Segal, Karn Seth: Practical Secure Aggregation for Federated Learning on User-Held Data. https://arxiv.org/abs/1611.04482
Reza Shokri, Marco Stronati, Congzheng Song, Vitaly Shmatikov: Membership Inference Attacks against Machine Learning Models. https://arxiv.org/abs/1610.05820
H. Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, Blaise Agüera y Arcas: Communication-Efficient Learning of Deep Networks from Decentralized Data. https://arxiv.org/abs/1602.05629

Adversarial Text

Peter Henderson, Koustuv Sinha, Rosemary Nan Ke, Joelle Pineau: Adversarial Gain. https://arxiv.org/abs/1811.01302
Mohit Iyyer, John Wieting, Kevin Gimpel, Luke Zettlemoyer: Adversarial Example Generation with Syntactically Controlled Paraphrase Networks. https://arxiv.org/abs/1804.06059
Ji Gao, Jack Lanchantin, Mary Lou Soffa, Yanjun Qi: Black-box Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers. https://arxiv.org/abs/1801.04354
Javid Ebrahimi, Anyi Rao, Daniel Lowd, Dejing Dou: HotFlip: White-Box Adversarial Examples for Text Classification. https://arxiv.org/abs/1712.06751
Yonatan Belinkov, Yonatan Bisk: Synthetic and Natural Noise Both Break Neural Machine Translation. https://arxiv.org/abs/1711.02173
Zhengli Zhao, Dheeru Dua, Sameer Singh: Generating Natural Adversarial Examples. https://arxiv.org/abs/1710.11342
Robin Jia, Percy Liang: Adversarial Examples for Evaluating Reading Comprehension Systems. https://arxiv.org/abs/1707.07328

Adversarial Speech

Xuejing Yuan, Yuxuan Chen, Yue Zhao, Yunhui Long, Xiaokang Liu, Kai Chen, Shengzhi Zhang, Heqing Huang, Xiaofeng Wang, Carl A. Gunter: CommanderSong: A Systematic Approach for Practical Adversarial Voice Recognition. https://arxiv.org/abs/1801.08535
Nicholas Carlini, David Wagner: Audio Adversarial Examples: Targeted Attacks on Speech-to-Text. https://arxiv.org/abs/1801.01944

Fake News

Rowan Zellers, Ari Holtzman, Hannah Rashkin, Yonatan Bisk, Ali Farhadi, Franziska Roesner, Yejin Choi: Defending Against Neural Fake News. https://arxiv.org/abs/1905.12616

Images

Image Classification

Andrew Howard, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang, Yukun Zhu, Ruoming Pang, Vijay Vasudevan, Quoc V. Le, Hartwig Adam: Searching for MobileNetV3. https://arxiv.org/abs/1905.02244
Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen: MobileNetV2: Inverted Residuals and Linear Bottlenecks. https://arxiv.org/abs/1801.04381
Jie Hu, Li Shen, Samuel Albanie, Gang Sun, Enhua Wu: Squeeze-and-Excitation Networks. https://arxiv.org/abs/1709.01507
Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, Kaiming He: Aggregated Residual Transformations for Deep Neural Networks. https://arxiv.org/abs/1611.05431
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun: Identity Mappings in Deep Residual Networks. https://arxiv.org/abs/1603.05027
Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, Alex Alemi: Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. https://arxiv.org/abs/1602.07261
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun: Deep Residual Learning for Image Recognition. https://arxiv.org/abs/1512.03385
Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, Zbigniew Wojna: Rethinking the Inception Architecture for Computer Vision. https://arxiv.org/abs/1512.00567
Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich: Going Deeper with Convolutions. https://arxiv.org/abs/1409.4842
Karen Simonyan, Andrew Zisserman: Very Deep Convolutional Networks for Large-Scale Image Recognition. https://arxiv.org/abs/1409.1556
Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton: ImageNet Classification with Deep Convolutional Neural Networks. https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Joan Puigcerver, Jessica Yung, Sylvain Gelly, Neil Houlsby: Big Transfer (BiT): General Visual Representation Learning. https://arxiv.org/abs/1912.11370

Object Detection and Image Segmentation

Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko: End-to-End Object Detection with Transformers. https://arxiv.org/abs/2005.12872
Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross Girshick: Mask R-CNN. https://arxiv.org/abs/1703.06870
Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, Serge Belongie: Feature Pyramid Networks for Object Detection. https://arxiv.org/abs/1612.03144
Jonathan Huang et al.: Speed/accuracy trade-offs for modern convolutional object detectors. https://arxiv.org/abs/1611.10012
Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun: Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. https://arxiv.org/abs/1506.01497
Ross Girshick: Fast R-CNN. https://arxiv.org/abs/1504.08083

Image Labeling

Martin Engilberge, Louis Chevallier, Patrick Pérez, Matthieu Cord: Finding beans in burgers: Deep semantic-visual embedding with localization. https://arxiv.org/abs/1804.01720
Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan: Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge. https://arxiv.org/abs/1609.06647

Image Data Augmentation

Cihang Xie, Mingxing Tan, Boqing Gong, Jiang Wang, Alan Yuille, Quoc V. Le: Adversarial Examples Improve Image Recognition. https://arxiv.org/abs/1911.09665
Qizhe Xie, Minh-Thang Luong, Eduard Hovy, Quoc V. Le: Self-training with Noisy Student improves ImageNet classification. https://arxiv.org/abs/1911.04252
Ekin D. Cubuk, Barret Zoph, Jonathon Shlens, Quoc V. Le: RandAugment: Practical automated data augmentation with a reduced search space. https://arxiv.org/abs/1909.13719
Ekin D. Cubuk, Barret Zoph, Dandelion Mane, Vijay Vasudevan, Quoc V. Le: AutoAugment: Learning Augmentation Policies from Data. https://arxiv.org/abs/1805.09501

Generative Adversarial Networks

Tero Karras, Samuli Laine, Timo Aila: A Style-Based Generator Architecture for Generative Adversarial Networks. https://arxiv.org/abs/1812.04948
Andrew Brock, Jeff Donahue, Karen Simonyan: Large Scale GAN Training for High Fidelity Natural Image Synthesis. https://arxiv.org/abs/1809.11096
Zhiming Zhou, Yuxuan Song, Lantao Yu, Hongwei Wang, Jiadong Liang, Weinan Zhang, Zhihua Zhang, Yong Yu: Understanding the Effectiveness of Lipschitz-Continuity in Generative Adversarial Nets. https://arxiv.org/abs/1807.00751
Han Zhang, Ian Goodfellow, Dimitris Metaxas, Augustus Odena: Self-Attention Generative Adversarial Networks. https://arxiv.org/abs/1805.08318
Takeru Miyato, Toshiki Kataoka, Masanori Koyama, Yuichi Yoshida: Spectral Normalization for Generative Adversarial Networks. https://arxiv.org/abs/1802.05957
Tero Karras, Timo Aila, Samuli Laine, Jaakko Lehtinen: Progressive Growing of GANs for Improved Quality, Stability, and Variation. https://arxiv.org/abs/1710.10196
Martin Arjovsky, Soumith Chintala, Léon Bottou: Wasserstein GAN. https://arxiv.org/abs/1701.07875
Ilya Tolstikhin, Sylvain Gelly, Olivier Bousquet, Carl-Johann Simon-Gabriel, Bernhard Schölkopf: AdaGAN: Boosting Generative Models. https://arxiv.org/abs/1701.02386
Jianwei Yang, Anitha Kannan, Dhruv Batra, Devi Parikh: LR-GAN: Layered Recursive Generative Adversarial Networks for Image Generation. https://openreview.net/pdf?id=HJ1kmv9xx
Leon Sixt, Benjamin Wild, Tim Landgraf: RenderGAN: Generating Realistic Labeled Data. https://arxiv.org/abs/1611.01331
Lantao Yu, Weinan Zhang, Jun Wang, Yong Yu: SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient. https://arxiv.org/abs/1609.05473
Vincent Dumoulin, Ishmael Belghazi, Ben Poole, Alex Lamb, Martin Arjovsky, Olivier Mastropietro, Aaron Courville: Adversarially Learned Inference. https://arxiv.org/abs/1606.00704
Alec Radford, Luke Metz, Soumith Chintala: Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. https://arxiv.org/abs/1511.06434
Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio: Generative Adversarial Networks. https://arxiv.org/abs/1406.2661

Image Generation

Shu-Yu Chen, Wanchao Su, Lin Gao, Shihong Xia, Hongbo Fu: Deep Generation of Face Images from Sketches. https://arxiv.org/abs/2006.01047 (DeepFaceDrawing)

Adversarial Images

Tom B. Brown, Dandelion Mané, Aurko Roy, Martín Abadi, Justin Gilmer: Adversarial Patch. https://arxiv.org/abs/1712.09665
Ian J. Goodfellow, Jonathon Shlens, Christian Szegedy: Explaining and Harnessing Adversarial Examples. https://arxiv.org/abs/1412.6572
Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, Rob Fergus: Intriguing properties of neural networks. https://arxiv.org/abs/1312.6199

OCR

Yuntian Deng, Anssi Kanervisto, Alexander M. Rush: What You Get Is What You See: A Visual Markup Decompiler. https://arxiv.org/abs/1609.04938

Image Enhancement

Dmitry Ulyanov, Andrea Vedaldi, Victor Lempitsky: Deep Image Prior. https://arxiv.org/abs/1711.10925
Ryan Dahl, Mohammad Norouzi, Jonathon Shlens: Pixel Recursive Super Resolution. https://arxiv.org/abs/1702.00783
Christian Ledig et al.: Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network. https://arxiv.org/abs/1609.04802
Richard Zhang, Phillip Isola, Alexei A. Efros: Colorful Image Colorization. https://arxiv.org/abs/1603.08511
Justin Johnson, Alexandre Alahi, Li Fei-Fei: Perceptual Losses for Real-Time Style Transfer and Super-Resolution. https://arxiv.org/abs/1603.08155

3D Objects

Jiajun Wu, Yifan Wang, Tianfan Xue, Xingyuan Sun, William T. Freeman, Joshua B. Tenenbaum: MarrNet: 3D Shape Reconstruction via 2.5D Sketches. https://arxiv.org/abs/1711.03129

Deep Learning

Optimization

Liyuan Liu, Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, Jiawei Han: On the Variance of the Adaptive Learning Rate and Beyond. https://arxiv.org/abs/1908.03265
Yang You, Jing Li, Sashank Reddi, Jonathan Hseu, Sanjiv Kumar, Srinadh Bhojanapalli, Xiaodan Song, James Demmel, Kurt Keutzer, Cho-Jui Hsieh: Large Batch Optimization for Deep Learning: Training BERT in 76 minutes. https://arxiv.org/abs/1904.00962
Michael R. Zhang, James Lucas, Geoffrey Hinton, Jimmy Ba: Lookahead Optimizer: k steps forward, 1 step back. https://arxiv.org/abs/1907.08610
Sam McCandlish, Jared Kaplan, Dario Amodei, OpenAI Dota Team: An Empirical Model of Large-Batch Training. https://arxiv.org/abs/1812.06162
Yang You, Igor Gitman, Boris Ginsburg: Large Batch Training of Convolutional Networks. https://arxiv.org/abs/1708.03888
Priya Goyal, Piotr Dollár, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, Kaiming He: Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour. https://arxiv.org/abs/1706.02677
James Kirkpatrick et al.: Overcoming catastrophic forgetting in neural networks. https://arxiv.org/abs/1612.00796
James Martens, Roger Grosse: Optimizing Neural Networks with Kronecker-factored Approximate Curvature. https://arxiv.org/abs/1503.05671
Geoffrey Hinton, Oriol Vinyals, Jeff Dean: Distilling the Knowledge in a Neural Network. https://arxiv.org/abs/1503.02531
Diederik Kingma, Jimmy Ba: Adam: A Method for Stochastic Optimization. https://arxiv.org/abs/1412.6980

Activation Functions

Diganta Misra: Mish: A Self Regularized Non-Monotonic Neural Activation Function. https://arxiv.org/abs/1908.08681
Prajit Ramachandran, Barret Zoph, Quoc V. Le: Searching for Activation Functions. https://arxiv.org/abs/1710.05941
Günter Klambauer, Thomas Unterthiner, Andreas Mayr, Sepp Hochreiter: Self-Normalizing Neural Networks. https://arxiv.org/abs/1706.02515
Caglar Gulcehre, Marcin Moczulski, Misha Denil, Yoshua Bengio: Noisy Activation Functions. https://arxiv.org/abs/1603.00391
Xiaojie Jin, Chunyan Xu, Jiashi Feng, Yunchao Wei, Junjun Xiong, Shuicheng Yan: Deep Learning with S-shaped Rectified Linear Activation Units. https://arxiv.org/abs/1512.07030
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun: Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. https://arxiv.org/abs/1502.01852
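Two of the activations above have simple closed forms: Swish, x·σ(βx), from Ramachandran et al., and Mish, x·tanh(softplus(x)), from Misra. The following is a minimal scalar sketch, assuming plain Python floats. The sigmoid and softplus helpers are written in numerically stable forms so that large positive or negative inputs do not overflow.

```python
import math


def sigmoid(x):
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # safe: x < 0, so exp(x) < 1
    return z / (1.0 + z)


def softplus(x):
    """Numerically stable softplus, log(1 + e^x)."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def swish(x, beta=1.0):
    """Swish: x * sigmoid(beta * x)."""
    return x * sigmoid(beta * x)


def mish(x):
    """Mish: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


for value in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"x={value:+.1f}  swish={swish(value):+.4f}  mish={mish(value):+.4f}")
```

Both functions are smooth and non-monotonic. They dip slightly below zero for negative inputs instead of clamping to zero as ReLU does.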

Regularization

Takashi Ishida, Ikko Yamane, Tomoya Sakai, Gang Niu, Masashi Sugiyama: Do We Need Zero Training Loss After Achieving Zero Training Error? https://arxiv.org/abs/2002.08709 Flooding
Deren Lei, Zichen Sun, Yijun Xiao, William Yang Wang: Implicit Regularization of Stochastic Gradient Descent in Natural Language Processing: Observations and Implications. https://arxiv.org/abs/1811.00659
Hongyi Zhang, Moustapha Cisse, Yann N. Dauphin, David Lopez-Paz: mixup: Beyond Empirical Risk Minimization. https://arxiv.org/abs/1710.09412
Sergey Ioffe: Batch Renormalization: Towards Reducing Minibatch Dependence in Batch-Normalized Models. https://arxiv.org/abs/1702.03275
Jimmy Lei Ba, Jamie Ryan Kiros, Geoffrey E. Hinton: Layer Normalization. https://arxiv.org/abs/1607.06450
David Krueger, Tegan Maharaj, János Kramár, Mohammad Pezeshki, Nicolas Ballas, Nan Rosemary Ke, Anirudh Goyal, Yoshua Bengio, Hugo Larochelle, Aaron Courville, Chris Pal: Zoneout: Regularizing RNNs by Randomly Preserving Hidden Activations. https://arxiv.org/abs/1606.01305
Tim Cooijmans, Nicolas Ballas, César Laurent, Çağlar Gülçehre, Aaron Courville: Recurrent Batch Normalization. https://arxiv.org/abs/1603.09025
Stanislau Semeniuta, Aliaksei Severyn, Erhardt Barth: Recurrent Dropout without Memory Loss. https://arxiv.org/abs/1603.05118
Yarin Gal, Zoubin Ghahramani: A Theoretically Grounded Application of Dropout in Recurrent Neural Networks. https://arxiv.org/abs/1512.05287
César Laurent, Gabriel Pereyra, Philémon Brakel, Ying Zhang, Yoshua Bengio: Batch Normalized Recurrent Neural Networks. https://arxiv.org/abs/1510.01378
Sergey Ioffe, Christian Szegedy: Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. https://arxiv.org/abs/1502.03167
Geoffrey E. Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, Ruslan R. Salakhutdinov: Improving neural networks by preventing co-adaptation of feature detectors. https://arxiv.org/abs/1207.0580
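Two of the regularizers listed above are short enough to sketch directly. Flooding (Ishida et al.) replaces the training loss J with |J - b| + b for a flood level b. mixup (Zhang et al.) trains on convex combinations of random pairs of examples and their labels, with the mixing weight drawn from Beta(alpha, alpha). The sketch below assumes plain Python lists for features and one-hot labels. The function names are hypothetical, and alpha = 0.2 is an illustrative choice, not a recommendation.

```python
import random


def flooded_loss(loss, b):
    """Flooding: above flood level b this equals the ordinary loss; below b it equals
    2b - loss, so minimizing it pushes the training loss back up towards b."""
    return abs(loss - b) + b


def mixup_batch(xs, ys, alpha=0.2, rng=None):
    """mixup: mix each example with a randomly chosen partner from the same batch.

    xs: list of feature vectors (lists of floats); ys: list of one-hot label vectors.
    A single mixing coefficient lambda ~ Beta(alpha, alpha) is drawn per batch.
    """
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)
    perm = list(range(len(xs)))
    rng.shuffle(perm)  # pair example i with example perm[i]
    mixed_x = [[lam * a + (1 - lam) * c for a, c in zip(xs[i], xs[j])]
               for i, j in enumerate(perm)]
    mixed_y = [[lam * a + (1 - lam) * c for a, c in zip(ys[i], ys[j])]
               for i, j in enumerate(perm)]
    return mixed_x, mixed_y, lam


demo_x = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
demo_y = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
print(mixup_batch(demo_x, demo_y, rng=random.Random(0)))
print(flooded_loss(0.05, b=0.1))
```

In a real training loop, the flooded value is what gets backpropagated. Its gradient is the ordinary gradient when the loss is above b, and the negated gradient when the loss is below b.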

Generalization

Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals: Understanding deep learning requires rethinking generalization. https://arxiv.org/abs/1611.03530

Architectures

Jason Liang, Elliot Meyerson, Risto Miikkulainen: Evolutionary Architecture Search For Deep Multitask Networks. https://arxiv.org/abs/1803.03745
Esteban Real, Alok Aggarwal, Yanping Huang, Quoc V Le: Regularized Evolution for Image Classifier Architecture Search. https://arxiv.org/abs/1802.01548
Chenxi Liu, Barret Zoph, Maxim Neumann, Jonathon Shlens, Wei Hua, Li-Jia Li, Li Fei-Fei, Alan Yuille, Jonathan Huang, Kevin Murphy: Progressive Neural Architecture Search. https://arxiv.org/abs/1712.00559
Irwan Bello, Barret Zoph, Vijay Vasudevan, Quoc V. Le: Neural Optimizer Search with Reinforcement Learning. https://arxiv.org/abs/1709.07417
Barret Zoph, Vijay Vasudevan, Jonathon Shlens, Quoc V. Le: Learning Transferable Architectures for Scalable Image Recognition. https://arxiv.org/abs/1707.07012
Danijar Hafner, Alex Irpan, James Davidson, Nicolas Heess: Learning Hierarchical Information Flow with Recurrent Neural Modules. https://arxiv.org/abs/1706.05744
Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le, Geoffrey Hinton, Jeff Dean: Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer. https://arxiv.org/abs/1701.06538
Bowen Baker, Otkrist Gupta, Nikhil Naik, Ramesh Raskar: Designing Neural Network Architectures using Reinforcement Learning. https://arxiv.org/abs/1611.02167
Barret Zoph, Quoc V. Le: Neural Architecture Search with Reinforcement Learning. https://arxiv.org/abs/1611.01578
Julian Georg Zilly, Rupesh Kumar Srivastava, Jan Koutník, Jürgen Schmidhuber: Recurrent Highway Networks. https://arxiv.org/abs/1607.03474
Marcin Andrychowicz, Misha Denil, Sergio Gomez, Matthew W. Hoffman, David Pfau, Tom Schaul, Brendan Shillingford, Nando de Freitas: Learning to learn by gradient descent by gradient descent. https://arxiv.org/abs/1606.04474
Rupesh Kumar Srivastava, Klaus Greff, Jürgen Schmidhuber: Training Very Deep Networks. https://arxiv.org/abs/1507.06228

Recurrent Cells

Gábor Melis, Tomáš Kočiský, Phil Blunsom: Mogrifier LSTM. https://arxiv.org/abs/1909.01792
Hao Peng, Roy Schwartz, Sam Thomson, Noah A. Smith: Rational Recurrences. https://arxiv.org/abs/1808.09357
Roy Schwartz, Sam Thomson, Noah A. Smith: SoPa: Bridging CNNs, RNNs, and Weighted Finite-State Machines. https://arxiv.org/abs/1805.06061
Tao Lei, Yu Zhang, Yoav Artzi: Training RNNs as Fast as CNNs. https://arxiv.org/abs/1709.02755
Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, Yoshua Bengio: Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. https://arxiv.org/abs/1406.1078

Model Interpretation

Hao Li, Zheng Xu, Gavin Taylor, Tom Goldstein: Visualizing the Loss Landscape of Neural Nets. https://arxiv.org/abs/1712.09913

Structured Prediction

Dzmitry Bahdanau, Philemon Brakel, Kelvin Xu, Anirudh Goyal, Ryan Lowe, Joelle Pineau, Aaron Courville, Yoshua Bengio: An Actor-Critic Algorithm for Sequence Prediction. https://arxiv.org/abs/1607.07086
Sam Wiseman, Alexander M. Rush: Sequence-to-Sequence Learning as Beam-Search Optimization. https://arxiv.org/abs/1606.02960
Daniel Andor, Chris Alberti, David Weiss, Aliaksei Severyn, Alessandro Presta, Kuzman Ganchev, Slav Petrov, Michael Collins: Globally Normalized Transition-Based Neural Networks. https://arxiv.org/abs/1603.06042
Shiqi Shen, Yong Cheng, Zhongjun He, Wei He, Hua Wu, Maosong Sun, Yang Liu: Minimum Risk Training for Neural Machine Translation. https://arxiv.org/abs/1512.02433
Marc'Aurelio Ranzato, Sumit Chopra, Michael Auli, Wojciech Zaremba: Sequence Level Training with Recurrent Neural Networks. https://arxiv.org/abs/1511.06732

VAE

Zhiting Hu, Zichao Yang, Ruslan Salakhutdinov, Eric P. Xing: On Unifying Deep Generative Models. https://arxiv.org/abs/1706.00550
Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz, Samy Bengio: Generating Sentences from a Continuous Space. https://arxiv.org/abs/1511.06349
Junyoung Chung, Kyle Kastner, Laurent Dinh, Kratarth Goel, Aaron Courville, Yoshua Bengio: A Recurrent Latent Variable Model for Sequential Data. https://arxiv.org/abs/1506.02216
Diederik P Kingma, Max Welling: Auto-Encoding Variational Bayes. https://arxiv.org/abs/1312.6114
Aaron van den Oord, Oriol Vinyals, Koray Kavukcuoglu: Neural Discrete Representation Learning. https://arxiv.org/abs/1711.00937 VQ-VAE
Ali Razavi, Aaron van den Oord, Oriol Vinyals: Generating Diverse High-Fidelity Images with VQ-VAE-2. https://arxiv.org/abs/1906.00446 VQ-VAE2
Arash Vahdat, Jan Kautz: NVAE: A Deep Hierarchical Variational Autoencoder. https://arxiv.org/abs/2007.03898 NVAE
Patrick Esser, Robin Rombach, Björn Ommer: Taming Transformers for High-Resolution Image Synthesis. https://arxiv.org/abs/2012.09841 VQGAN
Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, Ilya Sutskever: Zero-Shot Text-to-Image Generation. https://arxiv.org/abs/2102.12092 dVAE DALL-E
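The core of Kingma & Welling's VAE objective consists of two parts. The first is the reparameterization z = mu + sigma * eps with eps ~ N(0, I). The second is the closed-form KL divergence between a diagonal Gaussian posterior and a standard normal prior. The sketch below assumes the encoder outputs a mean and a log-variance per latent dimension. The function names are hypothetical, and the reconstruction term is passed in as a number rather than computed by a decoder.

```python
import math
import random


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I) and sigma = exp(logvar / 2).

    Writing the sample this way makes z a deterministic function of (mu, logvar)
    plus independent noise, so gradients can flow through mu and logvar.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu, logvar):
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) )."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, logvar))


def negative_elbo(recon_log_likelihood, mu, logvar):
    """Loss to minimize: -(E_q[log p(x|z)] - KL(q(z|x) || p(z))), one-sample estimate."""
    return -recon_log_likelihood + kl_to_standard_normal(mu, logvar)


rng = random.Random(0)
mu, logvar = [0.5, -1.0], [0.0, -2.0]
print(reparameterize(mu, logvar, rng), kl_to_standard_normal(mu, logvar))
```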

Double Descent

Preetum Nakkiran, Gal Kaplun, Yamini Bansal, Tristan Yang, Boaz Barak, Ilya Sutskever: Deep Double Descent: Where Bigger Models and More Data Hurt. https://arxiv.org/abs/1912.02292
Mikhail Belkin, Daniel Hsu, Siyuan Ma, Soumik Mandal: Reconciling modern machine learning practice and the bias-variance trade-off. https://arxiv.org/abs/1812.11118
Hartmut Maennel, Olivier Bousquet, Sylvain Gelly: Gradient Descent Quantizes ReLU Network Features. https://arxiv.org/abs/1803.08367

Neural ODEs

Yulia Rubanova, Ricky T. Q. Chen, David Duvenaud: Latent ODEs for Irregularly-Sampled Time Series. https://arxiv.org/abs/1907.03907
Emilien Dupont, Arnaud Doucet, Yee Whye Teh: Augmented Neural ODEs. https://arxiv.org/abs/1904.01681
Ricky T. Q. Chen, Yulia Rubanova, Jesse Bettencourt, David Duvenaud: Neural Ordinary Differential Equations. https://arxiv.org/abs/1806.07366

Evaluation

David Balduzzi, Karl Tuyls, Julien Perolat, Thore Graepel: Re-evaluating Evaluation. https://arxiv.org/abs/1806.02643

Artificial Intelligence

RL

Karl Cobbe, Oleg Klimov, Chris Hesse, Taehoon Kim, John Schulman: Quantifying Generalization in Reinforcement Learning. https://arxiv.org/abs/1812.02341
Charles Packer, Katelyn Gao, Jernej Kos, Philipp Krähenbühl, Vladlen Koltun, Dawn Song: Assessing Generalization in Deep Reinforcement Learning. https://arxiv.org/abs/1810.12282
Yuri Burda, Harri Edwards, Deepak Pathak, Amos Storkey, Trevor Darrell, Alexei A. Efros: Large-Scale Study of Curiosity-Driven Learning. https://arxiv.org/abs/1808.04355
Jose A. Arjona-Medina, Michael Gillhofer, Michael Widrich, Thomas Unterthiner, Johannes Brandstetter, Sepp Hochreiter: RUDDER: Return Decomposition for Delayed Rewards. https://arxiv.org/abs/1806.07857
Daniel J. Mankowitz, Augustin Žídek, André Barreto, Dan Horgan, Matteo Hessel, John Quan, Junhyuk Oh, Hado van Hasselt, David Silver, Tom Schaul: Unicorn: Continual Learning with a Universal, Off-policy Agent. https://arxiv.org/abs/1802.08294
Kamil Ciosek, Shimon Whiteson: Expected Policy Gradients for Reinforcement Learning. https://arxiv.org/abs/1801.03326
Nan Ding, Radu Soricut: Cold-Start Reinforcement Learning with Softmax Policy Gradient. https://arxiv.org/abs/1709.09346
Ashvin Nair, Bob McGrew, Marcin Andrychowicz, Wojciech Zaremba, Pieter Abbeel: Overcoming Exploration in Reinforcement Learning with Demonstrations. https://arxiv.org/abs/1709.10089
Danijar Hafner, James Davidson, Vincent Vanhoucke: TensorFlow Agents: Efficient Batched Reinforcement Learning in TensorFlow. https://arxiv.org/abs/1709.02878
Jacob Andreas, Dan Klein, Sergey Levine: Natural Language Policy Search. https://drive.google.com/file/d/16SS8sfHPX5rgcRFoCjMDy57-heovYK-W/view
John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov: Proximal Policy Optimization Algorithms. https://arxiv.org/abs/1707.06347
Junhyuk Oh, Satinder Singh, Honglak Lee: Value Prediction Network. https://arxiv.org/abs/1707.03497
Ofir Nachum, Mohammad Norouzi, Kelvin Xu, Dale Schuurmans: Bridging the Gap Between Value and Policy Based Reinforcement Learning. https://arxiv.org/abs/1702.08892
Chelsea Finn, Tianhe Yu, Justin Fu, Pieter Abbeel, Sergey Levine: Generalizing Skills with Semi-Supervised Reinforcement Learning. https://arxiv.org/abs/1612.00429
Frank S. He, Yang Liu, Alexander G. Schwing, Jian Peng: Learning to Play in a Day: Faster Deep Reinforcement Learning by Optimality Tightening. https://arxiv.org/abs/1611.01606
John Schulman, Sergey Levine, Philipp Moritz, Michael I. Jordan, Pieter Abbeel: Trust Region Policy Optimization. https://arxiv.org/abs/1502.05477
Sham Kakade: A Natural Policy Gradient. https://papers.nips.cc/paper/2073-a-natural-policy-gradient.pdf
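The clipped surrogate objective from the Proximal Policy Optimization paper listed above is a small formula: L = mean(min(r·A, clip(r, 1 - eps, 1 + eps)·A)), where r is the probability ratio between the new and old policies and A is the advantage estimate. The sketch below assumes the per-sample log-probabilities and advantages have already been computed. The function name is hypothetical, and eps = 0.2 is one of the clipping values evaluated in the paper.

```python
import math


def ppo_clip_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Average clipped surrogate L^CLIP (to be maximized).

    The probability ratio r = pi_new(a|s) / pi_old(a|s) is computed from log-probabilities.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: take the smaller of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# A ratio of 1.5 with a positive advantage is capped at 1 + eps = 1.2.
print(ppo_clip_objective([math.log(1.5)], [0.0], [1.0]))
```

Taking the minimum removes any incentive to move the ratio outside [1 - eps, 1 + eps] in the direction that would increase the objective. This is how PPO approximates TRPO's trust region without a constrained optimization step.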

RL – Actor Critic

Sriram Srinivasan, Marc Lanctot, Vinicius Zambaldi, Julien Perolat, Karl Tuyls, Remi Munos, Michael Bowling: Actor-Critic Policy Optimization in Partially Observable Multiagent Environments. https://arxiv.org/abs/1810.09026
Matteo Hessel, Hubert Soyer, Lasse Espeholt, Wojciech Czarnecki, Simon Schmitt, Hado van Hasselt: Multi-task Deep Reinforcement Learning with PopArt. https://arxiv.org/abs/1809.04474
Lasse Espeholt, Hubert Soyer, Remi Munos, Karen Simonyan, Volodymyr Mnih, Tom Ward, Yotam Doron, Vlad Firoiu, Tim Harley, Iain Dunning, Shane Legg, Koray Kavukcuoglu: IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures. https://arxiv.org/abs/1802.01561
Max Jaderberg, Valentin Dalibard, Simon Osindero, Wojciech M. Czarnecki, Jeff Donahue, Ali Razavi, Oriol Vinyals, Tim Green, Iain Dunning, Karen Simonyan, Chrisantha Fernando, Koray Kavukcuoglu: Population Based Training of Neural Networks. https://arxiv.org/abs/1711.09846
Alfredo V. Clemente, Humberto N. Castejón, Arjun Chandra: Efficient Parallel Methods for Deep Reinforcement Learning. https://arxiv.org/abs/1705.04862
Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim Harley, David Silver, Koray Kavukcuoglu: Asynchronous Methods for Deep Reinforcement Learning. https://arxiv.org/abs/1602.01783 A3C

RL – DQN

Adrià Puigdomènech Badia, Bilal Piot, Steven Kapturowski, Pablo Sprechmann, Alex Vitvitskyi, Daniel Guo, Charles Blundell: Agent57: Outperforming the Atari Human Benchmark. https://arxiv.org/abs/2003.13350
Adrià Puigdomènech Badia, Pablo Sprechmann, Alex Vitvitskyi, Daniel Guo, Bilal Piot, Steven Kapturowski, Olivier Tieleman, Martín Arjovsky, Alexander Pritzel, Andrew Bolt, Charles Blundell: Never Give Up: Learning Directed Exploration Strategies. https://arxiv.org/abs/2002.06038 NGU
Steven Kapturowski, Georg Ostrovski, John Quan, Remi Munos, Will Dabney: Recurrent Experience Replay in Distributed Reinforcement Learning. https://openreview.net/pdf?id=r1lyTjAqYX R2D2
Jianqing Fan, Zhaoran Wang, Yuchen Xie, Zhuoran Yang: A Theoretical Analysis of Deep Q-Learning. https://arxiv.org/abs/1901.00137
Jesse Farebrother, Marlos C. Machado, Michael Bowling: Generalization and Regularization in DQN. https://arxiv.org/abs/1810.00123
Matteo Hessel, Joseph Modayil, Hado van Hasselt, Tom Schaul, Georg Ostrovski, Will Dabney, Dan Horgan, Bilal Piot, Mohammad Azar, David Silver: Rainbow: Combining Improvements in Deep Reinforcement Learning. https://arxiv.org/abs/1710.02298
Marc G. Bellemare, Will Dabney, Rémi Munos: A Distributional Perspective on Reinforcement Learning. https://arxiv.org/abs/1707.06887
Meire Fortunato, Mohammad Gheshlaghi Azar, Bilal Piot, Jacob Menick, Ian Osband, Alex Graves, Vlad Mnih, Remi Munos, Demis Hassabis, Olivier Pietquin, Charles Blundell, Shane Legg: Noisy Networks for Exploration. https://arxiv.org/abs/1706.10295
Ziyu Wang, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot, Nando de Freitas: Dueling Network Architectures for Deep Reinforcement Learning. https://arxiv.org/abs/1511.06581
Tom Schaul, John Quan, Ioannis Antonoglou, David Silver: Prioritized Experience Replay. https://arxiv.org/abs/1511.05952
Hado van Hasselt, Arthur Guez, David Silver: Deep Reinforcement Learning with Double Q-learning. https://arxiv.org/abs/1509.06461
Volodymyr Mnih et al.: Human-level control through deep reinforcement learning. http://www.nature.com/nature/journal/v518/n7540/full/nature14236.html https://storage.googleapis.com/deepmind-data/assets/papers/DeepMindNature14236Paper.pdf
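The difference between the original DQN target and the Double Q-learning target of van Hasselt et al. comes down to which network picks the next action. The sketch below assumes Q-values for the next state are given as plain lists, one per network. The function names are hypothetical.

```python
def dqn_target(reward, gamma, q_target_next, done):
    """DQN target: r + gamma * max_a Q_target(s', a); just r at episode end."""
    if done:
        return reward
    return reward + gamma * max(q_target_next)


def double_dqn_target(reward, gamma, q_online_next, q_target_next, done):
    """Double DQN target: the online network selects a', the target network evaluates it."""
    if done:
        return reward
    best_action = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return reward + gamma * q_target_next[best_action]


q_online_next = [1.0, 5.0, 2.0]
q_target_next = [3.0, 0.0, 4.0]
print(dqn_target(1.0, 0.9, q_target_next, done=False))
print(double_dqn_target(1.0, 0.9, q_online_next, q_target_next, done=False))
```

Decoupling action selection from action evaluation reduces the overestimation bias of the max operator in the standard target.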

Continuous RL

Rui Wang, Joel Lehman, Jeff Clune, Kenneth O. Stanley: Paired Open-Ended Trailblazer (POET): Endlessly Generating Increasingly Complex and Diverse Learning Environments and Their Solutions. https://arxiv.org/abs/1901.01753
Scott Fujimoto, Herke van Hoof, David Meger: Addressing Function Approximation Error in Actor-Critic Methods. https://arxiv.org/abs/1802.09477
Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, Sergey Levine: Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. https://arxiv.org/abs/1801.01290
Piotr Mirowski, Razvan Pascanu, Fabio Viola, Hubert Soyer, Andrew J. Ballard, Andrea Banino, Misha Denil, Ross Goroshin, Laurent Sifre, Koray Kavukcuoglu, Dharshan Kumaran, Raia Hadsell: Learning to Navigate in Complex Environments. https://arxiv.org/abs/1611.03673
Yan Duan, Xi Chen, Rein Houthooft, John Schulman, Pieter Abbeel: Benchmarking Deep Reinforcement Learning for Continuous Control. https://arxiv.org/abs/1604.06778
Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, Daan Wierstra: Continuous control with deep reinforcement learning. https://arxiv.org/abs/1509.02971
David Silver, Guy Lever, Nicolas Heess, Thomas Degris, Daan Wierstra, Martin Riedmiller: Deterministic policy gradient algorithms. http://jmlr.org/proceedings/papers/v32/silver14.pdf

Model-based RL

Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Laurent Sifre, Simon Schmitt, Arthur Guez, Edward Lockhart, Demis Hassabis, Thore Graepel, Timothy Lillicrap, David Silver: Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model. https://arxiv.org/abs/1911.08265 MuZero
Oriol Vinyals et al.: Grandmaster level in StarCraft II using multi-agent reinforcement learning. https://www.nature.com/articles/s41586-019-1724-z
Lukasz Kaiser, Mohammad Babaeizadeh, Piotr Milos, Blazej Osinski, Roy H Campbell, Konrad Czechowski, Dumitru Erhan, Chelsea Finn, Piotr Kozakowski, Sergey Levine, Afroz Mohiuddin, Ryan Sepassi, George Tucker, Henryk Michalewski: Model-Based Reinforcement Learning for Atari. https://arxiv.org/abs/1903.00374
David Silver et al.: A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. https://science.sciencemag.org/content/362/6419/1140
David Silver et al.: Mastering the game of Go without human knowledge. https://www.nature.com/articles/nature24270
David Silver et al.: Mastering the game of Go with deep neural networks and tree search. https://www.nature.com/articles/nature16961
Danijar Hafner, Timothy Lillicrap, Ian Fischer, Ruben Villegas, David Ha, Honglak Lee, James Davidson: Learning Latent Dynamics for Planning from Pixels. https://arxiv.org/abs/1811.04551 PlaNet
Danijar Hafner, Timothy Lillicrap, Jimmy Ba, Mohammad Norouzi: Dream to Control: Learning Behaviors by Latent Imagination. https://arxiv.org/abs/1912.01603 Dreamer
Danijar Hafner, Timothy Lillicrap, Mohammad Norouzi, Jimmy Ba: Mastering Atari with Discrete World Models. https://arxiv.org/abs/2010.02193 DreamerV2

Multi-agent RL

Trapit Bansal, Jakub Pachocki, Szymon Sidor, Ilya Sutskever, Igor Mordatch: Emergent Complexity via Multi-Agent Competition. https://arxiv.org/abs/1710.03748

AutoML, AutoRL

Mingxing Tan, Ruoming Pang, Quoc V. Le: EfficientDet: Scalable and Efficient Object Detection. https://arxiv.org/abs/1911.09070
Mingxing Tan, Quoc V. Le: EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. https://arxiv.org/abs/1905.11946
Anthony Francis, Aleksandra Faust, Hao-Tien Lewis Chiang, Jasmine Hsu, J. Chase Kew, Marek Fiser, Tsang-Wei Edward Lee: Long-Range Indoor Navigation with PRM-RL. https://arxiv.org/abs/1902.09458
Hao-Tien Lewis Chiang, Aleksandra Faust, Marek Fiser, Anthony Francis: Learning Navigation Behaviors End-to-End with AutoRL. https://arxiv.org/abs/1809.10124
Mingxing Tan, Bo Chen, Ruoming Pang, Vijay Vasudevan, Mark Sandler, Andrew Howard, Quoc V. Le: MnasNet: Platform-Aware Neural Architecture Search for Mobile. https://arxiv.org/abs/1807.11626
Hanxiao Liu, Karen Simonyan, Yiming Yang: DARTS: Differentiable Architecture Search. https://arxiv.org/abs/1806.09055
Tien-Ju Yang, Andrew Howard, Bo Chen, Xiao Zhang, Alec Go, Mark Sandler, Vivienne Sze, Hartwig Adam: NetAdapt: Platform-Aware Neural Network Adaptation for Mobile Applications. https://arxiv.org/abs/1804.03230
Hieu Pham, Melody Y. Guan, Barret Zoph, Quoc V. Le, Jeff Dean: Efficient Neural Architecture Search via Parameter Sharing. https://arxiv.org/abs/1802.03268
Esteban Real, Alok Aggarwal, Yanping Huang, Quoc V Le: Regularized Evolution for Image Classifier Architecture Search. https://arxiv.org/abs/1802.01548
Chenxi Liu, Barret Zoph, Maxim Neumann, Jonathon Shlens, Wei Hua, Li-Jia Li, Li Fei-Fei, Alan Yuille, Jonathan Huang, Kevin Murphy: Progressive Neural Architecture Search. https://arxiv.org/abs/1712.00559
Barret Zoph, Vijay Vasudevan, Jonathon Shlens, Quoc V. Le: Learning Transferable Architectures for Scalable Image Recognition. https://arxiv.org/abs/1707.07012
Natasha Jaques, Shixiang Gu, Richard E. Turner, Douglas Eck: Tuning Recurrent Neural Networks with Reinforcement Learning. https://arxiv.org/abs/1611.02796

Meta Learning

Alex Nichol, Joshua Achiam, John Schulman: On First-Order Meta-Learning Algorithms. https://arxiv.org/abs/1803.02999
Chelsea Finn, Pieter Abbeel, Sergey Levine: Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. https://arxiv.org/abs/1703.03400
Adam Santoro, Sergey Bartunov, Matthew Botvinick, Daan Wierstra, Timothy Lillicrap: One-shot Learning with Memory-Augmented Neural Networks. https://arxiv.org/abs/1605.06065

Discrete Latent Variables

Binxin Ru, Ahsan S. Alvi, Vu Nguyen, Michael A. Osborne, Stephen J Roberts: Bayesian Optimisation over Multiple Continuous and Categorical Inputs. https://arxiv.org/abs/1906.08878
Łukasz Kaiser, Samy Bengio: Discrete Autoencoders for Sequence Models. https://arxiv.org/abs/1801.09797
George Tucker, Andriy Mnih, Chris J. Maddison, Dieterich Lawson, Jascha Sohl-Dickstein: REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models. https://arxiv.org/abs/1703.07370
Eric Jang, Shixiang Gu, Ben Poole: Categorical Reparameterization with Gumbel-Softmax. https://arxiv.org/abs/1611.01144
Chris J. Maddison, Andriy Mnih, Yee Whye Teh: The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables. https://arxiv.org/abs/1611.00712
Yoshua Bengio, Nicholas Léonard, Aaron Courville: Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation. https://arxiv.org/abs/1308.3432
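The Gumbel-Softmax (Jang et al.) and Concrete (Maddison et al.) relaxations listed above were proposed independently and share the same sampling procedure. First, add standard Gumbel noise to unnormalized log-probabilities. Then apply a softmax at temperature tau. The sketch below assumes plain Python lists of logits, and the function name is hypothetical. As tau approaches 0, samples approach one-hot vectors; larger tau gives smoother samples.

```python
import math
import random


def gumbel_softmax_sample(logits, temperature, rng):
    """Relaxed one-hot sample: softmax((logits + g) / temperature), g ~ Gumbel(0, 1)."""
    noisy = []
    for logit in logits:
        u = max(rng.random(), 1e-12)  # u in (0, 1): random() never returns 1.0
        g = -math.log(-math.log(u))    # standard Gumbel noise via inverse CDF
        noisy.append((logit + g) / temperature)
    top = max(noisy)  # subtract the max for a numerically stable softmax
    exps = [math.exp(z - top) for z in noisy]
    total = sum(exps)
    return [e / total for e in exps]


probs = [0.2, 0.5, 0.3]
rng = random.Random(0)
print(gumbel_softmax_sample([math.log(p) for p in probs], temperature=0.5, rng=rng))
```

Taking the argmax of the noisy logits instead of the softmax gives an exact categorical sample (the Gumbel-max trick). The softmax replaces that argmax with a differentiable approximation.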

Explicit Memory

Max Jaderberg, Wojciech M. Czarnecki, Iain Dunning, Luke Marris, Guy Lever, Antonio Garcia Castaneda, Charles Beattie, Neil C. Rabinowitz, Ari S. Morcos, Avraham Ruderman, Nicolas Sonnerat, Tim Green, Louise Deason, Joel Z. Leibo, David Silver, Demis Hassabis, Koray Kavukcuoglu, Thore Graepel: Human-level performance in first-person multiplayer games with population-based deep reinforcement learning. https://arxiv.org/abs/1807.01281
Greg Wayne, Chia-Chun Hung, David Amos, Mehdi Mirza, Arun Ahuja, Agnieszka Grabska-Barwinska, Jack Rae, Piotr Mirowski, Joel Z. Leibo, Adam Santoro, Mevlana Gemici, Malcolm Reynolds, Tim Harley, Josh Abramson, Shakir Mohamed, Danilo Rezende, David Saxton, Adam Cain, Chloe Hillier, David Silver, Koray Kavukcuoglu, Matt Botvinick, Demis Hassabis, Timothy Lillicrap: Unsupervised Predictive Memory in a Goal-Directed Agent. https://arxiv.org/abs/1803.10760
Mevlana Gemici, Chia-Chun Hung, Adam Santoro, Greg Wayne, Shakir Mohamed, Danilo J. Rezende, David Amos, Timothy Lillicrap: Generative Temporal Models with Memory. https://arxiv.org/abs/1702.04649
Caglar Gulcehre, Sarath Chandar, Yoshua Bengio: Memory Augmented Neural Networks with Wormhole Connections. https://arxiv.org/abs/1701.08718
Alex Graves et al.: Hybrid computing using a neural network with dynamic external memory. https://www.gwern.net/docs/2016-graves.pdf
Alex Graves, Greg Wayne, Ivo Danihelka: Neural Turing Machines. https://arxiv.org/abs/1410.5401

Hyperparameter Optimization

Daniel Golovin, Benjamin Solnik, Subhodeep Moitra, Greg Kochanski, John Elliot Karro, D. Sculley: Google Vizier: A Service for Black-Box Optimization. https://research.google.com/pubs/archive/46180.pdf

Evolution

Tim Salimans, Jonathan Ho, Xi Chen, Szymon Sidor, Ilya Sutskever: Evolution Strategies as a Scalable Alternative to Reinforcement Learning. https://arxiv.org/abs/1703.03864

2020

Jonathan Mallinson, Aliaksei Severyn, Eric Malmi, Guillermo Garrido: Felix: Flexible Text Editing Through Tagging and Insertion. https://arxiv.org/abs/2003.10687 FELIX
Joshua Ainslie, Santiago Ontanon, Chris Alberti, Vaclav Cvicek, Zachary Fisher, Philip Pham, Anirudh Ravula, Sumit Sanghai, Qifan Wang, Li Yang: ETC: Encoding Long and Structured Inputs in Transformers. https://arxiv.org/abs/2004.08483 ETC
Prannay Khosla, Piotr Teterwak, Chen Wang, Aaron Sarna, Yonglong Tian, Phillip Isola, Aaron Maschinot, Ce Liu, Dilip Krishnan: Supervised Contrastive Learning. https://arxiv.org/abs/2004.11362
Yi Tay, Dara Bahri, Donald Metzler, Da-Cheng Juan, Zhe Zhao, Che Zheng: Synthesizer: Rethinking Self-Attention in Transformer Models. https://arxiv.org/abs/2005.00743
Pierre Foret, Ariel Kleiner, Hossein Mobahi, Behnam Neyshabur: Sharpness-Aware Minimization for Efficiently Improving Generalization. https://arxiv.org/abs/2010.01412 SAM
Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed: Big Bird: Transformers for Longer Sequences. https://arxiv.org/abs/2007.14062 BigBird
Preetum Nakkiran, Behnam Neyshabur, Hanie Sedghi: The Deep Bootstrap Framework: Good Online Learners are Good Offline Generalizers. https://arxiv.org/abs/2010.08127
Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. https://arxiv.org/abs/2010.11929 ViT
Oshin Agarwal, Heming Ge, Siamak Shakeri, Rami Al-Rfou: Knowledge Graph Based Synthetic Corpus Generation for Knowledge-Enhanced Language Model Pre-training. https://arxiv.org/abs/2010.12688
Thao Nguyen, Maithra Raghu, Simon Kornblith: Do Wide and Deep Networks Learn the Same Things? Uncovering How Neural Network Representations Vary with Width and Depth. https://arxiv.org/abs/2010.15327
Johan S. Obando-Ceron, Pablo Samuel Castro: Revisiting Rainbow: Promoting more Insightful and Inclusive Deep Reinforcement Learning Research. https://arxiv.org/abs/2011.14826
Mohammad Babaeizadeh, Mohammad Taghi Saffar, Danijar Hafner, Harini Kannan, Chelsea Finn, Sergey Levine, Dumitru Erhan: Models, Pixels, and Rewards: Evaluating Design Trade-offs in Visual Model-Based Reinforcement Learning. https://arxiv.org/abs/2012.04603

2021

John D. Co-Reyes, Yingjie Miao, Daiyi Peng, Esteban Real, Sergey Levine, Quoc V. Le, Honglak Lee, Aleksandra Faust: Evolving Reinforcement Learning Algorithms. https://arxiv.org/abs/2101.03958
Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yunhsuan Sung, Zhen Li, Tom Duerig: Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision. https://arxiv.org/abs/2102.05918 ALIGN
Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever: Learning Transferable Visual Models From Natural Language Supervision. https://arxiv.org/abs/2103.00020 CLIP
Kalpesh Krishna, Aurko Roy, Mohit Iyyer: Hurdles to Progress in Long-form Question Answering. https://arxiv.org/abs/2103.06332
Andrea Banino, Jan Balaguer, Charles Blundell: PonderNet: Learning to Ponder. https://arxiv.org/abs/2107.05407 PonderNet

Misc

Books

Ian Goodfellow, Yoshua Bengio, Aaron Courville: Deep Learning. http://www.deeplearningbook.org/

Blogs

Andrej Karpathy: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/
Andrej Karpathy: Deep Reinforcement Learning: Pong from Pixels. http://karpathy.github.io/2016/05/31/rl/
Martin Wattenberg, Fernanda Viégas, Ian Johnson: How to Use t-SNE Effectively. http://distill.pub/2016/misread-tsne/
