We chose this paper because it kicked off a revolution in NLP architectures by introducing the attention-based Transformer. While the paper lays out the technical approach well, its visualizations are not the most helpful for understanding how Transformers work, so we also consulted The Illustrated Transformer.