Attention is All You Need

We chose this paper because it kicked off a revolution in NLP architectures by introducing the attention-based Transformer. While the paper presents its technical approach well, its figures weren't the most helpful for understanding how Transformers actually work, so we also consulted The Illustrated Transformer.
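The paper's central operation is scaled dot-product attention: each query is compared against all keys, the similarities are softmax-normalized, and the result is a weighted sum of the values. A minimal NumPy sketch (the shapes and toy data here are illustrative, not from the paper):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V, the core operation of the Transformer."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V                                # weighted sum of value vectors

# Toy example: 3 tokens, key/value dimension 4 (arbitrary illustrative sizes)
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 4): one output vector per query token
```

The `sqrt(d_k)` scaling keeps the dot products from growing with dimension, which would otherwise push the softmax into regions with tiny gradients.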
