The Illustrated Transformer
Discussions: Hacker News (65 points, 4 comments), Reddit r/MachineLearning (29 points, 3 comments) Translations: Chinese (Simplified), Japanese, Korean, Russian Watch: MIT's Deep Learning State of the Art lecture referencing this post In the previous post, we looked at Attention - a ubiquitous method in modern deep learning models.