This week, we read "Training Deep Neural Networks," Chapter 5 of the textbook. The chapter outlines some of the fundamental problems in training deep neural networks, such as the vanishing and exploding gradient problems. It gave us a good intuition for why training all layers at the same speed is difficult, and why deep learning is an engineering science.
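To make the vanishing gradient problem concrete, here is a minimal sketch of our own (it is not taken from the chapter). It assumes a toy fully connected network with sigmoid activations, a made-up loss equal to the sum of the final activations, and arbitrary choices for `depth`, `width`, and `seed`. The function name `layer_gradient_norms` is hypothetical. It runs one forward and one backward pass by hand and reports the size of the weight gradient at each layer.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def layer_gradient_norms(depth=10, width=32, seed=0):
    """Return the weight-gradient norm of each layer in a toy sigmoid MLP."""
    rng = np.random.default_rng(seed)
    # Assumed initialization: zero-mean Gaussian with std 1/sqrt(width).
    weights = [
        rng.normal(0.0, 1.0 / np.sqrt(width), size=(width, width))
        for _ in range(depth)
    ]
    x = rng.normal(size=width)

    # Forward pass, keeping the input to every layer.
    activations = [x]
    for W in weights:
        activations.append(sigmoid(W @ activations[-1]))

    # Backward pass for the toy loss L = sum(final activations).
    grad_out = np.ones(width)  # dL/d(final activations)
    norms = [0.0] * depth
    for i in reversed(range(depth)):
        a_out = activations[i + 1]
        # Backpropagate through the sigmoid: sigma'(z) = a * (1 - a) <= 0.25.
        delta = grad_out * a_out * (1.0 - a_out)
        # dL/dW for this layer is the outer product of delta and the layer input.
        norms[i] = float(np.linalg.norm(np.outer(delta, activations[i])))
        # Gradient with respect to this layer's input, passed to the layer below.
        grad_out = weights[i].T @ delta
    return norms


if __name__ == "__main__":
    for i, n in enumerate(layer_gradient_norms()):
        print(f"layer {i}: gradient norm = {n:.3e}")
```

Under these assumptions, the layers closest to the input should receive much smaller gradients than the layers closest to the output, because each sigmoid multiplies the backpropagated signal by at most 0.25. With a fixed learning rate, that means the early layers would update far more slowly than the late ones.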
We also read "A Recipe for Training Neural Networks," Andrej Karpathy’s blog post. The post gives practical advice for getting deep neural networks to train well, that is, to converge quickly to good solutions.