When considering neural network training, do vanishing gradients typically happen near the input layers or the output layers?
Interview question asked of Machine Learning Engineer candidates at Bumble, Snap, WhatsApp, and others: When considering neural network training, do vanishing gradients typically happen near the input layers or the output layers?
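
One way to reason about this question is to measure per-layer gradient magnitudes in a toy network. The sketch below is only an illustration under stated assumptions: a chain of scalar sigmoid layers with every weight fixed at 1.0, and the final activation used as the loss. The names `layer_gradients`, `num_layers`, `weight`, and `x` are hypothetical and do not come from the original question. In this setup each backward step multiplies the gradient by a sigmoid derivative of at most 0.25, so the magnitude shrinks toward the input layers.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer_gradients(num_layers=10, weight=1.0, x=0.5):
    """Return |dLoss/dw_l| for each layer of a scalar sigmoid chain, input layer first.

    Assumption for illustration: every layer is a_l = sigmoid(weight * a_{l-1})
    and the loss is simply the final activation.
    """
    # Forward pass: remember each layer's input and output.
    inputs, outputs = [], []
    a = x
    for _ in range(num_layers):
        inputs.append(a)
        a = sigmoid(weight * a)
        outputs.append(a)

    # Backward pass: dLoss/da_last = 1 because the loss is the last activation.
    grads = [0.0] * num_layers
    upstream = 1.0
    for l in reversed(range(num_layers)):
        local = outputs[l] * (1.0 - outputs[l])  # sigmoid'(z_l), at most 0.25
        grads[l] = abs(upstream * local * inputs[l])  # dLoss/dw_l
        upstream *= local * weight  # chain rule: gradient w.r.t. this layer's input
    return grads


grads = layer_gradients()
for i, g in enumerate(grads, start=1):
    print(f"layer {i:2d}: |grad| = {g:.3e}")
```

Running it prints gradient magnitudes that grow from layer 1 (nearest the input) to layer 10 (nearest the output). Real networks with other activations, weight scales, or skip connections can behave differently, so treat this as a sanity check of the chain-rule argument rather than a general result.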