Anonymous
Vanishing and exploding gradients usually occur in sequential models where the weights are shared across time steps. Because the same weight matrix enters the computation at every step, backpropagation through the sequence keeps multiplying the gradient by similar factors: if those factors are greater than 1, the gradient grows until it explodes (producing inf or nan values), and if they are less than 1, the gradient shrinks toward 0 and vanishes.

We can use a few strategies to combat this. First, gated architectures such as GRUs and LSTMs use gates to control how much of the past state is carried forward, so the network can effectively reset its memory when earlier words are not relevant, which helps gradients flow across long sequences. Second, gradient clipping sets a maximum value (or norm) for the gradients and, whenever a gradient exceeds that bound, clips it back, which directly prevents exploding gradients. Third, we can normalize the gradient values to keep them within a fixed range, say between 0 and 1. Finally, we can apply layer, batch, or instance normalization so that the activations themselves stay in a stable range.
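To make the gradient clipping idea concrete, here is a minimal PyTorch sketch. It assumes a toy LSTM with made-up sizes and a dummy regression loss (all of those names and numbers are illustrative, not from the original answer); the relevant part is the one clipping call placed between `backward()` and `optimizer.step()`.

```python
import torch
import torch.nn as nn

# Hypothetical setup: a small LSTM and SGD optimizer, sizes chosen arbitrarily.
model = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(8, 100, 32)        # batch of 8 sequences, 100 time steps each
targets = torch.randn(8, 100, 64)  # dummy targets for a regression-style loss

output, _ = model(x)
loss = nn.functional.mse_loss(output, targets)

optimizer.zero_grad()
loss.backward()

# Option 1: clip each gradient element into [-1.0, 1.0]
# torch.nn.utils.clip_grad_value_(model.parameters(), clip_value=1.0)

# Option 2: rescale all gradients so their combined L2 norm is at most 1.0,
# which keeps their direction but bounds their overall magnitude
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

optimizer.step()
```

Clipping by norm (option 2) is often preferred because it preserves the direction of the update; clipping by value (option 1) is simpler but can distort it.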