Between ReLU and sigmoid functions, which one mitigates the vanishing gradient issue more efficiently?

2 months ago

Interview question asked of Data Scientists and Machine Learning Engineers interviewing at Fiverr, Asana, Microsoft, and others: Between ReLU and sigmoid functions, which one mitigates the vanishing gradient issue more efficiently?
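
One way to reason about the question is to compare the derivative each activation contributes during backpropagation. The sigmoid derivative is at most 0.25 (reached at input 0), so a product of many such factors shrinks quickly, while the ReLU derivative is exactly 1 for any positive input (and 0 for negative inputs, which is its own failure mode). The sketch below is a toy illustration only: it ignores the weight matrices entirely, and the depth of 20, the fixed pre-activation values, and the helper name `chained_grad` are assumptions made for the example, not part of the original question.

```python
import math


def sigmoid(x):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    """Derivative of the sigmoid: s * (1 - s), never larger than 0.25."""
    s = sigmoid(x)
    return s * (1.0 - s)


def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


def chained_grad(grad_fn, pre_activation, depth):
    """Hypothetical helper: product of `depth` identical activation
    derivatives, with all weights ignored (toy assumption)."""
    return grad_fn(pre_activation) ** depth


depth = 20  # assumed depth, chosen only for illustration

# Best case for sigmoid: every unit sits at x = 0, where the derivative peaks.
sigmoid_factor = chained_grad(sigmoid_grad, 0.0, depth)

# Active ReLU units: every pre-activation is positive.
relu_factor = chained_grad(relu_grad, 1.0, depth)

print(f"sigmoid factor after {depth} layers: {sigmoid_factor:.3e}")
print(f"ReLU factor after {depth} layers:    {relu_factor:.3e}")
```

Even in the sigmoid's best case the activation factor collapses toward zero with depth, whereas active ReLU units pass the gradient through unchanged. In a real network the weights, initialization, and normalization layers also affect gradient magnitude, so this isolates only the activation's contribution.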