Anonymous
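Attention mechanisms work through three components: the query, the key, and the value. Together they measure how similar the current word is to the other words in the text. The query represents the current word, the one we are calculating similarity for. The keys represent the words it is compared against. The similarity scores between the query and each key are passed through a softmax, which turns them into attention weights that sum to 1. The values are the vectors of those same words that actually get combined: each value is multiplied by its attention weight, and the weighted sum becomes the new representation of the current word. So the weights decide how much each of the other words contributes to transforming the original vector of the word.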
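To make the query/key/value roles concrete, here is a minimal sketch in plain Python for a single query. It assumes dot-product similarity scaled by the square root of the vector size (the usual scaled dot-product form, which the answer above does not spell out), and the names `softmax` and `attend` plus all the toy vectors are made up for illustration.

```python
import math

def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Return (weights, output) for one query over lists of key and value vectors."""
    d = len(query)
    # Similarity of the query with each key: dot product scaled by sqrt(d) (assumed).
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    # Softmax turns the similarity scores into attention weights that sum to 1.
    weights = softmax(scores)
    # The output is the weighted sum of the value vectors.
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output

# Toy data, invented for illustration: three words, 2-dimensional vectors.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
query = [1.0, 0.0]

weights, output = attend(query, keys, values)
print("attention weights:", weights)
print("new representation:", output)
```

In this toy case the query matches the first and third keys equally well and the second key least, so the second word's value contributes least to the output. In a real model the queries, keys, and values would come from learned projections of the word vectors rather than being written by hand.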