Anonymous
An attention model can be viewed as a kind of hypernetwork, in the sense that some of the model's effective weights are determined by the input itself. In the attention mechanism, each token of the input sequence is compared with every other token in the context window, and those comparisons produce the weights used to mix the tokens' representations. By default the attention mechanism does not care about the order of the input (it is permutation-invariant, which is why positional information is usually added), which is ironic given the success it has found in next-token prediction. This is the basis for LLMs: the prompt (which may be modified on the backend) is fed to the model, the next token of the answer is predicted and appended, and the answer is built up token by token.
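
To make the "input-dependent weights" point concrete, here is a minimal sketch of scaled dot-product self-attention in NumPy. It is an illustration under simplified assumptions, not any particular library's implementation; the names (self_attention, Wq, Wk, Wv) are just placeholders. The mixing weights A are computed from the input X itself, and the final assert shows that without positional information, permuting the input tokens simply permutes the output the same way.

    import numpy as np

    def softmax(x, axis=-1):
        x = x - x.max(axis=axis, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def self_attention(X, Wq, Wk, Wv):
        """X: (seq_len, d_model) token embeddings; Wq/Wk/Wv: fixed projection matrices."""
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        scores = Q @ K.T / np.sqrt(K.shape[-1])  # every token compared with every other
        A = softmax(scores, axis=-1)             # input-dependent mixing weights
        return A @ V                             # weighted mix of the value vectors

    rng = np.random.default_rng(0)
    d = 8
    X = rng.normal(size=(5, d))                  # 5 tokens, embedding dim 8
    Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
    out = self_attention(X, Wq, Wk, Wv)

    # Without positional information, attention is permutation-equivariant:
    # shuffling the input rows just shuffles the output rows the same way.
    perm = rng.permutation(5)
    assert np.allclose(self_attention(X[perm], Wq, Wk, Wv), out[perm])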
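
And here is a rough sketch of the token-by-token generation loop. The function toy_next_token_logits is a hypothetical stand-in for a real language model (which would run a transformer over everything generated so far); greedy decoding is used only to keep the example short.

    import numpy as np

    VOCAB = ["<eos>", "hello", "world", "foo", "bar"]

    def toy_next_token_logits(token_ids):
        # Hypothetical stand-in for the model: favours the token after the
        # last one seen, wrapping back to <eos> so generation eventually stops.
        rng = np.random.default_rng(len(token_ids))
        logits = rng.normal(size=len(VOCAB))
        logits[(token_ids[-1] + 1) % len(VOCAB)] += 5.0
        return logits

    def generate(prompt_ids, max_new_tokens=10):
        ids = list(prompt_ids)
        for _ in range(max_new_tokens):
            logits = toy_next_token_logits(ids)  # model sees the prompt plus everything generated so far
            next_id = int(np.argmax(logits))     # greedy decoding for simplicity
            ids.append(next_id)                  # the answer is built up token by token
            if VOCAB[next_id] == "<eos>":
                break
        return ids

    print([VOCAB[i] for i in generate([1, 2])])  # start from the prompt "hello world"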