Attention Mechanism Theory
Intermediate · New
Question 1 of 8
120s · intermediate (4/10) · conceptual
In the transformer self-attention mechanism, why are the attention scores divided by the square root of the key dimension before applying softmax?
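A minimal numerical sketch of the variance argument behind this question (an illustration, not the official answer key): assuming query and key components are i.i.d. with zero mean and unit variance, their dot product has variance equal to the key dimension d_k, so unscaled logits grow with dimension and saturate the softmax; dividing by √d_k keeps the logit variance near 1.

```python
import numpy as np

# Sketch: empirically compare variances of raw vs. scaled attention logits.
# Assumption: q and k entries are i.i.d. standard normal (not from a real model).
rng = np.random.default_rng(0)

for d_k in (16, 64, 256):
    q = rng.standard_normal((10_000, d_k))
    k = rng.standard_normal((10_000, d_k))
    raw = np.einsum("nd,nd->n", q, k)   # unscaled dot products q·k
    scaled = raw / np.sqrt(d_k)         # scaled as in transformer attention
    # raw variance grows roughly like d_k; scaled variance stays near 1
    print(d_k, round(float(raw.var()), 1), round(float(scaled.var()), 2))
```

With large raw logits, softmax concentrates nearly all mass on one position and its gradients vanish; the 1/√d_k factor keeps the softmax in a well-conditioned regime regardless of d_k.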