Sushant Kumar

Grouped Query Attention

Grouped Query Attention (GQA) is a variant of the attention mechanism used in transformer models. It divides the query heads of an attention layer into groups, and the query heads in each group share a single key head and value head. This places it between Multi-Head Attention, where every query head has its own key and value head, and Multi-Query Attention, where all query heads share one. Sharing key and value heads shrinks the key-value cache and reduces memory traffic during autoregressive decoding, which speeds up inference while keeping model quality close to that of Multi-Head Attention.

Figure 1: Comparison of Grouped Query Attention, Multi-Query Attention, and Multi-Head Attention. Image Source: arXiv:2305.13245
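
The head sharing compared in Figure 1 can be illustrated with a minimal NumPy sketch. This is not the reference implementation from the paper: the function name `grouped_query_attention`, the tensor layout `(heads, seq_len, head_dim)`, and the helper `softmax` are assumptions chosen for illustration, and the sketch leaves out the input/output projections, batching, and causal masking that a real layer would have.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def grouped_query_attention(q, k, v, num_kv_heads):
    """Illustrative GQA over already-projected heads (hypothetical helper).

    q:    (num_q_heads,  seq_len, head_dim)
    k, v: (num_kv_heads, seq_len, head_dim)
    """
    num_q_heads, seq_len, head_dim = q.shape
    assert k.shape[0] == num_kv_heads and v.shape[0] == num_kv_heads
    assert num_q_heads % num_kv_heads == 0, "query heads must split evenly into groups"
    group_size = num_q_heads // num_kv_heads

    # Each key/value head is shared by `group_size` consecutive query heads.
    k_shared = np.repeat(k, group_size, axis=0)  # (num_q_heads, seq_len, head_dim)
    v_shared = np.repeat(v, group_size, axis=0)

    # Standard scaled dot-product attention, computed per query head.
    scores = q @ k_shared.transpose(0, 2, 1) / np.sqrt(head_dim)
    weights = softmax(scores, axis=-1)
    return weights @ v_shared  # (num_q_heads, seq_len, head_dim)

# Example: 8 query heads sharing 2 key/value heads (groups of 4).
rng = np.random.default_rng(0)
q = rng.standard_normal((8, 4, 16))
k = rng.standard_normal((2, 4, 16))
v = rng.standard_normal((2, 4, 16))
out = grouped_query_attention(q, k, v, num_kv_heads=2)
print(out.shape)
```

In this sketch, setting `num_kv_heads` equal to the number of query heads gives ordinary Multi-Head Attention, and setting it to 1 gives Multi-Query Attention. Only `k` and `v` would need to be cached during decoding, so the cache scales with `num_kv_heads` rather than with the number of query heads.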


