towhee.models.layers.multi_scale_attention

Classes

MultiScaleAttention

A multiscale attention block. Compared to a conventional attention block, a multiscale attention block optionally supports pooling (either before or after the qkv projection). If pooling is not used, a multiscale attention block is equivalent to a conventional attention block.

::

                            Input
                              |
             |----------------|-----------------|
             ↓                ↓                 ↓
          Linear           Linear            Linear
             &                &                 &
          Pool (Q)         Pool (K)          Pool (V)
             → -------------- ←                 |
                      ↓                         |
                MatMul & Scale                  |
                      ↓                         |
                   Softmax                      |
                      → ----------------------- ←
                                  ↓
                            MatMul & Scale
                                  ↓
                               DropOut

:param dim: Input feature dimension.
:type dim: int
:param num_heads: Number of heads in the attention layer.
:type num_heads: int
:param qkv_bias: If set to False, the qkv layer will not learn an additive bias.
:type qkv_bias: bool
:param dropout_rate: Dropout rate.
:type dropout_rate: float
:param kernel_q: Pooling kernel size for q. If both the pooling kernel size and the pooling stride are 1 in all dimensions, pooling is disabled.
:type kernel_q: _size_3_t
:param kernel_kv: Pooling kernel size for kv. If both the pooling kernel size and the pooling stride are 1 in all dimensions, pooling is disabled.
:type kernel_kv: _size_3_t
:param stride_q: Pooling stride for q.
:type stride_q: _size_3_t
:param stride_kv: Pooling stride for kv.
:type stride_kv: _size_3_t
:param norm_layer: Normalization layer used after pooling.
:type norm_layer: nn.Module
:param has_cls_embed: If set to True, the first token of the input tensor should be a cls token. Otherwise, the input tensor does not contain a cls token. Pooling is not applied to the cls token.
:type has_cls_embed: bool
:param pool_mode: Pooling mode. Options include "conv" (learned pooling), "avg" (average pooling), and "max" (max pooling).
:type pool_mode: str
:param pool_first: If set to True, pooling is applied before the qkv projection. Otherwise, pooling is applied after the qkv projection.
:type pool_first: bool
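
The data flow described above can be illustrated with a small standalone sketch. This is not towhee's implementation. It is a deliberately simplified, pure-Python version that assumes a single head, a 1D token sequence, average pooling without padding, and inputs that are already projected (so it corresponds to pooling after the qkv projection). It omits ``norm_layer`` and dropout. The helper names ``avg_pool_tokens``, ``attention``, and ``pooled_attention`` are hypothetical and exist only for this example.

```python
import math


def avg_pool_tokens(tokens, kernel, stride, has_cls_embed=True):
    """Average-pool a token sequence along its length.

    The cls token (if present) is split off and passed through unpooled.
    """
    if kernel == 1 and stride == 1:
        return list(tokens)  # kernel and stride both 1: pooling is disabled
    cls, rest = (tokens[:1], tokens[1:]) if has_cls_embed else ([], tokens)
    pooled = []
    # Assumption for this sketch: no padding, so incomplete windows are dropped.
    for start in range(0, len(rest) - kernel + 1, stride):
        window = rest[start:start + kernel]
        pooled.append([sum(col) / kernel for col in zip(*window)])
    return cls + pooled


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    """Conventional scaled dot-product attention for one head."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        weights = softmax(scores)
        out.append([sum(w * vj[d] for w, vj in zip(weights, v))
                    for d in range(len(v[0]))])
    return out


def pooled_attention(q, k, v, kernel_q=1, kernel_kv=1,
                     stride_q=1, stride_kv=1, has_cls_embed=True):
    """Pool q, k, and v independently, then apply conventional attention."""
    q = avg_pool_tokens(q, kernel_q, stride_q, has_cls_embed)
    k = avg_pool_tokens(k, kernel_kv, stride_kv, has_cls_embed)
    v = avg_pool_tokens(v, kernel_kv, stride_kv, has_cls_embed)
    return attention(q, k, v)


# One cls token followed by four patch tokens, feature dimension 2.
x = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5], [0.2, 0.8], [0.9, 0.1]]

plain = attention(x, x, x)                                   # conventional attention
no_pool = pooled_attention(x, x, x)                          # kernel = stride = 1
pooled = pooled_attention(x, x, x, kernel_q=2, stride_q=2)   # pool q only
```

With kernel and stride both set to 1, ``no_pool`` matches ``plain``, which mirrors the equivalence stated above. Pooling q with kernel 2 and stride 2 shortens the output sequence from five tokens to three: the unpooled cls token plus two pooled patch tokens. Pooling k and v instead would leave the output length unchanged and only reduce the number of positions attended over.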