towhee.models.multiscale_vision_transformers.mvit

Functions

Classes

`AttentionPool`	Applies a pooling operation to a flattened token sequence, then optionally normalizes the result. Flow: Input → Reshape → Pool → Reshape → Norm. Parameters: thw_shape (List): the shape of the input tensor before flattening. pool (Callable): pool operation applied to the input tensor; if pool is None, the input tensor is returned unchanged. has_cls_embed (bool): whether the input tensor contains a cls token; the pool operation excludes the cls token. norm (Callable): optional normalization applied to the tensor after pooling.
`DropPath`	Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks).
`MViT`	Multiscale Vision Transformers. Haoqi Fan, Bo Xiong, Karttikeya Mangalam, Yanghao Li, Zhicheng Yan, Jitendra Malik, Christoph Feichtenhofer. https://arxiv.org/abs/2104.11227
`Mlp`	Multi-layer perceptron (MLP) module.
`MultiScaleAttention`
`MultiScaleBlock`	Multiscale block for MViT.
`PatchEmbed`	PatchEmbed.
`Permute`
`TransformerBasicHead`	BasicHead.
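
The `AttentionPool` flow (Input → Reshape → Pool → Reshape → Norm) and the per-sample behaviour of `DropPath` can be illustrated with a minimal sketch. This is not towhee's implementation: it uses NumPy arrays instead of PyTorch tensors, and the function names `attention_pool`, `avg_pool_2x2`, and `drop_path`, the returned shape list, and the point at which `norm` is applied are all assumptions made for illustration. Only the parameter names `thw_shape`, `pool`, `has_cls_embed`, and `norm` come from the table above.

```python
import numpy as np


def attention_pool(tokens, thw_shape, pool=None, has_cls_embed=True, norm=None):
    """Sketch of the Reshape -> Pool -> Reshape -> Norm flow.

    tokens: array of shape (B, N, C), with N = T*H*W (+1 if a cls token leads).
    pool: hypothetical callable mapping (B, T, H, W, C) -> (B, T', H', W', C).
    Returns the pooled tokens and the new [T', H', W'] shape (an assumption).
    """
    if pool is None:
        # No pool operation: the input is returned unchanged.
        return tokens, list(thw_shape)
    t, h, w = thw_shape
    b, _, c = tokens.shape
    if has_cls_embed:
        # Set the cls token aside so the pool operation does not touch it.
        cls_tok, tokens = tokens[:, :1, :], tokens[:, 1:, :]
    grid = tokens.reshape(b, t, h, w, c)      # unflatten to the T x H x W grid
    grid = pool(grid)                         # pool over the grid
    new_thw = list(grid.shape[1:4])
    tokens = grid.reshape(b, -1, c)           # flatten back to a token sequence
    if has_cls_embed:
        tokens = np.concatenate([cls_tok, tokens], axis=1)
    if norm is not None:
        tokens = norm(tokens)                 # optional normalization
    return tokens, new_thw


def avg_pool_2x2(grid):
    """Illustrative pool: average non-overlapping 2x2 spatial windows.

    Assumes H and W are even.
    """
    b, t, h, w, c = grid.shape
    return grid.reshape(b, t, h // 2, 2, w // 2, 2, c).mean(axis=(3, 5))


def drop_path(x, drop_prob=0.0, training=True, rng=None):
    """Stochastic depth: zero whole samples with probability drop_prob.

    Surviving samples are rescaled by 1 / keep_prob so the expected value
    of the output matches the input.
    """
    if drop_prob == 0.0 or not training:
        return x
    rng = rng or np.random.default_rng()
    keep_prob = 1.0 - drop_prob
    # One Bernoulli draw per sample, broadcast over the remaining dimensions.
    mask_shape = (x.shape[0],) + (1,) * (x.ndim - 1)
    mask = rng.random(mask_shape) < keep_prob
    return x * mask / keep_prob


# Usage: 2 samples, one cls token plus a 2 x 4 x 4 grid of 8-dim tokens.
tokens = np.zeros((2, 1 + 2 * 4 * 4, 8))
pooled, thw = attention_pool(tokens, [2, 4, 4], pool=avg_pool_2x2)
print(pooled.shape, thw)
```

In this sketch the cls token bypasses the pool and is re-attached afterwards, which mirrors the statement that the pool operation excludes the cls token.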