towhee.models.multiscale_vision_transformers.mvit
Functions
Stochastic Depth per sample. |
|
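As a rough illustration of what "stochastic depth per sample" means, the sketch below uses plain Python lists instead of tensors. The name `drop_path_sketch`, its signature, and the `rng` argument are assumptions made for this example and are not taken from towhee. Each sample in the batch is either zeroed entirely or kept and rescaled by `1 / keep_prob`, so the expected value of the output matches the input.

```python
import random


def drop_path_sketch(batch, drop_prob=0.0, training=False, rng=None):
    """Illustrative stochastic depth: drop whole samples, rescale survivors.

    `batch` is a list of samples, each a flat list of floats (a stand-in
    for a tensor whose first dimension is the batch dimension).
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    # At inference time, or with drop_prob == 0, the input passes through.
    if drop_prob == 0.0 or not training:
        return [list(sample) for sample in batch]

    rng = rng or random.Random()
    keep_prob = 1.0 - drop_prob
    out = []
    for sample in batch:
        # One Bernoulli draw per sample, shared by all of its elements.
        keep = rng.random() < keep_prob
        scale = 1.0 / keep_prob if keep else 0.0
        out.append([x * scale for x in sample])
    return out
```

In a residual block this would typically wrap the branch output, along the lines of `x + drop_path_sketch(branch(x), ...)`, so that a dropped sample simply falls back to the identity path.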
Classes
An MLP block that contains two linear layers with a normalization layer. The MLP block is used in a transformer model after the attention block.
Pipeline: Input → Reshape → Pool → Reshape → Norm
Parameters:
thw_shape (List): the shape of the input tensor (before flattening).
pool (Callable): pool operation that is applied to the input tensor. If pool is None, the input tensor is returned.
has_cls_embed (bool): whether the input tensor contains a cls token. The pool operation excludes the cls token.
norm (Callable): optional normalization operation applied to the tensor after pooling. |
|
Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks). |
|
Multiscale Vision Transformers. Haoqi Fan, Bo Xiong, Karttikeya Mangalam, Yanghao Li, Zhicheng Yan, Jitendra Malik, Christoph Feichtenhofer. https://arxiv.org/abs/2104.11227 |
|
Multi-layer perceptron module. |
|
MultiScaleBlock for MViT. |
|
PatchEmbed. |
|
BasicHead. |
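
The reshape, pool, reshape, norm pipeline described for the first entry of the Classes table, with its thw_shape, pool, has_cls_embed and norm parameters, can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions and not towhee's implementation: tokens are lists of feature vectors rather than tensors, the cls token is assumed to sit at index 0, the return value (tokens plus the new shape) is assumed, and `attention_pool_sketch` and `avg_pool_hw` are hypothetical names.

```python
def avg_pool_hw(grid, stride=2):
    """Hypothetical pool op: average over stride x stride spatial windows."""
    out = []
    for frame in grid:
        h, w = len(frame), len(frame[0])
        rows = []
        for hi in range(0, h - stride + 1, stride):
            row = []
            for wi in range(0, w - stride + 1, stride):
                window = [frame[hi + dh][wi + dw]
                          for dh in range(stride) for dw in range(stride)]
                # Average each feature channel across the window.
                row.append([sum(vals) / len(window) for vals in zip(*window)])
            rows.append(row)
        out.append(rows)
    return out


def attention_pool_sketch(tokens, thw_shape, pool=None,
                          has_cls_embed=True, norm=None):
    """Illustrative pipeline: reshape -> pool -> reshape -> norm."""
    if pool is None:
        # No pool operation: return the input unchanged.
        return tokens, list(thw_shape)

    t, h, w = thw_shape
    # Assumption: the cls token, if present, is the first token.
    cls_tok, body = (tokens[0], tokens[1:]) if has_cls_embed else (None, tokens)
    if len(body) != t * h * w:
        raise ValueError("token count does not match thw_shape")

    # Reshape the flat sequence into a T x H x W grid of feature vectors.
    grid = [[[body[(ti * h + hi) * w + wi] for wi in range(w)]
             for hi in range(h)] for ti in range(t)]
    pooled = pool(grid)
    new_shape = [len(pooled), len(pooled[0]), len(pooled[0][0])]

    # Flatten back to a sequence; the cls token was excluded from pooling.
    flat = [vec for frame in pooled for row in frame for vec in row]
    if has_cls_embed:
        flat = [cls_tok] + flat
    if norm is not None:
        flat = [norm(vec) for vec in flat]
    return flat, new_shape
```

For example, calling `attention_pool_sketch(tokens, [2, 2, 4], pool=avg_pool_hw, has_cls_embed=False)` on 16 tokens would yield 4 tokens and the new shape `[2, 1, 2]`, since only the spatial dimensions are halved by this particular pool function.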