towhee.models.multiscale_vision_transformers.mvit

Functions

align_scale

compute_rollout_attention

drop_path_func

Stochastic Depth per sample.

find_most_h_w

resize_last_dim_to_most

round_width

Classes

AttentionPool

Pooling block used in the multiscale transformer: the flattened token sequence is reshaped back to its spatio-temporal layout, pooled, flattened again, and optionally normalized:

    Input → Reshape → Pool → Reshape → norm

Parameters:
    thw_shape (List[int]): shape of the input tensor before flattening.
    pool (Callable): pool operation applied to the input tensor; if None, the input is returned unchanged.
    has_cls_embed (bool): whether the input contains a cls token; the pool operation excludes the cls token.
    norm (Callable): optional normalization applied to the tensor after pooling.
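A minimal sketch of this reshape → pool → reshape → norm flow, assuming PyTorch; the function name attention_pool_sketch and the exact tensor layout are illustrative assumptions, not towhee's implementation.

    # Sketch only: names/signature are illustrative, not towhee's API.
    from typing import Callable, List, Optional, Tuple

    import torch


    def attention_pool_sketch(
        x: torch.Tensor,                  # (B, N, C), N = 1 + T*H*W if a cls token is present
        pool: Optional[Callable],         # e.g. nn.MaxPool3d / nn.Conv3d, or None
        thw_shape: List[int],             # [T, H, W] before flattening
        has_cls_embed: bool = True,
        norm: Optional[Callable] = None,  # e.g. nn.LayerNorm
    ) -> Tuple[torch.Tensor, List[int]]:
        if pool is None:
            return x, thw_shape

        # Keep the cls token aside; pooling applies to spatio-temporal tokens only.
        if has_cls_embed:
            cls_tok, x = x[:, :1, :], x[:, 1:, :]

        b, _, c = x.shape
        t, h, w = thw_shape
        # (B, T*H*W, C) -> (B, C, T, H, W) so a 3D pooling op can be applied.
        x = x.transpose(1, 2).reshape(b, c, t, h, w)
        x = pool(x)

        thw_shape = [x.shape[2], x.shape[3], x.shape[4]]
        # Back to the flattened token layout (B, T'*H'*W', C).
        x = x.reshape(b, c, -1).transpose(1, 2)

        if has_cls_embed:
            x = torch.cat((cls_tok, x), dim=1)
        if norm is not None:
            x = norm(x)
        return x, thw_shape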

DropPath

Drop paths (Stochastic Depth) per sample, applied in the main path of residual blocks.
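A minimal sketch of per-sample stochastic depth, assuming PyTorch; drop_path_sketch is an illustrative name, not this module's API. Each sample keeps the residual branch with probability 1 - drop_prob, and survivors are rescaled so the expected output is unchanged.

    import torch


    def drop_path_sketch(x: torch.Tensor, drop_prob: float = 0.0, training: bool = False) -> torch.Tensor:
        if drop_prob == 0.0 or not training:
            return x
        keep_prob = 1.0 - drop_prob
        # One Bernoulli draw per sample, broadcast over the remaining dimensions.
        shape = (x.shape[0],) + (1,) * (x.ndim - 1)
        mask = keep_prob + torch.rand(shape, dtype=x.dtype, device=x.device)
        mask.floor_()  # binarize to 0/1
        return x.div(keep_prob) * mask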

MViT

Multiscale Vision Transformers. Haoqi Fan, Bo Xiong, Karttikeya Mangalam, Yanghao Li, Zhicheng Yan, Jitendra Malik, Christoph Feichtenhofer. https://arxiv.org/abs/2104.11227

Mlp

Multi-layer perceptron (MLP) module.
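A minimal sketch of such a transformer MLP block, assuming PyTorch; the class name MlpSketch, the GELU default, and the dropout placement are illustrative assumptions.

    from typing import Optional

    import torch
    from torch import nn


    class MlpSketch(nn.Module):
        """Two linear layers with an activation (and optional dropout) in between."""

        def __init__(self, in_features: int, hidden_features: Optional[int] = None,
                     out_features: Optional[int] = None, act_layer=nn.GELU, drop: float = 0.0):
            super().__init__()
            out_features = out_features or in_features
            hidden_features = hidden_features or in_features
            self.fc1 = nn.Linear(in_features, hidden_features)
            self.act = act_layer()
            self.fc2 = nn.Linear(hidden_features, out_features)
            self.drop = nn.Dropout(drop)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            x = self.drop(self.act(self.fc1(x)))
            return self.drop(self.fc2(x))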

MultiScaleAttention

MultiScaleBlock

MultiScaleBlock for MViT.

PatchEmbed

PatchEmbed: converts the input into a sequence of patch embeddings.
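A minimal sketch of a video patch-embedding layer, assuming PyTorch; the Conv3d kernel/stride/padding values and the class name PatchEmbedSketch are illustrative assumptions, not this module's actual configuration.

    import torch
    from torch import nn


    class PatchEmbedSketch(nn.Module):
        """Strided 3D convolution turning a clip (B, C, T, H, W) into patch tokens (B, N, dim)."""

        def __init__(self, in_channels: int = 3, embed_dim: int = 96,
                     kernel=(3, 7, 7), stride=(2, 4, 4), padding=(1, 3, 3)):
            super().__init__()
            self.proj = nn.Conv3d(in_channels, embed_dim,
                                  kernel_size=kernel, stride=stride, padding=padding)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            x = self.proj(x)                     # (B, dim, T', H', W')
            return x.flatten(2).transpose(1, 2)  # (B, T'*H'*W', dim)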

Permute

TransformerBasicHead

Basic classification head.
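A minimal sketch of a basic classification head, assuming PyTorch; the class name BasicHeadSketch and the dropout default are illustrative assumptions.

    import torch
    from torch import nn


    class BasicHeadSketch(nn.Module):
        """Dropout followed by a linear projection to the class logits."""

        def __init__(self, dim_in: int, num_classes: int, dropout_rate: float = 0.5):
            super().__init__()
            self.dropout = nn.Dropout(dropout_rate) if dropout_rate > 0.0 else nn.Identity()
            self.projection = nn.Linear(dim_in, num_classes)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.projection(self.dropout(x))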