towhee.models.vit.vit

Functions

compute_rollout_attention

create_model

Classes

VitModel

Vision Transformer Model

:param img_size: image height or width (height=width)
:type img_size: int
:param patch_size: patch height or width (height=width)
:type patch_size: int
:param in_c: number of image channels
:type in_c: int
:param num_classes: number of classes
:type num_classes: int
:param embed_dim: number of features
:type embed_dim: int
:param depth: number of blocks
:type depth: int
:param num_heads: number of heads for the multi-head attention layer
:type num_heads: int
:param mlp_ratio: MLP ratio
:type mlp_ratio: float
:param qkv_bias: whether to add bias to the qkv layer
:type qkv_bias: bool
:param qk_scale: number to scale qk
:type qk_scale: float
:param representation_size: size of representations
:type representation_size: int
:param drop_ratio: drop rate of a block
:type drop_ratio: float
:param attn_drop_ratio: drop rate of the attention layer
:type attn_drop_ratio: float
:param drop_path_ratio: drop rate of the drop_path layer
:type drop_path_ratio: float
:param embed_layer: patch embedding layer
:param norm_layer: normalization layer
:param act_layer: activation layer
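A minimal usage sketch, assuming VitModel is a torch.nn.Module whose constructor accepts the keyword arguments listed above (ViT-Base/16-style values shown for illustration); the exact signature, defaults, and output shape may differ in the towhee source.

.. code-block:: python

    import torch
    from towhee.models.vit.vit import VitModel

    # Instantiate with illustrative ViT-Base/16 hyperparameters
    # (values here are assumptions, not the library's defaults).
    model = VitModel(
        img_size=224,        # image height/width
        patch_size=16,       # patch height/width
        in_c=3,              # RGB input channels
        num_classes=1000,    # size of the classification head
        embed_dim=768,       # feature dimension per token
        depth=12,            # number of transformer blocks
        num_heads=12,        # heads in each multi-head attention layer
        mlp_ratio=4.0,       # hidden-dim multiplier in each MLP block
        qkv_bias=True,       # add bias to the qkv projection
    )

    # Forward a dummy batch; the output is assumed to be
    # (batch_size, num_classes) logits from the classification head.
    dummy = torch.randn(1, 3, 224, 224)
    logits = model(dummy)
    print(logits.shape)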