towhee.models.clip.clip_utils.tokenize

towhee.models.clip.clip_utils.tokenize(texts: Union[str, List[str]], context_length: int = 77, truncate: bool = False) → Union[IntTensor, LongTensor]

Returns the tokenized representation of the given input string(s).

Parameters:

texts – Union[str, List[str]]. An input string or a list of input strings to tokenize.
context_length – int (default 77). The context length to use; all CLIP models use 77 as the context length.
truncate – bool (default False). Whether to truncate the text if its encoding is longer than the context length.
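The interaction between context_length and truncate can be illustrated with a minimal pure-Python sketch. It assumes, as in the OpenAI CLIP reference tokenizer, that each string is wrapped in start-of-text (49406) and end-of-text (49407) tokens and that shorter rows are right-padded with zeros. The names pad_or_truncate and toy_encode are hypothetical: toy_encode is a character-level stand-in for CLIP's BPE encoder, and this sketch raises ValueError for over-long input, which is not necessarily the exception type towhee uses.

```python
from typing import Callable, List, Union

# Assumed special-token IDs from the OpenAI CLIP BPE vocabulary.
SOT_TOKEN = 49406  # <|startoftext|>
EOT_TOKEN = 49407  # <|endoftext|>


def toy_encode(text: str) -> List[int]:
    # Hypothetical stand-in for CLIP's BPE encoder: one ID per character.
    return [ord(c) for c in text]


def pad_or_truncate(
    texts: Union[str, List[str]],
    encode: Callable[[str], List[int]],
    context_length: int = 77,
    truncate: bool = False,
) -> List[List[int]]:
    """Hypothetical sketch: one row of exactly context_length IDs per input."""
    if isinstance(texts, str):
        texts = [texts]  # a single string is treated as a batch of one
    rows = []
    for text in texts:
        tokens = [SOT_TOKEN] + encode(text) + [EOT_TOKEN]
        if len(tokens) > context_length:
            if not truncate:
                raise ValueError(
                    f"{text!r} is too long for context length {context_length}"
                )
            # Keep the first context_length tokens, then force the last one to EOT.
            tokens = tokens[:context_length]
            tokens[-1] = EOT_TOKEN
        # Right-pad with zeros so every row has the same length.
        rows.append(tokens + [0] * (context_length - len(tokens)))
    return rows
```

For example, pad_or_truncate(["hi", "hello"], toy_encode, context_length=5, truncate=True) yields two rows of length 5: "hi" is padded with one zero, while "hello" is cut short and still ends with the end-of-text token. With truncate=False, the same call fails on "hello".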

Returns:

A two-dimensional tensor containing the resulting tokens, with shape [number of input strings, context_length]. A LongTensor is returned when the installed torch version is older than 1.8.0, because index_select in those versions requires indices of type long; otherwise an IntTensor is returned.
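The version-dependent return type can be expressed as a small pure function. This is a sketch, not towhee's code: release_tuple and tokenize_result_dtype are hypothetical names, the function returns the dtype's name as a string rather than a real torch dtype, and pre-release tags such as "a0" are ignored (so "1.8.0a0" counts as 1.8.0). In an actual installation the input would be torch.__version__.

```python
import re
from typing import Tuple


def release_tuple(version: str) -> Tuple[int, ...]:
    """Hypothetical helper: '1.13.1+cu117' -> (1, 13, 1), padded to 3 parts."""
    parts = []
    # Drop a local suffix such as "+cu117", then read leading integers.
    for piece in version.split("+", 1)[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
        if match.end() != len(piece):
            break  # stop at e.g. "0a0"; pre-release tags are ignored here
    while len(parts) < 3:
        parts.append(0)  # so "1.8" compares equal to (1, 8, 0)
    return tuple(parts)


def tokenize_result_dtype(torch_version: str) -> str:
    # torch < 1.8.0: long indices for index_select; otherwise int.
    return "torch.long" if release_tuple(torch_version) < (1, 8, 0) else "torch.int"
```

Padding the tuple to three components matters: without it, Python would order (1, 8) before (1, 8, 0) and wrongly select the long dtype for a version string written as "1.8".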