towhee.models.clip.clip_utils.tokenize
- towhee.models.clip.clip_utils.tokenize(texts: Union[str, List[str]], context_length: int = 77, truncate: bool = False) → Union[IntTensor, LongTensor]
Returns the tokenized representation of the given input string(s).
- Parameters:
texts (Union[str, List[str]]) – An input string or a list of input strings to tokenize.
context_length (int) – The context length to use; all CLIP models use 77 as the context length.
truncate (bool) – Whether to truncate the text if its encoding is longer than the context length.
- Returns:
A two-dimensional tensor containing the resulting tokens, with shape [number of input strings, context_length]. A LongTensor is returned when the torch version is older than 1.8.0, because index_select in those versions requires long indices; otherwise an IntTensor is returned.
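
The shape contract and the role of truncate can be illustrated with a small stand-in. The sketch below is not the towhee implementation: `toy_tokenize` is a hypothetical name, it uses a whitespace word-to-id mapping instead of the real tokenizer, and it returns plain Python lists instead of a torch tensor. Padding with 0 and raising an error when truncate is False are assumptions made for illustration.

```python
from typing import Dict, List, Union


def toy_tokenize(
    texts: Union[str, List[str]],
    context_length: int = 77,
    truncate: bool = False,
) -> List[List[int]]:
    """Hypothetical stand-in that mimics the documented output shape."""
    # Accept a single string or a list of strings, as the real function does.
    if isinstance(texts, str):
        texts = [texts]

    vocab: Dict[str, int] = {}  # toy vocabulary; id 0 is reserved for padding
    rows: List[List[int]] = []
    for text in texts:
        # Assumption: ids are assigned per whitespace-separated word,
        # in order of first appearance, starting at 1.
        ids = [vocab.setdefault(w, len(vocab) + 1) for w in text.lower().split()]
        if len(ids) > context_length:
            if not truncate:
                # Assumption: over-long input is rejected unless truncate=True.
                raise RuntimeError(
                    f"Input is too long for context length {context_length}"
                )
            ids = ids[:context_length]
        # Right-pad every row so all rows have exactly context_length entries.
        rows.append(ids + [0] * (context_length - len(ids)))
    return rows


tokens = toy_tokenize(["a photo of a cat", "a dog"], context_length=8)
print(len(tokens), len(tokens[0]))  # number of input strings, context_length
```

With towhee installed, the real function would be imported with `from towhee.models.clip.clip_utils import tokenize`, and a call such as `tokenize(["a photo of a cat"])` would, per the description above, return a tensor of shape [1, 77].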