towhee.models.clip.clip_utils.tokenize

towhee.models.clip.clip_utils.tokenize(texts: Union[str, List[str]], context_length: int = 77, truncate: bool = False) → Union[IntTensor, LongTensor]

Returns the tokenized representation of the given input string(s).

Parameters:
  • texts – Union[str, List[str]] An input string or a list of input strings to tokenize

  • context_length – int The context length to use; all CLIP models use 77 as the context length

  • truncate – bool Whether to truncate the text in case its encoding is longer than the context length

Returns:

A two-dimensional tensor containing the resulting tokens, with shape [number of input strings, context_length]. A LongTensor is returned when the installed torch version is older than 1.8.0, because index_select in those versions requires long indices; otherwise an IntTensor is returned.
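
The effect of context_length and truncate on the output shape can be illustrated with a minimal pure-Python sketch. It is not the towhee implementation: the helper name pad_or_truncate, the zero padding value, the end-of-text id 49407, and the RuntimeError raised for over-long inputs are all assumptions, modeled on the original OpenAI CLIP tokenizer. It operates on already-encoded token id lists rather than on strings or tensors.

```python
from typing import List


def pad_or_truncate(
    token_ids: List[List[int]],
    context_length: int = 77,
    truncate: bool = False,
    eot_token: int = 49407,  # assumed end-of-text id (hypothetical here)
) -> List[List[int]]:
    """Fit each token list to exactly `context_length` entries.

    Hypothetical helper: mimics the documented output shape
    [number of input strings, context_length] using plain lists.
    """
    result = []
    for i, tokens in enumerate(token_ids):
        if len(tokens) > context_length:
            if not truncate:
                # Assumed behavior when truncation is disabled.
                raise RuntimeError(
                    f"Input {i} is too long for context length {context_length}"
                )
            # Slicing copies the list, so the caller's data is not mutated.
            tokens = tokens[:context_length]
            # Keep an end-of-text marker in the last position (assumption).
            tokens[-1] = eot_token
        # Right-pad short sequences with zeros (assumed padding value).
        result.append(tokens + [0] * (context_length - len(tokens)))
    return result


short = [49406, 320, 1929, 49407]   # made-up ids for a short prompt
long = list(range(1, 101))          # made-up ids for an over-long prompt
rows = pad_or_truncate([short, long], context_length=8, truncate=True)
print(len(rows), [len(r) for r in rows])
```

With truncate left at False, the same call would raise instead of shortening the second input, which is why the flag matters for long prompts.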