towhee.models.clip.simple_tokenizer

Functions

basic_clean

bytes_to_unicode

Returns a list of UTF-8 bytes and a corresponding list of Unicode strings.
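
The summary does not say how bytes are paired with Unicode strings. The following is a minimal sketch, assuming the function follows the byte-to-Unicode scheme commonly used by byte-level BPE tokenizers; this page does not confirm that. The name `sketch_bytes_to_unicode` is hypothetical, and the sketch returns a dict rather than two lists.

```python
def sketch_bytes_to_unicode():
    """Illustrative only: map each of the 256 byte values to one printable character.

    Assumption: bytes that are already printable map to themselves, and the
    remaining bytes are shifted to code points starting at 256.
    """
    # Byte values that already render as visible, non-whitespace characters.
    printable = (
        list(range(0x21, 0x7F))     # '!' through '~'
        + list(range(0xA1, 0xAD))   # U+00A1 through U+00AC
        + list(range(0xAE, 0x100))  # U+00AE through U+00FF
    )
    byte_values = printable[:]
    code_points = printable[:]
    offset = 0
    for b in range(256):
        if b not in printable:
            # Control, whitespace, and other non-printable bytes get a stand-in.
            byte_values.append(b)
            code_points.append(256 + offset)
            offset += 1
    return dict(zip(byte_values, (chr(c) for c in code_points)))


mapping = sketch_bytes_to_unicode()
encoded = "".join(mapping[b] for b in "hi there".encode("utf-8"))
print(encoded)  # the space byte (32) is rendered as U+0120
```

Under this assumption every byte has a visible stand-in, so a byte-level vocabulary never has to contain raw whitespace or control characters.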

default_bpe

get_pairs

Returns the set of adjacent symbol pairs in a word.
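
As an illustration of what a set of symbol pairs looks like, here is a minimal sketch. It assumes a word is given as a tuple of symbols and that pairs are taken from neighbouring symbols, as in byte-pair encoding. The name `sketch_get_pairs` and the `</w>` end-of-word marker are hypothetical and not taken from this page.

```python
def sketch_get_pairs(word):
    """Illustrative only: collect each pair of neighbouring symbols in `word`.

    Assumption: `word` is a sequence of symbols, where a symbol may be a
    string of more than one character.
    """
    # zip pairs each symbol with its successor; the set drops repeated pairs.
    return set(zip(word, word[1:]))


pairs = sketch_get_pairs(("l", "o", "w", "</w>"))
print(sorted(pairs))
```

A BPE merge loop would typically pick the highest-priority pair from this set, merge it into a single symbol, and recompute the pairs.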

whitespace_clean

Classes

SimpleTokenizer

A simple tokenizer for text input.