The table below lists the tokenizer type used by each Transformer model.
For how the different tokenizers work, see: Summary of the tokenizers

Model | Type of Tokenizer |
---|---|
Bert | WordPiece |
DPRContextEncoder | WordPiece |
DPRQuestionEncoder | WordPiece |
DPRReader | WordPiece |
Funnel | WordPiece |
Lxmert | WordPiece |
Electra | WordPiece |
ConvBert | WordPiece |
LayoutLM | WordPiece |
RetriBert | WordPiece |
DistilBert | WordPiece |
MobileBert | WordPiece |
SqueezeBert | WordPiece |
BertJapanese | WordPiece |
Flaubert | Byte-Pair Encoding (BPE) |
XLM | Byte-Pair Encoding (BPE) |
Herbert | Byte-Pair Encoding (BPE) |
GPT2 | Byte-level BPE |
Deberta | Byte-level BPE |
Roberta | Byte-level BPE |
Bart | Byte-level BPE |
LED | Byte-level BPE |
Luke | Byte-level BPE |
Blenderbot | Byte-level BPE |
Longformer | Byte-level BPE |
Albert | SentencePiece |
Barthez | SentencePiece |
Bartpho | SentencePiece |
BertGeneration | SentencePiece |
BigBird | SentencePiece |
Camembert | SentencePiece |
DebertaV2 | SentencePiece |
LayoutXLM | SentencePiece |
M2M100 | SentencePiece |
Marian | SentencePiece |
MBart50 | SentencePiece |
MBart | SentencePiece |
MLuke | SentencePiece |
MT5 | SentencePiece |
Pegasus | SentencePiece |
Reformer | SentencePiece |
RemBert | SentencePiece |
Speech2Text | SentencePiece |
T5 | SentencePiece |
XLMProphetNet | SentencePiece |
XLMRoberta | SentencePiece |
XLNet | SentencePiece |
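
To make the WordPiece rows more concrete, here is a minimal sketch of the greedy longest-match-first splitting that WordPiece tokenizers apply to a single word at inference time. The function name `wordpiece_tokenize`, the `toy_vocab` set, and the `[UNK]` / `##` defaults are illustrative assumptions in the style of BERT; this is not the code of any particular library.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]", prefix="##"):
    """Greedy longest-match-first split of one word into WordPiece-style tokens."""
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, shrinking until a vocab hit.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                # Pieces that continue a word carry the continuation prefix.
                candidate = prefix + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            # No piece matches at this position: the whole word becomes unknown.
            return [unk_token]
        tokens.append(piece)
        start = end
    return tokens


# Hypothetical toy vocabulary, for illustration only.
toy_vocab = {"token", "##izer", "##ize", "##r", "un", "##able"}

print(wordpiece_tokenize("tokenizer", toy_vocab))  # ['token', '##izer']
```

BPE and SentencePiece-based tokenizers segment text differently (merge rules or a unigram model rather than longest-match lookup), so this sketch applies only to the WordPiece rows.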