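The table below lists the tokenizer type used by each Transformer model.
For how the different tokenizers work, see "Summary of the tokenizers" in the Hugging Face Transformers documentation.
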
| Model | Type of Tokenizer |
|---|---|
| Bert | WordPiece |
| DPRContextEncoder | WordPiece |
| DPRQuestionEncoder | WordPiece |
| DPRReader | WordPiece |
| Funnel | WordPiece |
| Lxmert | WordPiece |
| Electra | WordPiece |
| ConvBert | WordPiece |
| LayoutLM | WordPiece |
| RetriBert | WordPiece |
| DistilBert | WordPiece |
| MobileBert | WordPiece |
| SqueezeBert | WordPiece |
| BertJapanese | WordPiece |
| Flaubert | Byte-Pair Encoding (BPE) |
| XLM | Byte-Pair Encoding (BPE) |
| Herbert | Byte-Pair Encoding (BPE) |
| GPT2 | Byte-level BPE |
| Deberta | Byte-level BPE |
| Roberta | Byte-level BPE |
| Bart | Byte-level BPE |
| LED | Byte-level BPE |
| Luke | Byte-level BPE |
| Blenderbot | Byte-level BPE |
| Longformer | Byte-level BPE |
| Albert | SentencePiece |
| Barthez | SentencePiece |
| Bartpho | SentencePiece |
| BertGeneration | SentencePiece |
| BigBird | SentencePiece |
| Camembert | SentencePiece |
| DebertaV2 | SentencePiece |
| LayoutXLM | SentencePiece |
| M2M100 | SentencePiece |
| Marian | SentencePiece |
| MBart50 | SentencePiece |
| MBart | SentencePiece |
| MLuke | SentencePiece |
| MT5 | SentencePiece |
| Pegasus | SentencePiece |
| Reformer | SentencePiece |
| RemBert | SentencePiece |
| Speech2Text | SentencePiece |
| T5 | SentencePiece |
| XLMProphetNet | SentencePiece |
| XLMRoberta | SentencePiece |
| XLNet | SentencePiece |
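
Each family in the table tends to leave a recognizable marker in its tokens: WordPiece prefixes word continuations with `##`, XLM-style BPE appends `</w>` to the last piece of a word, byte-level BPE encodes a leading space as `Ġ`, and SentencePiece marks word starts with `▁` (U+2581). The following is a minimal heuristic sketch built on those markers. The function name `guess_tokenizer_family` and the sample token lists are hypothetical illustrations, not real model output and not part of any library.

```python
# Heuristic sketch: guess a tokenizer family from subword markers.
# `guess_tokenizer_family` and the sample tokens are illustrative
# assumptions; they are not produced by, or part of, any library.

def guess_tokenizer_family(tokens):
    """Return a family name from the table based on marker characters."""
    if any(tok.startswith("##") for tok in tokens):
        return "WordPiece"                  # "##" marks a word continuation
    if any(tok.endswith("</w>") for tok in tokens):
        return "Byte-Pair Encoding (BPE)"   # "</w>" marks the end of a word
    if any("Ġ" in tok for tok in tokens):
        return "Byte-level BPE"             # "Ġ" encodes a leading space
    if any(tok.startswith("▁") for tok in tokens):
        return "SentencePiece"              # "▁" marks the start of a word
    return "unknown"


# Hand-written token lists in the style of each family.
samples = [
    ["token", "##ization", "is", "fun"],
    ["token", "ization</w>", "is</w>", "fun</w>"],
    ["Token", "ization", "Ġis", "Ġfun"],
    ["▁Token", "ization", "▁is", "▁fun"],
]

for tokens in samples:
    print(guess_tokenizer_family(tokens), tokens)
```

The check is only a rough aid: a short input whose words are all in the vocabulary may contain no marker at all, in which case the function returns `"unknown"`.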