维基百科的定义: an n-gram (sometimes also called Q-gram) is a contiguous sequence of n items from a given sample of text or speech. The items can be phonemes, syllables, letters, words or base pairs according to the application.
中文:在计算语言学中,n-gram指的是文本中连续的n个item(item可以是phoneme, syllable, letter, word或base pairs)
n-gram 中如果n=1则为unigram,n=2则为bigram,n=3则为trigram。n>4后,则直接用数字指称,如4-gram,5gram。
示例
以 I will go to United States. 这句话为例。bigram为:
I will
will go
go to
to United
United States
This is a sentence
