NLTK defines many classes that are useful for working with natural language. Here we look at how to use context-free grammars (CFGs) in NLTK.
import nltk
groucho_grammar = nltk.CFG.fromstring("""
S -> NP VP
PP -> P NP
NP -> Det N | Det N PP | 'I'
VP -> V NP | VP PP
Det -> 'an' | 'my'
N -> 'elephant' | 'pajamas'
V -> 'shot'
P -> 'in'
""")
groucho_grammar
type(groucho_grammar)
nltk.grammar.CFG
groucho_grammar.start()
S
groucho_grammar.productions()
[S -> NP VP,
PP -> P NP,
NP -> Det N,
NP -> Det N PP,
NP -> 'I',
VP -> V NP,
VP -> VP PP,
Det -> 'an',
Det -> 'my',
N -> 'elephant',
N -> 'pajamas',
V -> 'shot',
P -> 'in']
from nltk.grammar import Nonterminal
groucho_grammar.productions(lhs=Nonterminal("NP"))
[NP -> Det N, NP -> Det N PP, NP -> 'I']
# rhs= selects productions whose right-hand side starts with the given symbol
groucho_grammar.productions(rhs=Nonterminal("Det"))
[NP -> Det N, NP -> Det N PP]
pp = groucho_grammar.productions(rhs=Nonterminal("Det"))
pp[0]
NP -> Det N
pp[0].lhs()
NP
pp[0].rhs()
(Det, N)
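The queries above can be combined with the methods on individual productions. The following is a minimal sketch, not part of the original walkthrough: the variable names (`grammar`, `vocabulary`, `by_first`, `anywhere`) are chosen here for illustration, and the grammar is repeated so the snippet runs on its own. It collects the grammar's terminals via `is_lexical()` and contrasts `productions(rhs=...)`, which looks only at the first right-hand-side symbol, with a manual search over the whole right-hand side.

```python
import nltk
from nltk.grammar import Nonterminal

# Same grammar as above, repeated so this snippet is self-contained.
grammar = nltk.CFG.fromstring("""
S -> NP VP
PP -> P NP
NP -> Det N | Det N PP | 'I'
VP -> V NP | VP PP
Det -> 'an' | 'my'
N -> 'elephant' | 'pajamas'
V -> 'shot'
P -> 'in'
""")

# A production is lexical if its right-hand side contains at least one terminal.
lexical = [p for p in grammar.productions() if p.is_lexical()]

# Terminals are plain strings; nonterminals are Nonterminal objects.
vocabulary = sorted({sym for p in lexical for sym in p.rhs() if isinstance(sym, str)})
print(vocabulary)

# productions(rhs=...) matches on the FIRST right-hand-side symbol only,
# so N (which never comes first in this grammar) gives an empty result.
by_first = grammar.productions(rhs=Nonterminal("N"))
print(by_first)

# To find N anywhere on the right-hand side, filter manually.
anywhere = [p for p in grammar.productions() if Nonterminal("N") in p.rhs()]
print(anywhere)
```

This is why the `rhs=Nonterminal("Det")` query returns both `NP` rules: `Det` is the first symbol in each of them.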
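NLTK ships with ready-made parsers for CFGs. Parsing a sentence with a CFG yields its parse trees; `parse()` returns an iterator, which we collect into a list below. We can look at the trees' string representation or draw them graphically.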
sent = ['I', 'shot', 'an', 'elephant', 'in', 'my', 'pajamas']
parser = nltk.ChartParser(groucho_grammar)
trees = list(parser.parse(sent))
print(trees[0])
(S
(NP I)
(VP
(VP (V shot) (NP (Det an) (N elephant)))
(PP (P in) (NP (Det my) (N pajamas)))))
type(trees[0])
nltk.tree.Tree
trees[0]
trees[1]
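The sentence is ambiguous under this grammar, which is why both `trees[0]` and `trees[1]` exist. A small sketch of how one might inspect all parses without a graphical display follows; it assumes only the grammar and sentence from above (repeated here so it runs standalone), and the variable names are illustrative. `check_coverage` is called first so that a word missing from the grammar fails with a clear `ValueError` before parsing starts.

```python
import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
PP -> P NP
NP -> Det N | Det N PP | 'I'
VP -> V NP | VP PP
Det -> 'an' | 'my'
N -> 'elephant' | 'pajamas'
V -> 'shot'
P -> 'in'
""")

sent = ['I', 'shot', 'an', 'elephant', 'in', 'my', 'pajamas']

# Raises ValueError if any token has no lexical rule in the grammar.
grammar.check_coverage(sent)

parser = nltk.ChartParser(grammar)
trees = list(parser.parse(sent))  # parse() is lazy, so materialize it
print(len(trees), "parse(s) found")

for tree in trees:
    # leaves() gives back the original tokens in order.
    print(tree.label(), tree.leaves())
    # pretty_print() draws the tree as text, so no GUI window is needed.
    tree.pretty_print()
```

In one tree the prepositional phrase attaches to the verb phrase (the shooting happened in pajamas); in the other it attaches to the noun phrase (the elephant was in the pajamas).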
from nltk.parse.generate import generate
for sentence in generate(groucho_grammar, depth=5):
    print(' '.join(sentence))
an elephant shot an elephant
an elephant shot an pajamas
an elephant shot my elephant
an elephant shot my pajamas
an elephant shot I
an elephant shot I in I
an pajamas shot an elephant
an pajamas shot an pajamas
an pajamas shot my elephant
an pajamas shot my pajamas
an pajamas shot I
an pajamas shot I in I
my elephant shot an elephant
my elephant shot an pajamas
my elephant shot my elephant
my elephant shot my pajamas
my elephant shot I
my elephant shot I in I
my pajamas shot an elephant
my pajamas shot an pajamas
my pajamas shot my elephant
my pajamas shot my pajamas
my pajamas shot I
my pajamas shot I in I
an elephant in I shot an elephant
…
I shot my elephant
I shot my pajamas
I shot I
I shot I in I
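Because the grammar is recursive (for example `VP -> VP PP`), generation needs a bound. The sketch below, with illustrative variable names that are not in the original, uses the `n` argument of `generate` alongside `depth` to take only the first few sentences, and then checks that each one parses under the same grammar.

```python
import nltk
from nltk.parse.generate import generate

grammar = nltk.CFG.fromstring("""
S -> NP VP
PP -> P NP
NP -> Det N | Det N PP | 'I'
VP -> V NP | VP PP
Det -> 'an' | 'my'
N -> 'elephant' | 'pajamas'
V -> 'shot'
P -> 'in'
""")

# depth bounds how deep the derivation may go; n caps how many sentences are returned.
sample = list(generate(grammar, depth=5, n=5))
for tokens in sample:
    print(' '.join(tokens))

# Every generated sentence should be parseable by the grammar that produced it.
parser = nltk.ChartParser(grammar)
parse_counts = [len(list(parser.parse(tokens))) for tokens in sample]
print(parse_counts)
```

Strings such as "an pajamas" appear because this toy grammar encodes no agreement or article rules; it generates everything its productions allow.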