python - Get the depth of words from an nltk tree

Question

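I'm working on an NLP project, and I want to filter out words based on their position in the dependency tree.

To draw the tree, I used the code from this post: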

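from nltk import Tree

def to_nltk_tree(node):
    # Tokens with dependents become subtrees; childless tokens become plain string leaves.
    if node.n_lefts + node.n_rights > 0:
        return Tree(node.orth_, [to_nltk_tree(child) for child in node.children])
    else:
        return node.orth_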

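For an example sentence:

"A group of people around the world are suddenly mentally linked"

I got this tree:

From this tree, I want to get a list of tuples containing each word and its corresponding depth in the tree: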

[(linked,1),(are,2),(suddenly,2),(mentally,2),(group,2),(A,3),(of,3),(people,4)....]

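For this case, I am not interested in words that have no children: [are, suddenly, mentally, A, the]. So far, all I have managed to do is get the list of words that do have children, using the following code: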

def get_words(root, words):
    # Collect every descendant of root that has children of its own.
    children = list(root.children)
    for child in children:
        if list(child.children):
            words.append(child)
            get_words(child, words)
    return list(set(words))

[to_nltk_tree(sent.root).pretty_print() for sent in doc.sents]
s_root = list(doc.sents)[0].root
words = []
words.append(s_root)    
words = get_words(s_root,words)
words

[around, linked, world, of, people, group]

From here, how can I get the desired tuples of words and their respective depths?

score 1 · Accepted Answer

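Are you sure that is an nltk Tree in your code? nltk's Tree class has no children attribute. With an nltk Tree, you can do what you want using "treepositions", which are paths down the tree. Each path is a tuple of branch choices. The tree position of "people" is (0, 2, 1, 0), and as you can see, the depth of a node is simply the length of its tree position.

First I get the paths of the leaves, so that I can exclude them: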
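To make the idea of a tree position concrete, here is a minimal plain-Python sketch that does not use nltk at all. The `(label, children)` tuple layout and the `tree_positions` helper are assumptions made for illustration only; they are not how nltk stores or walks its trees.

```python
def tree_positions(tree, prefix=()):
    """Yield (position, label) pairs in pre-order for a (label, children) tuple tree."""
    label, children = tree
    yield prefix, label
    for i, child in enumerate(children):
        # Each step down appends the index of the branch that was taken.
        yield from tree_positions(child, prefix + (i,))


# The example tree from this answer, written as nested (label, children) tuples.
tree = ("linked", [
    ("are", [
        ("suddenly", []),
        ("mentally", []),
        ("group", [
            ("A", []),
            ("of", [("people", [("around", [("world", [("the", [])])])])]),
        ]),
    ]),
])

positions = {label: pos for pos, label in tree_positions(tree)}
print(positions["people"], len(positions["people"]))  # (0, 2, 1, 0) 4
```

The root has the empty position `()`, so its depth under this convention is 0.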

import nltk

t = nltk.Tree.fromstring("""(linked (are suddenly mentally 
                                     (group A (of (people (around (world the)))))))""")
n_leaves = len(t.leaves())
leavepos = set(t.leaf_treeposition(n) for n in range(n_leaves))

Now it is easy to list the non-terminal nodes and their depths:

>>> for pos in t.treepositions():
...     if pos not in leavepos:
...         print(t[pos].label(), len(pos))
...
linked 0
are 1
group 2
of 3
people 4
around 5
world 6

By the way, nltk trees have their own display methods. Try print(t) or t.draw(), which draws the tree in a pop-up window.
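If you would rather stay with the token objects from the question (which expose `.children` and `.orth_`) instead of converting to an nltk Tree, a recursive walk that carries the depth along is one possible approach. The sketch below is only an illustration: `Node` is a hypothetical stand-in for the parser's token class so the example runs on its own, and `words_with_depth` and its `skip_leaves` flag are names invented here. It counts the root as depth 1, as in the question, whereas the treepositions approach counts it as 0.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for a parsed token: only .orth_ and .children."""
    orth_: str
    children: List["Node"] = field(default_factory=list)


def words_with_depth(node, depth=1, skip_leaves=True):
    """Return (word, depth) pairs in pre-order; the root has depth 1."""
    children = list(node.children)
    pairs = []
    # Keep the word if it has children, or if leaves were requested too.
    if children or not skip_leaves:
        pairs.append((node.orth_, depth))
    for child in children:
        pairs.extend(words_with_depth(child, depth + 1, skip_leaves))
    return pairs


# Dependency tree from the question, rebuilt by hand for the example.
world = Node("world", [Node("the")])
people = Node("people", [Node("around", [world])])
group = Node("group", [Node("A"), Node("of", [people])])
root = Node("linked", [Node("are"), Node("suddenly"), Node("mentally"), group])

print(words_with_depth(root))
# [('linked', 1), ('group', 2), ('of', 3), ('people', 4), ('around', 5), ('world', 6)]
```

Passing `skip_leaves=False` also keeps the childless words, which gives a list in the shape asked for in the question.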
