spaCy - How to preserve the order of dependencies?

Question

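I have the following code, which opens each file in a directory, runs spaCy NLP on it, and writes the dependency parse information to a file in a new directory.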

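import os
import spacy

# 'en' is the model shortcut used in the question; spaCy v3+ needs a full name such as 'en_core_web_sm'
nlp = spacy.load('en')

path1 = 'C:/Path/to/my/input'
path2 = '../output'
for file in os.listdir(path1):
    # os.listdir returns bare file names, so join them with the input directory
    with open(os.path.join(path1, file), encoding='utf-8') as text:
        doc = nlp(text.read())
    # Open the output file once per input file; the with block closes it reliably
    with open(os.path.join(path2, file), 'a+', encoding='utf-8') as f:
        for sent in doc.sents:
            for token in sent:
                f.write(file + '\t' + token.dep_ + '\t' + str(token.head) + '\t' + str(token.right_edge) + '\n')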

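The problem is that this does not preserve the order of the dependencies in the output file. I can't seem to find any reference to character positions in the API documentation.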

score 1 · Accepted Answer

The character index is at token.idx. The word index is at token.i. I know that's not especially intuitive.
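To make the difference concrete, here is a minimal sketch that formats one output row per token with both offsets. `token_rows` is a hypothetical helper name, and the `SimpleNamespace` stand-ins only mimic the `i`, `idx`, `text`, and `dep_` attributes of spaCy tokens so that the snippet runs without a model installed; with spaCy you would pass a `Doc` or a sentence `Span` instead.

```python
from types import SimpleNamespace


def token_rows(tokens):
    """Return tab-separated rows: word index, character offset, text, dependency label."""
    rows = []
    for token in tokens:
        rows.append(f"{token.i}\t{token.idx}\t{token.text}\t{token.dep_}")
    return rows


# Hypothetical stand-ins for the tokens of "Dogs bark loudly" (not real spaCy objects).
tokens = [
    SimpleNamespace(i=0, idx=0, text="Dogs", dep_="nsubj"),
    SimpleNamespace(i=1, idx=5, text="bark", dep_="ROOT"),
    SimpleNamespace(i=2, idx=10, text="loudly", dep_="advmod"),
]

for row in token_rows(tokens):
    print(row)
```

Writing `token.i` (or `token.idx`) into each row means the original order can be recovered later by sorting on that column, whatever order the rows were written in.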

Tokens also compare by position, so you can do:

for child in sent:
    word1, word2 = sorted((child, child.head))

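This will give you each dependency arc in document order. I'm not sure what you want to do with the right edge, though, so I'm not sure this is exactly what you're after.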
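As a sketch of how the ordered arcs might be collected, the helper below returns each arc as a `(left, label, right)` triple sorted by position. It sorts on `token.i` explicitly rather than relying on token comparison, so it also works on the stand-in objects used here. `ordered_arcs` and the stand-ins are hypothetical; with spaCy you would pass a sentence `Span`.

```python
from types import SimpleNamespace


def ordered_arcs(tokens):
    """Return (left_text, dep_label, right_text) for each arc, in document order."""
    arcs = []
    for child in tokens:
        # In spaCy the root token is its own head, so it contributes no arc.
        if child.head.i == child.i:
            continue
        # Put the endpoint that appears earlier in the text on the left.
        left, right = sorted((child, child.head), key=lambda t: t.i)
        arcs.append((left.i, right.i, left.text, child.dep_, right.text))
    arcs.sort()  # order arcs by the positions of their endpoints
    return [(left, dep, right) for _, _, left, dep, right in arcs]


# Hypothetical stand-ins for the parse of "Dogs bark loudly" (not real spaCy objects).
bark = SimpleNamespace(i=1, text="bark", dep_="ROOT")
bark.head = bark
dogs = SimpleNamespace(i=0, text="Dogs", dep_="nsubj", head=bark)
loudly = SimpleNamespace(i=2, text="loudly", dep_="advmod", head=bark)

# The input order is deliberately shuffled; the output follows document order.
for arc in ordered_arcs([loudly, bark, dogs]):
    print(arc)
```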
