The Python NLTK implementations of Beeferman's Pk and WindowDiff give completely different results from the Python segeval implementations of both.

I am using the same parameters in both libraries.

hyp: 0100100000
ref: 0101000000
k=2
Pk (segeval): 0.2222222
Pk (NLTK): 0.111111111

hyp: 111111
ref: 100100
k=2
Pk (segeval): 0.4
Pk (NLTK): 0.64

This could lead to different research results depending on which library people use.
Why am I getting different Pk values from these two implementations? For a given input and k, Pk should have only one value.

1 Answer

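There may be a problem with the way you are calling the NLTK function, or you may be using an old version of NLTK.

With NLTK, I get the same results as the ones you show for segeval: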

>>> from nltk.metrics.segmentation import pk
>>> hyp = '0100100000'
>>> ref = '0101000000'
>>> pk(hyp, ref, 2)
0.2222222222222222
>>> hyp = '111111'
>>> ref = '100100'
>>> pk(hyp, ref, 2)
0.4
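
To see where those two values come from, here is a minimal pure-Python sketch of a sliding-window Pk. It assumes the formulation in which every window of length k is checked for the presence of at least one boundary in each segmentation; `pk_sketch` is a hypothetical name, and this is not the source code of NLTK or segeval, only an illustration that reproduces the numbers above.

```python
def pk_sketch(ref, hyp, k, boundary="1"):
    """Illustrative window-based Pk (hypothetical helper, not a library API).

    Returns the fraction of length-k windows in which ref and hyp
    disagree about whether the window contains any boundary.
    """
    if len(ref) != len(hyp):
        raise ValueError("ref and hyp must have the same length")
    n_windows = len(ref) - k + 1
    if n_windows <= 0:
        raise ValueError("k must not exceed the sequence length")

    errors = 0
    for i in range(n_windows):
        ref_has_boundary = boundary in ref[i:i + k]
        hyp_has_boundary = boundary in hyp[i:i + k]
        if ref_has_boundary != hyp_has_boundary:
            errors += 1
    return errors / n_windows


# First example from the question: 2 of the 9 windows disagree.
print(pk_sketch("0101000000", "0100100000", 2))
# Second example from the question: 2 of the 5 windows disagree.
print(pk_sketch("100100", "111111", 2))
```

In this formulation, swapping ref and hyp does not change the value when k is passed explicitly, so the argument order is unlikely to explain the 0.111 and 0.64 figures.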

My nltk version:

>>> import nltk
>>> nltk.__version__
'3.0.5'

Upgrade with:

$ pip install -U nltk
answered 2015-09-24T06:45:10.097