The Python NLTK implementations of Beeferman's Pk and Pevzner and Hearst's WindowDiff give completely different results from the Python SegEval implementations of the same two metrics, even though I use the same parameters in both.

Example 1
hyp: 0100100000
ref: 0101000000
k = 2
Pk (SegEval): 0.2222222
Pk (NLTK): 0.111111111
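To check numbers like these by hand, here is a minimal pure-Python sketch of the windowed definitions. It assumes the boundary-string convention used in the examples (each character is a potential boundary, `'1'` marks an actual one) and slides a window of `k` consecutive characters one position at a time. Pk counts windows where ref and hyp disagree on whether the window contains any boundary; WindowDiff counts windows where the number of boundaries differs. This is a from-scratch sketch, not the code of either library.

```python
def _check(ref, hyp, k):
    """Validate inputs and return the number of k-wide windows."""
    if len(ref) != len(hyp):
        raise ValueError("ref and hyp must have the same length")
    n_windows = len(ref) - k + 1
    if k < 1 or n_windows < 1:
        raise ValueError("k must be between 1 and the string length")
    return n_windows


def pk(ref, hyp, k, boundary="1"):
    """Beeferman's Pk: share of windows where ref and hyp disagree on
    whether the window contains at least one boundary."""
    n_windows = _check(ref, hyp, k)
    errors = 0
    for i in range(n_windows):
        in_ref = boundary in ref[i:i + k]
        in_hyp = boundary in hyp[i:i + k]
        if in_ref != in_hyp:
            errors += 1
    return errors / n_windows


def windowdiff(ref, hyp, k, boundary="1"):
    """WindowDiff: share of windows where ref and hyp contain a
    different number of boundaries."""
    n_windows = _check(ref, hyp, k)
    errors = sum(
        1
        for i in range(n_windows)
        if ref[i:i + k].count(boundary) != hyp[i:i + k].count(boundary)
    )
    return errors / n_windows


if __name__ == "__main__":
    # The two examples from the question, with k = 2.
    print(pk("0101000000", "0100100000", k=2))
    print(pk("100100", "111111", k=2))
```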

Example 2
hyp: 111111
ref: 100100
k = 2
Pk (SegEval): 0.4
Pk (NLTK): 0.64
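SegEval represents segmentations as segment masses (the length of each segment) rather than as boundary strings, so strings like the ones above have to be converted before they can be passed to it. The question does not say how that conversion was done; the hypothetical helper below assumes one convention, where each character is the gap between two adjacent units, so a string of length n describes n + 1 units.

```python
def boundaries_to_masses(boundary_string, boundary="1"):
    """Convert a boundary string into a tuple of segment masses.

    Hypothetical helper. Assumed convention: character i is the gap between
    unit i and unit i + 1, so a string of length n covers n + 1 units.
    """
    masses = []
    current = 1  # the first unit always opens a segment
    for ch in boundary_string:
        if ch == boundary:
            masses.append(current)  # a boundary closes the current segment
            current = 1             # and the next unit opens a new one
        else:
            current += 1            # no boundary: next unit joins the segment
    masses.append(current)          # the last segment runs to the end
    return tuple(masses)


if __name__ == "__main__":
    for s in ("0101000000", "0100100000", "111111", "100100"):
        print(s, boundaries_to_masses(s))
```

Under a different convention (for example, one character per unit), the same strings give different masses, so the conversion step is one more thing to pin down when comparing the two libraries.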
This could lead to different research results depending on which library a researcher uses.
Why am I getting different Pk results from these two implementations? For the same input and the same k, Pk should have exactly one value.