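I wrote code that compares two strings to find matching words. Now I would like to be able to find words that are relatively close as well. For example, "book" and "brook" are similar, whereas "book" and "luck" are not. How should I go about this?
I was thinking of splitting each word into characters and then counting the frequency of those characters. Right now, a matching word gives a value of 0 and anything else gives 2, but I would like to extend that part to do what I described above.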
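for i in range(out.shape[0]):  # out.shape[0] is the row count, out.shape[1] the column count
    label = out.index[i]  # row label that is searched for each word
    for word in refArray:  # for each word in the reference array
        # out.loc[label, str(word)] = label.count(str(word))
        # .ix was removed in pandas 1.0, so the cell is addressed by label with .loc
        if label.count(str(word)) == 1:
            out.loc[label, str(word)] = 0  # word occurs exactly once in the row label
        else:
            out.loc[label, str(word)] = 2  # word is absent or occurs more than once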
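
To make the character-frequency idea concrete, here is a rough sketch of what I have in mind, not a finished solution. The function names `char_frequency_distance`, `sequence_similarity` and `similarity_score` are my own hypothetical helpers, and mapping the result onto the existing 0-to-2 scale is an assumption. Since pure letter counts ignore order (anagrams would look identical), the sketch also includes an order-aware ratio from the standard library's `difflib` as a possible alternative.

```python
from collections import Counter
from difflib import SequenceMatcher


def char_frequency_distance(a: str, b: str) -> int:
    """Count how many letters differ between the two words' letter tallies."""
    ca, cb = Counter(a.lower()), Counter(b.lower())
    # Counter subtraction keeps only positive surpluses, so add both directions.
    return sum(((ca - cb) + (cb - ca)).values())


def sequence_similarity(a: str, b: str) -> float:
    """Order-aware similarity in [0, 1] computed by difflib.SequenceMatcher."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def similarity_score(a: str, b: str) -> float:
    """Hypothetical mapping onto the 0..2 scale: 0 is identical, 2 is unrelated."""
    return 2.0 * (1.0 - sequence_similarity(a, b))


if __name__ == "__main__":
    for other in ("book", "brook", "luck"):
        print(other,
              char_frequency_distance("book", other),
              round(similarity_score("book", other), 3))
```

With this, the `else` branch could store `similarity_score(...)` instead of a fixed 2, so close words such as "book" and "brook" get a value near 0 and distant ones such as "book" and "luck" get a value nearer 2. Where to draw the line between "similar" and "not similar" would still need a threshold chosen for the data.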