python - Checking if words in a string are similar? (e.g. book and brook vs. book and luck)

Question

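I wrote code that compares two strings to find matching words. Now I would like to be able to find words that are relatively close. For example, book and brook are similar, while book and luck are not. How should I go about this?

I am thinking of splitting each word into characters and then counting the frequency of those characters. Right now, a matching word gets a value of 0 and anything else gets 2, but I want to extend that part to do what I described above.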

for i in range(out.shape[0]):  # every row of out (out.shape[0] is rows, out.shape[1] is columns)
    for word in refArray:  # for each word in refArray
        # Score 0 if the word occurs exactly once in the row label, otherwise 2.
        # .loc is used here because .ix was removed in pandas 1.0.
        if out.index[i].count(str(word)) == 1:
            out.loc[out.index[i], str(word)] = 0
        else:
            out.loc[out.index[i], str(word)] = 2

score 0 · Accepted Answer

You want to compute the edit distance: https://en.wikipedia.org/wiki/Edit_distance

$ pip3 search edit | grep distance
edith (0.1.0a1)            - Edit-distanc implementation with edit-path retrieval
string-distance (1.0.0)    - Minimum Edit Distance
subdist (0.2.1)            - Substring edit distance
editdist (0.1)             - Calculate Levenshtein's edit distance
leven (1.0.4)              - Levenshtein edit distance library
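
To make the idea concrete, here is a minimal sketch of the Levenshtein variant of edit distance in plain Python, with no third-party package. The function name levenshtein is only an illustration and is not taken from any of the packages listed above; those packages may use different names and different cost models.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    # previous[j] is the distance between the prefix of a processed so far
    # (initially the empty prefix) and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning the first i chars of a into "" takes i deletions
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


print(levenshtein("book", "brook"))  # 1 (insert "r")
print(levenshtein("book", "luck"))   # 3 (three substitutions)
```

A small distance relative to the word length suggests the words are similar; where to draw the line is a judgment call that depends on the data.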

score -1 · Accepted Answer

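After browsing Google, I ended up using nltk. At this stage I only need to compare simple words to understand the basic functionality of my program. I will look at more sophisticated solutions later. Thanks for the help.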

import nltk
nltk.edit_distance("word1", "word2")
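
If installing nltk is not an option, the standard library's difflib can serve as a rough stand-in. Note that this is a different technique: SequenceMatcher.ratio() returns a similarity between 0 and 1 rather than an edit distance. The sketch below also assumes a three-level score (0 for an exact match, 1 for a similar word, 2 otherwise) that extends the 0/2 scheme from the question; the function name similarity_score, the intermediate score of 1, and the 0.8 threshold are all illustrative assumptions.

```python
from difflib import SequenceMatcher


def similarity_score(word: str, candidate: str, threshold: float = 0.8) -> int:
    """Return 0 for an exact match, 1 for a similar word, 2 otherwise.

    The threshold and the intermediate score of 1 are assumptions made
    for illustration, not part of the original scoring scheme.
    """
    if word == candidate:
        return 0
    # ratio() is 2 * (matching characters) / (total characters in both words).
    ratio = SequenceMatcher(None, word, candidate).ratio()
    return 1 if ratio >= threshold else 2


print(similarity_score("book", "book"))   # 0
print(similarity_score("book", "brook"))  # 1 (ratio is 8/9)
print(similarity_score("book", "luck"))   # 2 (ratio is 0.25)
```

The threshold would likely need tuning on real data, since short words change ratio sharply with a single differing character.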

Source: https://datascience.stackexchange.com/a/12583/56244
