I have 5 vectors of column names that are similar but not identical.
I am trying to overwrite the names in vector2, vector3, vector4 and vector5 based on vector1.
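I got some ideas here and here, which led to the code below. In the end, though, I am struggling even to compare the first two vectors, let alone overwrite them.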
library(dplyr)
library(fuzzyjoin)
vector1 <- c("something","nothing", "anything", "number4")
vector2 <- c("some thing","no thing","addition", "anything", "number4")
vector3 <- c("some thing wrong","nothing", "anything_")
vector4 <- c("something","nothingg", "anything", "number_4")
vector5 <- c("something","nothing", "anything happening", "number4")
I started as follows:
apply(adist(x = vector1, y = vector2), 1, which.min)
data.frame(string_to_match = vector1,
           closest_match = vector2[apply(adist(x = vector1, y = vector2), 1, which.min)])
  string_to_match closest_match
1       something    some thing
2         nothing      no thing
3        anything      anything
4         number4       number4
Is there a way to add the distance to this solution and overwrite the vectors based on that distance?
Desired result:
  string_to_match closest_match distance
1       something    some thing        1
2         nothing      no thing        1
3        anything      anything        0
4         number4       number4        0
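For illustration, here is a minimal sketch of one way the distance column might be added. It assumes plain Levenshtein distance from base R's `adist` is the intended measure, and it keeps the distance matrix so that the distance of each chosen match can be looked up. The names `d`, `idx` and `result` are only illustrative.

```r
vector1 <- c("something", "nothing", "anything", "number4")
vector2 <- c("some thing", "no thing", "addition", "anything", "number4")

# Levenshtein distance matrix: rows follow vector1, columns follow vector2
d <- utils::adist(vector1, vector2)

# Column index of the nearest vector2 entry for each vector1 entry
idx <- apply(d, 1, which.min)

result <- data.frame(
  string_to_match = vector1,
  closest_match   = vector2[idx],
  # Matrix indexing with (row, column) pairs picks the matching distance
  distance        = d[cbind(seq_along(idx), idx)]
)
print(result)
```

On the sample vectors this gives distances of 1, 1, 0 and 0. Note that `which.min` returns only the first minimum, so ties are resolved by position in `vector2`.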
The vectors after overwriting would then be:
vector1 <- c("something","nothing", "anything", "number4")
vector2 <- c("something","nothing","addition", "anything", "number4")
vector3 <- c("something","nothing", "anything")
vector4 <- c("something","nothing", "anything", "number4")
vector5 <- c("something","nothing", "anything", "number4")
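The overwriting step could be sketched as follows. This is only one possible approach under stated assumptions: it uses `adist` with `partial = TRUE` (approximate substring distance, as used by `agrep`), so that trailing extras such as " wrong" or " happening" are not counted against a match, and it only overwrites an entry when the best distance is at most a cutoff. The helper `overwrite_close` and the cutoff `max_dist = 2` are hypothetical choices for this sample data, not an established rule.

```r
vector1 <- c("something", "nothing", "anything", "number4")
others <- list(
  vector2 = c("some thing", "no thing", "addition", "anything", "number4"),
  vector3 = c("some thing wrong", "nothing", "anything_"),
  vector4 = c("something", "nothingg", "anything", "number_4"),
  vector5 = c("something", "nothing", "anything happening", "number4")
)

# Hypothetical helper: replace each entry of `target` with its closest
# `reference` entry, but only when that distance is at most `max_dist`.
overwrite_close <- function(target, reference, max_dist = 2) {
  # partial = TRUE scores each reference string against substrings of the
  # target entries; rows follow `reference`, columns follow `target`
  d <- utils::adist(reference, target, partial = TRUE)
  best <- apply(d, 2, which.min)                  # best row per target entry
  best_dist <- d[cbind(best, seq_along(target))]  # distance of that match
  # Entries with no sufficiently close reference are left unchanged
  ifelse(best_dist <= max_dist, reference[best], target)
}

cleaned <- lapply(others, overwrite_close, reference = vector1)
print(cleaned)
```

The cutoff is what keeps an unrelated name like "addition" from being overwritten. A stricter or looser value may suit other data, so it is worth inspecting the distance matrix before settling on one.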
Can anyone put me on the right track?