I am using fuzzyjoin to match politicians with their respective regions:
library(dplyr)
library(fuzzyjoin)
x <- tibble(
  name = c("Fulvio Rossi Ciocca", "Rigoberto Del Carmen Rojas Sarapura",
           "Lorena Vergara Bravo", "Lily Perez San Martin"),
  activity = c("surgeon", "business", "public administration", "publicist")
)
y <- tibble(name = c("Rossi Ciocca Fulvio", "Perez San Martin Lily"), region = c(1, 5))
z <- x %>%
  stringdist_inner_join(y, max_dist = 10)
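In my example, "Fulvio Rossi Ciocca" and "Rossi Ciocca Fulvio" are the same person. In fact, all the names in my dataset refer to the same people, only with variations in word order, such as "Lennon John" instead of "John Lennon".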
I did look through the fuzzyjoin documentation, but I could not find a way to write a working version of this pseudocode:
x %>%
  fuzzy_join(y, mode = "left", match_fun = "A ~ permutations(A)")
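For illustration, here is a minimal sketch of what a working version of that pseudocode might look like. It assumes the names differ only in word order and letter case, so two names count as a match when their sorted, lowercased words are identical. The helper names `word_key` and `same_words` are hypothetical and not part of fuzzyjoin; they are passed to `fuzzy_join()` through its `match_fun` argument, which takes a vectorized function of two columns that returns a logical vector.

```r
library(dplyr)
library(fuzzyjoin)

x <- tibble(
  name = c("Fulvio Rossi Ciocca", "Rigoberto Del Carmen Rojas Sarapura",
           "Lorena Vergara Bravo", "Lily Perez San Martin"),
  activity = c("surgeon", "business", "public administration", "publicist")
)
y <- tibble(name = c("Rossi Ciocca Fulvio", "Perez San Martin Lily"), region = c(1, 5))

# Hypothetical helper: reduce each name to an order-independent key by
# lowercasing it, splitting on whitespace, and sorting the words.
word_key <- function(s) {
  words <- strsplit(trimws(tolower(s)), "[[:space:]]+")
  vapply(words, function(w) paste(sort(w), collapse = " "), character(1))
}

# Hypothetical match function: TRUE where both names contain the same words.
# fuzzy_join() calls it with two character vectors of equal length.
same_words <- function(a, b) {
  word_key(a) == word_key(b)
}

z <- x %>%
  fuzzy_join(y, by = "name", match_fun = same_words, mode = "left")
```

Because both tables have a `name` column, the result carries `name.x` and `name.y`; rows of `x` without a counterpart in `y` are kept with `NA` in `region`. If exact matching on the reordered words is all that is needed, adding `word_key(name)` as a column to both tables and using a plain `dplyr::left_join()` on it would likely be simpler and faster than a fuzzy join.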