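I have two data frames that I am trying to join on a country name field. What I want to achieve is this: when a perfect match is found, I want to keep only that row; otherwise I want to show all rows/options.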
library(fuzzyjoin)
df1 <- data.frame(
country = c('Germany','Germany and Spain','Italy','Norway and Sweden','Austria','Spain'),
score = c(7,8,9,10,11,12)
)
df2 <- data.frame(
country_name = c('Germany and Spain','Germany','Germany.','Germania','Deutschland','Germany - ','Spun','Spain and Portugal','Italy','Italia','Greece and Italy',
'Australia','Austria...','Norway (Scandinavia)','Norway','Sweden'),
comments = c('xxx','rrr','ttt','hhhh','gggg','jjjj','uuuuu','ooooo','yyyyyyyyyy','bbbbb','llllll','wwwwwww','nnnnnnn','cc','mmmm','lllll')
)
# Each country_name in df2 is used as a regex pattern against country in df1
j <- regex_left_join(df1, df2, by = c('country' = 'country_name'), ignore_case = TRUE)
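The result (j) shows 'Germany and Spain' 3 times. The first occurrence is the perfect match, so I want to keep only that one and get rid of the other two. 'Norway and Sweden' has no perfect match, so I want to keep both possible options/rows (as they are).
How can I achieve this?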
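
One possible approach, sketched in base R rather than offered as a definitive solution: flag the rows of the join result where the two names are identical, then for each country keep only the flagged rows if there are any, and all rows otherwise. The helper name `keep_exact_or_all` and its arguments are hypothetical (they are not part of fuzzyjoin). The sketch also assumes that a "perfect match" means equality ignoring case, to mirror `ignore_case` in the join.

```r
# Hypothetical helper (not part of fuzzyjoin): for each country, keep only the
# exact match when one exists, otherwise keep every candidate row.
keep_exact_or_all <- function(joined, x_col = "country", y_col = "country_name") {
  x <- joined[[x_col]]
  y <- joined[[y_col]]
  # TRUE where both names are identical ignoring case (assumption);
  # rows with no match at all (NA in y) count as not exact
  exact <- !is.na(y) & tolower(x) == tolower(y)
  # For every row: does its country group contain at least one exact match?
  has_exact <- ave(exact, x, FUN = any)
  # Keep exact rows, plus all rows of groups that have no exact match
  joined[exact | !has_exact, , drop = FALSE]
}
```

Applied to the join above, this would be `result <- keep_exact_or_all(j)`. Countries with no match at all (a single row with `NA` in `country_name`) are kept as they are.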