
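Here's what I'm trying to do: when the term I'm analysing is "apples", I want to know how many transpositions (single-character edits) are needed for "apples" to be found in a string.

"buy apples now" => 0 transpositions needed (apples is there).

"cheap aples online" => 1 transposition needed (aples to apples).

"find your ap ple here" => 2 transpositions needed (ap ple to apples).

"aple" => 2 transpositions needed (aple to apples).

"bananas" => 5 transpositions needed (apples to bananas).

The stringdist and adist functions don't work here, because they tell me how many transpositions are needed to turn one whole string into the other. Anyway, here's what I've written so far: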

# Build the data frame (stringsAsFactors = FALSE keeps plain character columns)
a <- c(rep("apples",5),rep("bananas",3))
b <- c("buy apples now","cheap aples online","find your ap ple here","aple","bananas","cherry and bananas","pumpkin","banana split")
d <- data.frame(a, b, stringsAsFactors = FALSE)
colnames(d) <- c("term","string")

# Count transpositions needed (adist compares the complete strings)
d$transpositions <- mapply(adist, d$term, d$string)
print(d)
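For reference, base R's `adist()` (package utils) takes a `partial` argument. With `partial = TRUE` it counts the insertions, deletions and substitutions needed to turn `x` into some substring of `y`, instead of into the whole of `y`, which is close to the measure described above. A minimal sketch on the same data, assuming unit edit costs stand in for "transpositions"; the `bananas` row can come out lower than the hand count of 5, because the match may begin and end anywhere inside `y`.

```r
# Example data from the question, kept as plain character columns
d <- data.frame(
  term   = c(rep("apples", 5), rep("bananas", 3)),
  string = c("buy apples now", "cheap aples online", "find your ap ple here",
             "aple", "bananas", "cherry and bananas", "pumpkin", "banana split"),
  stringsAsFactors = FALSE
)

# partial = TRUE: minimal edits to turn `term` into ANY substring of `string`
d$transpositions <- mapply(
  function(term, string) adist(term, string, partial = TRUE),
  d$term, d$string,
  USE.NAMES = FALSE
)
print(d)
```

Because `adist()` returns a 1 x 1 matrix per pair, `mapply()` simplifies the results into a plain numeric vector that fits straight into the data frame.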

2 Answers


So, here's the dirty solution I've come up with so far:

library(stringr)  # for str_trim()

# Create a data.frame
a <- c(rep("apples",5),rep("banana split",3))
b <- c("buy apples now","cheap aples online","find your ap ple here","aple","bananas","cherry and bananas","pumpkin","banana split")
d <- data.frame(a, b, stringsAsFactors = FALSE)
colnames(d) <- c("term","string")

# Split each string into runs of consecutive characters whose length equals
# the length of the term on the same row. Compute each run's distance to the
# term and keep the most relevant (closest) piece of string for each row.

mostrelevantpiece <- character(length(d$string))

for (j in seq_along(d$string)) {
  termlen <- nchar(d$term[j])
  nwindows <- max(nchar(d$string[j]) - termlen + 1, 1)
  pieces <- character(nwindows)
  piecesdist <- numeric(nwindows)
  for (i in seq_len(nwindows)) {
    addpiece <- substr(d$string[j], i, i + termlen - 1)
    piecesdist[i] <- adist(addpiece, d$term[j])
    pieces[i] <- str_trim(addpiece)
  }
  # Pick the best piece once, after every window has been scored
  mostrelevantpiece[j] <- pieces[which.min(piecesdist)]
}

# Calculate the number of transpositions needed to transform the
# "most relevant piece of string" into the term.

d$transpositionsneeded <- mapply(adist, mostrelevantpiece, d$term)
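The same sliding-window idea can be written without explicit loops: `substring()` accepts vectors of start and stop positions, so all windows of length `nchar(term)` come out of one call, and `adist()` scores them all at once. A sketch under the same assumptions as the loop (fixed window width, windows trimmed of surrounding spaces, unit edit costs); `best_window_dist` is a hypothetical helper name, and unlike the loop it takes the minimum over the trimmed windows directly. Like the loop version, a fixed-width window can overcount when the best match is shorter or longer than the term.

```r
# Hypothetical helper: smallest edit distance between `term` and any
# trimmed window of `string` that is nchar(term) characters wide.
best_window_dist <- function(term, string) {
  w <- nchar(term)
  # At least one window; if `string` is shorter than `term`, it is the whole string
  starts <- seq_len(max(nchar(string) - w + 1, 1))
  windows <- trimws(substring(string, starts, starts + w - 1))
  min(adist(term, windows))
}

d <- data.frame(
  term   = c(rep("apples", 5), rep("banana split", 3)),
  string = c("buy apples now", "cheap aples online", "find your ap ple here",
             "aple", "bananas", "cherry and bananas", "pumpkin", "banana split"),
  stringsAsFactors = FALSE
)

d$transpositionsneeded <- mapply(best_window_dist, d$term, d$string,
                                 USE.NAMES = FALSE)
print(d)
```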
answered 2015-04-04T03:00:27.907

You need to check for apples first, and only then count the transpositions:

a <- c(rep("apples",5),rep("bananas",3))
b <- c("buy apples now","cheap aples online","find your ap ple here","aple","bananas","cherry and bananas","pumpkin","banana split")
d <- data.frame(a, b, stringsAsFactors = FALSE)
colnames(d) <- c("term","string")

# Check first whether each row's own term occurs literally in its string
d$found <- mapply(grepl, d$term, d$string, MoreArgs = list(fixed = TRUE), USE.NAMES = FALSE)

# Count transpositions needed (0 when the term was found)
d$transpositions <- ifelse(d$found, 0, mapply(adist, d$term, d$string))
print(d)
answered 2015-04-03T18:12:27.457