I have a vector with multiple strings:

strings <- c("CD4","CD8A")

and I'd like to build an OR pattern to pass to grep, like so:

"CD4-|-CD4-|-CD4$|CD8A-|-CD8A-|-CD8A$"

and so on for each element in the vector.

Basically, I'm trying to find an exact word in a string whose parts are separated by dashes (I don't want grep("CD4", ...) to return strings containing CD40). This is how I thought of doing it, but I'm open to other suggestions.

Part of my data.frame looks like this:

Genes <- as.data.frame(c("CD4-MyD88-IL27RA", "IL2RG-CD4-GHR","MyD88-CD8B-EPOR", "CD8A-IL3RA-CSF3R", "ICOS-CD40-LMP1"))
colnames(Genes) <- "Genes"

2 Answers

Here is a one-liner:

Genes$Genes[grep(paste0("\\b",strings,"\\b",collapse="|"),Genes$Genes)]

[1] "CD4-MyD88-IL27RA" "IL2RG-CD4-GHR"    "CD8A-IL3RA-CSF3R"

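It uses the word-boundary marker \\b to make sure each name matches only as a complete substring (a - does not count as part of a word, so it acts as a boundary).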
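If the names might ever contain characters that \\b also treats as boundaries (a dot, for example), a variant closer to the pattern sketched in the question is to spell out the delimiters explicitly. This is only a sketch under that assumption; the names `genes`, `pattern` and `hits` are made up for illustration, and the data is copied from the question.

```r
# Illustrative data copied from the question
strings <- c("CD4", "CD8A")
genes <- c("CD4-MyD88-IL27RA", "IL2RG-CD4-GHR", "MyD88-CD8B-EPOR",
           "CD8A-IL3RA-CSF3R", "ICOS-CD40-LMP1")

# Each name must be preceded by the start of the string or a dash,
# and followed by a dash or the end of the string
pattern <- paste0("(^|-)", strings, "(-|$)", collapse = "|")

# Logical vector: TRUE where an entry contains one of the names as a whole part
hits <- grepl(pattern, genes)
genes[hits]
```

With the sample data this keeps the same three entries as the \\b version and leaves out the CD8B and CD40 rows.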

answered 2018-05-14T17:08:06.530

I'm not sure I understood the question. If I did, the command below should return what you want:

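library(magrittr)  # provides the %>% pipe

# Split each entry on "-" and keep every piece that starts with "CD"
# (note: this also returns CD8B and CD40, not only CD4 and CD8A)
stringr::str_split(Genes$Genes, pattern = '-') %>%
  purrr::map(
    function(data) {
      data[stringr::str_which(data, pattern = '^CD')]
    }
  ) %>% unlist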
answered 2018-05-14T17:09:37.697
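The split-on-dash idea can also be used for exact matching without any regular expression, which may be closer to what the question asks for. The following is a minimal base-R sketch, not the answerer's method; `genes`, `parts` and `keep` are illustrative names and the data is copied from the question.

```r
# Illustrative data copied from the question
strings <- c("CD4", "CD8A")
genes <- c("CD4-MyD88-IL27RA", "IL2RG-CD4-GHR", "MyD88-CD8B-EPOR",
           "CD8A-IL3RA-CSF3R", "ICOS-CD40-LMP1")

# Split every entry into its dash-separated parts (fixed = TRUE: literal "-")
parts <- strsplit(genes, "-", fixed = TRUE)

# Keep an entry if any of its parts is exactly one of the wanted names
keep <- vapply(parts, function(p) any(p %in% strings), logical(1))
genes[keep]
```

Because %in% compares whole strings, CD40 and CD8B can never match CD4 or CD8A, so no escaping or boundary markers are needed.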