
My data looks like this:

> head(df)
  ID                                                 Comment
1  1                                            I ate dinner.
2  2                              We had a three-course meal.
3  3                             Brad came to dinner with us.
4  4                                     He loves fish tacos.
5  5  In the end, we all felt like we ate too much. Code 5.16
6  6   We all agreed; it was a magnificent evening.72 points.

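I want to create two new columns, one called A and one called B. Column A should equal 1 if any of the following words/phrases appear: dinner, evening, we ate. Column B should equal 1 if any of the following words/phrases appear: in the end, all, Brad, 5.16.

How can I do this? Note that I need exact matches.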


2 Answers


We can use grepl from base R. The \\b word boundaries keep a term such as "all" from matching inside a longer word:

df$A <- +(grepl("\\b(dinner|evening|we ate)\\b", df$Comment))  # "we ate" kept as one phrase, as in the question
df$B <- +(grepl("\\b(in the end|all|Brad|5\\.16)\\b", df$Comment))

-output

df
  ID                                                 Comment A B
1  1                                           I ate dinner. 1 0
2  2                             We had a three-course meal. 0 0
3  3                            Brad came to dinner with us. 1 1
4  4                                    He loves fish tacos. 0 0
5  5 In the end, we all felt like we ate too much. Code 5.16 1 1
6  6  We all agreed; it was a magnificent evening.72 points. 1 1
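Since the question asks for exact matches, it may help to see what the word boundaries, the escaped dot, and case sensitivity each contribute. The following is only a small sketch on made-up strings (the vector `x` and the result names are assumptions for illustration, not part of the question's data):

```r
# Toy strings, made up for illustration (not from the question's data)
x <- c("We all agreed", "a small hall", "In the end", "Code 5.16", "Code 5x16")

# Without word boundaries, "all" also matches inside "small" and "hall"
plain_all   <- grepl("all", x)
bounded_all <- grepl("\\ball\\b", x)

# An unescaped "." matches any character, so "5.16" also hits "5x16"
loose_num <- grepl("\\b5.16\\b", x)
exact_num <- grepl("\\b5\\.16\\b", x)

# Matching is case-sensitive unless ignore.case = TRUE is passed
cs <- grepl("\\bin the end\\b", x)
ci <- grepl("\\bin the end\\b", x, ignore.case = TRUE)

print(data.frame(x, plain_all, bounded_all, loose_num, exact_num, cs, ci))
```

Note that, because matching is case-sensitive by default, "In the end" at the start of a sentence is not matched by the lowercase term; whether that counts as an exact match depends on what the asker intends.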

NOTE: The pattern can also be created with paste

v1 <- c("dinner", "evening", "we ate")
# escape the dot so that "5.16" matches a literal "." rather than any character
v2 <- c("in the end", "all", "Brad", "5\\.16")
pat1 <- paste0("\\b(", paste(v1, collapse = "|"), ")\\b")
pat2 <- paste0("\\b(", paste(v2, collapse = "|"), ")\\b")
df$A <- +(grepl(pat1, df$Comment))
df$B <- +(grepl(pat2, df$Comment))
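When the terms come from a vector, any regex metacharacters they contain (such as the `.` in 5.16) have to be escaped before they are pasted into a pattern. One possible way to automate that is sketched below; `escape_regex` is a hypothetical helper written for this illustration, not a base R function, and it is only meant to cover the common metacharacters:

```r
# Hypothetical helper (not part of base R): put a backslash in front of
# common regex metacharacters so each term is matched literally
escape_regex <- function(x) {
  gsub("([.|()\\^{}+$*?]|\\[|\\])", "\\\\\\1", x)
}

v2 <- c("in the end", "all", "Brad", "5.16")

# Escape every term, join with "|", and wrap in word boundaries
pat2 <- paste0("\\b(", paste(escape_regex(v2), collapse = "|"), ")\\b")
print(pat2)

# Made-up test strings: only the literal "5.16" should match
hits <- grepl(pat2, c("Code 5.16", "Code 5x16", "a small hall"))
print(hits)
```

The resulting `pat2` can then be used with `grepl(pat2, df$Comment)` in the same way as above.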

data

df <- structure(list(ID = 1:6, Comment = c("I ate dinner.", "We had a three-course meal.", 
"Brad came to dinner with us.", "He loves fish tacos.", "In the end, we all felt like we ate too much. Code 5.16", 
"We all agreed; it was a magnificent evening.72 points.")),
 class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6"))
answered 2021-07-01T17:29:13.710

Does this work:

library(dplyr)
library(stringr)

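# \\b(...)\\b restricts matches to whole words/phrases; the dot in 5.16 is escaped
df %>% mutate(A = +str_detect(Comment, str_c('\\b(', str_c(c('dinner','evening','we ate'), collapse = '|'), ')\\b')),
              B = +str_detect(Comment, str_c('\\b(', str_c(c('in the end','all','Brad','5\\.16'), collapse = '|'), ')\\b')))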
# A tibble: 6 x 4
     ID Comment                                                     A     B
  <dbl> <chr>                                                   <int> <int>
1     1 I ate dinner.                                               1     0
2     2 We had a three-course meal.                                 0     0
3     3 Brad came to dinner with us.                                1     1
4     4 He loves fish tacos.                                        0     0
5     5 In the end, we all felt like we ate too much. Code 5.16     1     1
6     6 We all agreed; it was a magnificent evening.72 points       1     1
answered 2021-07-01T17:33:05.280