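Hello,
I want to split each sentence into two parts: from keyword_1 up to keyword_2, and from keyword_2 to the end of the sentence, preferably using a regular expression.
For example (my ideal output is shown further below):
Here is a dataset I made.
Dataset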
library(tibble)
keyword_1 <- c("coffee", "apple", "rainbow", "strawberry shortcake")
keyword_2 <- c("life", "new york", "seven colours", "sweet and yummy")
raw <-
  tibble(
    sentence = c(
      "coffee is keyword_1_1 life is keyword_2_1",
      "apple is keyword_1_2 new york is keyword_2_2",
      "rainbow is keyword_1_3 seven colours is keyword_2_3",
      "strawberry shortcake is keyword_1_4 sweet and yummy is keyword 2_4"
    ))
raw
#> # A tibble: 4 x 1
#>   sentence
#>   <chr>
#> 1 coffee is keyword_1_1 life is keyword_2_1
#> 2 apple is keyword_1_2 new york is keyword_2_2
#> 3 rainbow is keyword_1_3 seven colours is keyword_2_3
#> 4 strawberry shortcake is keyword_1_4 sweet and yummy is keyword 2_4
Expected output
library(tibble)
output = tibble(
  output1 = c(
    "coffee is keyword_1_1",
    "apple is keyword_1_2",
    "rainbow is keyword_1_3",
    "strawberry shortcake is keyword_1_4"
  ),
  output2 = c("life is keyword_2_1", "new york is keyword_2_2",
              "seven colours is keyword_2_3", "sweet and yummy is keyword 2_4")
)
output
#> # A tibble: 4 x 2
#>   output1                             output2
#>   <chr>                               <chr>
#> 1 coffee is keyword_1_1               life is keyword_2_1
#> 2 apple is keyword_1_2                new york is keyword_2_2
#> 3 rainbow is keyword_1_3              seven colours is keyword_2_3
#> 4 strawberry shortcake is keyword_1_4 sweet and yummy is keyword 2_4
Created on 2021-03-18 by the reprex package (v0.3.0)
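To make the goal concrete, here is a rough sketch of one way the split might be done with a regular expression in base R. It is only an illustration under some assumptions: the names `kw2_pattern`, `pos`, and `split_result` are made up for this example, the keywords are assumed to contain no regex metacharacters, and every sentence is assumed to contain exactly one keyword_2 phrase. `keyword_1` is not needed here because each sentence already starts with it.

```r
library(tibble)

keyword_2 <- c("life", "new york", "seven colours", "sweet and yummy")

sentence <- c(
  "coffee is keyword_1_1 life is keyword_2_1",
  "apple is keyword_1_2 new york is keyword_2_2",
  "rainbow is keyword_1_3 seven colours is keyword_2_3",
  "strawberry shortcake is keyword_1_4 sweet and yummy is keyword 2_4"
)

# Combine all keyword_2 phrases into one alternation with word boundaries,
# e.g. "\\b(life|new york|...)\\b".
# Assumption: the keywords contain no regex metacharacters.
kw2_pattern <- paste0("\\b(", paste(keyword_2, collapse = "|"), ")\\b")

# Start position of the first keyword_2 match in each sentence.
# regexpr() returns -1 where nothing matches; that case is not handled here.
pos <- as.integer(regexpr(kw2_pattern, sentence, perl = TRUE))

split_result <- tibble(
  # Everything before keyword_2, with the trailing space removed.
  output1 = trimws(substr(sentence, 1, pos - 1)),
  # From keyword_2 to the end of the sentence.
  output2 = substr(sentence, pos, nchar(sentence))
)

split_result
```

If a sentence might lack a keyword_2 phrase, the `-1` returned by `regexpr()` would need to be checked before calling `substr()`.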
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.0.2 (2020-06-22)
#> os macOS 10.16
#> system x86_64, darwin17.0
#> ui X11
#> language (EN)
#> collate en_AU.UTF-8
#> ctype en_AU.UTF-8
#> tz Australia/Melbourne
#> date 2021-03-18
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date lib source
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.2)
#> callr 3.5.1 2020-10-13 [1] CRAN (R 4.0.2)
#> cli 2.3.1 2021-02-23 [1] CRAN (R 4.0.2)
#> crayon 1.3.4 2017-09-16 [1] CRAN (R 4.0.2)
#> debugme 1.1.0 2017-10-22 [1] CRAN (R 4.0.2)
#> desc 1.2.0 2018-05-01 [1] CRAN (R 4.0.2)
#> devtools 2.3.2 2020-09-18 [1] CRAN (R 4.0.2)
#> digest 0.6.27 2020-10-24 [1] CRAN (R 4.0.2)
#> ellipsis 0.3.1 2020-05-15 [1] CRAN (R 4.0.2)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.1)
#> fansi 0.4.1 2020-01-08 [1] CRAN (R 4.0.2)
#> fs 1.5.0 2020-07-31 [1] CRAN (R 4.0.2)
#> glue 1.4.2 2020-08-27 [1] CRAN (R 4.0.2)
#> highr 0.8 2019-03-20 [1] CRAN (R 4.0.2)
#> htmltools 0.5.1.1 2021-01-22 [1] CRAN (R 4.0.2)
#> knitr 1.31 2021-01-27 [1] CRAN (R 4.0.2)
#> lifecycle 0.2.0 2020-03-06 [1] CRAN (R 4.0.2)
#> magrittr 2.0.1 2020-11-17 [1] CRAN (R 4.0.2)
#> memoise 1.1.0 2017-04-21 [1] CRAN (R 4.0.2)
#> pillar 1.5.0 2021-02-22 [1] CRAN (R 4.0.2)
#> pkgbuild 1.1.0 2020-07-13 [1] CRAN (R 4.0.2)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.0.2)
#> pkgload 1.1.0 2020-05-29 [1] CRAN (R 4.0.2)
#> prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.0.2)
#> processx 3.4.4 2020-09-03 [1] CRAN (R 4.0.2)
#> ps 1.4.0 2020-10-07 [1] CRAN (R 4.0.2)
#> R6 2.5.0 2020-10-28 [1] CRAN (R 4.0.2)
#> remotes 2.2.0 2020-07-21 [1] CRAN (R 4.0.2)
#> rlang 0.4.10 2020-12-30 [1] CRAN (R 4.0.2)
#> rmarkdown 2.5 2020-10-21 [1] CRAN (R 4.0.2)
#> rprojroot 2.0.2 2020-11-15 [1] CRAN (R 4.0.2)
#> rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.0.2)
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.2)
#> stringi 1.5.3 2020-09-09 [1] CRAN (R 4.0.2)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.0.2)
#> testthat 3.0.0 2020-10-31 [1] CRAN (R 4.0.2)
#> tibble * 3.1.0 2021-02-25 [1] CRAN (R 4.0.2)
#> usethis 1.6.3 2020-09-17 [1] CRAN (R 4.0.2)
#> utf8 1.1.4 2018-05-24 [1] CRAN (R 4.0.2)
#> vctrs 0.3.4 2020-08-29 [1] CRAN (R 4.0.2)
#> withr 2.3.0 2020-09-22 [1] CRAN (R 4.0.2)
#> xfun 0.19.3 2020-11-06 [1] Github (yihui/xfun@12e77f5)
#> yaml 2.2.1 2020-02-01 [1] CRAN (R 4.0.2)
#>
#> [1] /Library/Frameworks/R.framework/Versions/4.0/Resources/library