r - Remove all leading strings before two specific letters in R

Question

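I am looking for a way to remove all leading text before either of two specific letter pairs, "bd" and "ls".

However, I have only found regex approaches that remove text before a space or a punctuation mark. Is there a way to remove the leading text before a specific pair of letters?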

      date_on                          location
14 2021-02-22 bradford, west yorkshire, bd9 6dp
15 2021-02-22                     bradford, bd4
16 2021-02-22          bradford, west yorkshire
17 2021-02-22           west yorkshire, bd1 1nq
18 2021-02-22          bradford, west yorkshire
19 2021-02-22                          ls28 7he

Input:

structure(list(date_on = structure(c(18680, 18680, 18680, 18680, 
18680, 18680), class = "Date"), location = c("bradford, west yorkshire, bd9 6dp", 
"bradford, bd4", "bradford, west yorkshire", "west yorkshire, bd1 1nq", 
"bradford, west yorkshire", "ls28 7he")), row.names = 14:19, class = "data.frame")

Expected output:

      date_on location
14 2021-02-22  bd9 6dp
15 2021-02-22      bd4
16 2021-02-22         
17 2021-02-22  bd1 1nq
18 2021-02-22         
19 2021-02-22 ls28 7he

structure(list(date_on = structure(c(18680, 18680, 18680, 18680, 
18680, 18680), class = "Date"), location = c("bd9 6dp", 
"bd4", "", "bd1 1nq", "", "ls28 7he")), row.names = 14:19, class = "data.frame")

score 2 · Accepted Answer

We can try a base R option using sub here:

df$location <- sub("^.*?(\\b(?:bd|ls)\\d+.*|$)$", "\\1", df$location)
df

      date_on location
14 2021-02-22  bd9 6dp
15 2021-02-22      bd4
16 2021-02-22         
17 2021-02-22  bd1 1nq
18 2021-02-22         
19 2021-02-22 ls28 7he

Here is an explanation of the regex pattern used:

^                     from the start of the location
    .*?               consume all content up to, but not including
    (                 start capture group
        \\b(?:bd|ls)  a postal code starting in 'bd' or 'ls'
        \\d+          followed by one or more digits
        .*            consume the remainder of the location
        |             OR
        $             match the end of the location, leaving the group
                      empty when it has no such postal code
    )                 stop capture group
$                     end of the location
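
As a quick check, the pattern can be run on a standalone character vector. This is only a minimal sketch: the name `loc` is hypothetical and holds the sample locations from the question, and `perl = TRUE` is an assumption added here (the call above uses the default regex engine) so that the lazy quantifier and the non-capturing group are handled by PCRE.

```r
# Sample locations copied from the question; `loc` is a hypothetical name.
loc <- c("bradford, west yorkshire, bd9 6dp", "bradford, bd4",
         "bradford, west yorkshire", "west yorkshire, bd1 1nq",
         "bradford, west yorkshire", "ls28 7he")

# Same pattern as in the answer; perl = TRUE is an assumption added here.
pattern <- "^.*?(\\b(?:bd|ls)\\d+.*|$)$"

# Keep only capture group 1: the postal code and everything after it,
# or the empty string when no 'bd'/'ls' code is present.
res <- sub(pattern, "\\1", loc, perl = TRUE)
print(res)
```

Rows without a "bd" or "ls" code come back as empty strings, which matches the expected output in the question.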

score 1 · Accepted Answer

Another base R option with sub:

df$location <- sub('.*(?=bd|ls)|.*', '', df$location, perl = TRUE)
df

#      date_on location
#14 2021-02-22  bd9 6dp
#15 2021-02-22      bd4
#16 2021-02-22         
#17 2021-02-22  bd1 1nq
#18 2021-02-22         
#19 2021-02-22 ls28 7he

This removes everything before the occurrence of 'bd' or 'ls' in the string, or removes everything if neither occurs.
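
The two branches of that alternation can be seen separately in a small sketch. The helper names `strip_lead` and `strict` are hypothetical, as is the input "bd1, wells", which does not come from the question. The stricter variant is only one possible tightening, assuming a postal code always starts at a word boundary and is followed by a digit.

```r
# Hypothetical helper wrapping the pattern from the answer:
# lookahead branch first, catch-all branch second.
strip_lead <- function(x) sub(".*(?=bd|ls)|.*", "", x, perl = TRUE)

a <- strip_lead("west yorkshire, bd1 1nq")   # lookahead branch keeps "bd1 1nq"
b <- strip_lead("bradford, west yorkshire")  # catch-all branch leaves ""

# Hypothetical input: the greedy .* backtracks to the LAST "bd"/"ls",
# even inside a word, so only the "ls" of "wells" survives.
d <- strip_lead("bd1, wells")

# Possible tightening (an assumption, not part of the answer):
# require a word boundary before the letters and a digit after them.
strict <- function(x) sub(".*(?=\\b(?:bd|ls)\\d)|.*", "", x, perl = TRUE)
e <- strict("bd1, wells")                    # left unchanged

print(c(a, b, d, e))
```

For the sample data in the question both variants give the same result; they differ only when "bd" or "ls" also appears inside an ordinary word.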
