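Taking a leaf out of agstudy's answer, here is a solution that does not use a magic separator, although it also does not preserve the index of each point in the text: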
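library(stringr)

# The delimiter matches a marker at the start of the string or after whitespace,
# with an optional opening parenthesis and a required closing one:
# 1. Single-letter prefixes: a), b) ... z)
# 2. Roman numerals (lower case only): [ixcmv]+
# 3. Numeric indexes: [0-9]+
delim <- "((^|\\s)\\(?([a-z]|[ixcmv]+|[0-9]+)\\))"
ll <- by(dat, dat$PrjID, function (r) {
    # Drop the first piece: it is the empty text before the first marker.
    each.obj <- str_split(r$Objective, delim)[[1]][-1]
    data.frame(PrjId = r$PrjID, Objective = str_trim(each.obj))
})
do.call(rbind, ll)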
       PrjId                     Objective
1001.1  1001     First(could be something)
1001.2  1001 Seconds (blah something else)
1001.3  1001      (how can thins be) Third
1002.1  1002         To improve efficiency
1002.2  1002                 Decrease cost
1002.3  1002              Maximize revenue
1003.1  1003                Getting tricky
1003.2  1003              Challanging task
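If the point indexes are needed after all, one possible variation is to extract the markers with the same pattern while splitting on it. This is only a sketch and not part of the original answer: the names txt, markers and pieces are made up for illustration, and the roman-numeral class is written without commas on the assumption that a literal comma was never meant to match.

```r
library(stringr)

# Delimiter pattern from the answer, with the roman-numeral class written as
# [ixcmv] (assumption: the commas in the class were not intended to match).
delim <- "(^|\\s)\\(?([a-z]|[ixcmv]+|[0-9]+)\\)"

# Illustrative input: the third Objective string from dat.
txt <- "(1) Getting tricky (2) Challanging task"

# str_extract_all() returns the matched markers themselves.
markers <- str_trim(str_extract_all(txt, delim)[[1]])

# str_split() returns the text between markers; the first piece is the
# empty text before the first marker, so it is dropped.
pieces <- str_trim(str_split(txt, delim)[[1]][-1])

data.frame(Index = markers, Objective = pieces)
```

For this input the markers come out as "(1)" and "(2)", paired with "Getting tricky" and "Challanging task". The pairing relies on every objective being preceded by a marker, which holds for the sample data but is not checked here.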
The dat used above is:
> dat
PrjID
1 1001
2 1002
3 1003
Objective
1 (i) First(could be something) b) Seconds (blah something else) (3) (how can thins be) Third
2 (i) To improve efficiency (ii) Decrease cost (iii) Maximize revenue
3 (1) Getting tricky (2) Challanging task
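To make the example reproducible, dat can be rebuilt from the printout roughly as follows. This is a reconstruction, not the asker's original code: it assumes PrjID is numeric and Objective is a plain character column, neither of which can be confirmed from the printed output.

```r
# Reconstruction of dat from the printed output (column types are assumed).
# stringsAsFactors = FALSE keeps Objective as character on R versions
# before 4.0.0, where data.frame() converted strings to factors by default.
dat <- data.frame(
    PrjID = c(1001, 1002, 1003),
    Objective = c(
        "(i) First(could be something) b) Seconds (blah something else) (3) (how can thins be) Third",
        "(i) To improve efficiency (ii) Decrease cost (iii) Maximize revenue",
        "(1) Getting tricky (2) Challanging task"
    ),
    stringsAsFactors = FALSE
)
```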