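Please help me with my small project.
I have a large list of text elements. Each element should be split into a small list of sentences. Each small list should be "saved" as one element in a new column of the initial large list, at the same position ("row") as the original text element.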
The split criteria are "/$", "und/KON" and "oder/KON". Each delimiter should be kept at the head of the new small-list element.
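One way to keep a delimiter at the head of the next piece is to split on a zero-width position (a lookahead) instead of on the delimiter text itself. The following is a minimal base-R sketch, assuming that every delimiter starts a whitespace-separated token, as it does in the sample data. The names `split_pattern`, `x` and `pieces` are hypothetical.

```r
# Match only a *position*: right before a token ending in "/$..." (e.g. ":/$."),
# or right before "und/KON" / "oder/KON". Nothing is consumed, so each
# delimiter stays at the start of the following piece.
# (?<=\s) requires whitespace before the split point. It anchors the split at a
# token boundary and avoids a quirk of base strsplit(): it re-matches on the
# leftover text, and a zero-width match at the very start of that text would
# split off a single character.
split_pattern <- "(?<=\\s)(?=\\S+/\\$|und/KON|oder/KON)"

x <- "This should become my second small-list :/$. Element eins und/KON Element zwei oder/KON Element drei ./$."
pieces <- strsplit(x, split_pattern, perl = TRUE)[[1]]
print(pieces)
```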
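I have tried regular expressions such as "/$|und/KON|oder/KON", escaping "$", "|" and "/". I have also tried changing the arguments perl = TRUE, fixed = TRUE and FALSE. Every time I try, nothing happens. It seems that | is not interpreted correctly. Do you have any suggestions for solving this?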
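The symptoms described above can probably be reproduced with a small sketch on a shortened, made-up string `x` (hypothetical, not from the original data). With `fixed = TRUE` the whole pattern is matched literally, so `|` is just an ordinary character. Without it, an unescaped `$` is an end-of-string anchor. Even a correctly escaped pattern removes the delimiters from the result.

```r
x <- "a :/$. b und/KON c"

# fixed = TRUE: the pattern is one literal string; "|" is not alternation,
# and the text never contains "/$|und/KON", so nothing is split.
fixed_result <- strsplit(x, "/$|und/KON", fixed = TRUE)[[1]]

# Regex with "$" unescaped: "$" anchors to the end of the string, so "/$"
# only matches a "/" at the very end; only "und/KON" splits here.
unescaped_result <- strsplit(x, "/$|und/KON", perl = TRUE)[[1]]

# "\\$" is a literal dollar sign and "|" works as alternation, but the
# matched delimiters are consumed and disappear from the pieces.
escaped_result <- strsplit(x, "/\\$|und/KON", perl = TRUE)[[1]]

print(list(fixed_result, unescaped_result, escaped_result))
```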
library(stringr) # don't know if it's required
# Input list: each element is to be split at every
# "/$", "und/KON" and "oder/KON",
# but each delimiter should stay at the start of the next list element
#
# Nice to have but not necessary: each small list named after the ID in the first column
> r <- list(ID=c(01, 02, 03),
elements=c("This should become my first small-list :/$. the first element ,/$, the second element ,/$, and the third element ./$.",
"This should become my second small-list :/$. Element eins und/KON Element zwei oder/KON Element drei ./$.",
"This should become my third small-list :/$. Element Alpha und/KON Element Beta oder/KON Element Gamma ./$."))
# It would look something like this:
r$small_lists <- sapply(r$elements ,function(x) as.list(strsplit(x,"/$|und/KON|oder/KON", fixed=TRUE)))
> r$small_lists
$01
[1] "This should become my first small-list "
[2] ":/$. the first element "
[3] ",/$, the second element "
[4] ",/$, and the third element "
[5] "./$."
$02
[1] "This should become my second small-list "
[2] ":/$. Element eins "
[3] "und/KON Element zwei "
[4] "oder/KON Element drei "
[5] "./$."
$03
[1] "This should become my third small-list "
[2] ":/$. Element Alpha "
[3] "und/KON Element Beta "
[4] "oder/KON Element Gamma "
[5] "./$."
> class(r)
[1] "list"
> class(r$small_lists)
[1] "list"
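Putting it together, here is a base-R sketch of the whole pipeline, assuming the same zero-width split and that zero-padded names such as "01" are wanted, based on the `$01` in the expected output. `split_pattern` is a hypothetical name. `strsplit()` is already vectorised, so `sapply()` is not needed.

```r
r <- list(
  ID = c(01, 02, 03),
  elements = c(
    "This should become my first small-list :/$. the first element ,/$, the second element ,/$, and the third element ./$.",
    "This should become my second small-list :/$. Element eins und/KON Element zwei oder/KON Element drei ./$.",
    "This should become my third small-list :/$. Element Alpha und/KON Element Beta oder/KON Element Gamma ./$."
  )
)

# Split before each delimiter token without consuming it (zero-width match).
split_pattern <- "(?<=\\s)(?=\\S+/\\$|und/KON|oder/KON)"

# One character vector per input element, in the same order ("row").
r$small_lists <- strsplit(r$elements, split_pattern, perl = TRUE)

# Optional: name each small list after its ID, zero-padded ("01", "02", ...).
names(r$small_lists) <- sprintf("%02d", r$ID)

print(r$small_lists)
```

Because `strsplit()` preserves input order, `r$small_lists[[i]]` lines up with `r$elements[i]`, and `class(r$small_lists)` is `"list"` as required.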