Maybe something like this:
library(stringr)
body <- "Scene 6: Second Lord: Nay, good my lord, put him to't; let him have his way. First Lord: If your lordship find him not a hilding, hold me no more in your respect. Second Lord: On my life, my lord, a bubble. BERTRAM: Do you think I am so far deceived in him? Second Lord: Believe it, my lord, in mine own direct knowledge, without any malice, but to speak of him as my kinsman, he's a most notable coward, an infinite and endless liar, an hourly promise-breaker, the owner of no one good quality worthy your lordship's entertainment."
p <- str_extract_all(body, "[:.?] [A-z ]*:")
# remove the delimiter punctuation (?, : and .) around each name
p <- str_replace_all(p[[1]], "[?:.]", "")
# strip white spaces
p <- str_trim(p)
p
[1] "Second Lord" "First Lord" "Second Lord" "BERTRAM" "Second Lord"
# unique players
unique(p)
[1] "Second Lord" "First Lord" "BERTRAM"
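Explanation of the regular expression (not perfect):
str_extract_all(body, "[:.?] [A-z ]*:")
The match starts with :, . or ? ([:.?]), followed by a space. Any run of letters and spaces is then matched up to the next :. Note that the range A-z also covers the characters [, \, ], ^, _ and `, which sit between Z and a in ASCII, so [A-Za-z ] is the stricter choice if only letters and spaces should match.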
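As a quick check of what the letter class accepts, here is a minimal sketch comparing the range A-z with the stricter A-Za-z. The `probe` vector is made up for illustration and is not part of the original answer.

```r
library(stringr)

# Hypothetical probe characters: two letters, two ASCII symbols that sit
# between "Z" and "a", and a digit.
probe <- c("a", "Z", "_", "^", "1")

# The range A-z spans everything from "A" to "z", symbols included.
loose <- str_detect(probe, "[A-z]")

# Listing both letter ranges explicitly excludes the symbols.
strict <- str_detect(probe, "[A-Za-z]")

data.frame(probe, loose, strict)
```

On the play text above both classes give the same speakers, since no speaker name contains those symbols.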
Getting the positions
You can use str_locate_all with the same regular expression:
str_locate_all(body, "[:.?] [A-z ]*:")
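A possible way to work with the result, sketched under assumptions: str_locate_all returns a list with one integer matrix per input string, with columns start and end, and those positions can be passed to str_sub. The shortened `body` below is an illustrative stand-in for the full text, and the offsets of +2 and -1 assume every match has the form "punctuation, space, name, colon".

```r
library(stringr)

# Shortened sample text (an assumption for illustration only).
body <- "Scene 6: Second Lord: Nay, good my lord. First Lord: Hold me no more in your respect. BERTRAM: Do you think I am so far deceived in him?"

# One matrix per input string; take the matrix for the single string here.
locs <- str_locate_all(body, "[:.?] [A-z ]*:")[[1]]

# Each match starts with a punctuation mark plus a space and ends with ":",
# so skip two characters at the front and one at the back.
speakers <- str_sub(body, locs[, "start"] + 2, locs[, "end"] - 1)

locs
speakers
```

Working from positions rather than extracted strings can be handy if you later want to slice out the text between two consecutive speaker labels.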