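I want to take a tibble that represents a dialogue and turn it into a .txt file that can be manually edited in a text editor, then read it back into a tibble for processing.
The main challenge I've had is separating the blocks of text in a way that lets them be re-imported into a similar format after editing, while preserving the "speaker" designation.
Speed is important, since both the number of files and the length of each text segment are large.
Here is the input tibble: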
tibble::tribble(
~word, ~speakerTag,
"been", 1L,
"going", 1L,
"on", 1L,
"and", 1L,
"what", 1L,
"your", 1L,
"goals", 1L,
"are.", 1L,
"Yeah,", 2L,
"so", 2L,
"so", 2L,
"John", 2L,
"has", 2L,
"15", 2L
)
This is the desired output in the .txt file:
###Speaker 1###
been going on and what your goals are.
###Speaker 2###
Yeah, so so John has 15
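To make the export direction concrete, here is a minimal sketch of one way it might be done, assuming each run of consecutive rows with the same `speakerTag` becomes one header line plus one text line. The names `tibble_to_txt`, `dialogue`, and `out_path` are hypothetical and exist only for this illustration. It relies on vectorized base R (`rle`, `split`, `writeLines`), which I would expect to scale reasonably, though I have not timed it on large files.

```r
library(tibble)

# Hypothetical helper: collapse consecutive words by the same speaker into one
# text line, preceded by a "###Speaker N###" header line.
tibble_to_txt <- function(df, path) {
  runs <- rle(df$speakerTag)                            # runs of one speaker
  run_id <- rep(seq_along(runs$lengths), runs$lengths)  # run index per row
  text <- vapply(split(df$word, run_id), paste, character(1), collapse = " ")
  headers <- paste0("###Speaker ", runs$values, "###")
  # Interleave: header 1, text 1, header 2, text 2, ...
  lines <- as.vector(rbind(headers, unname(text)))
  writeLines(lines, path)
  invisible(lines)
}

# Usage with the input tibble from above
dialogue <- tribble(
  ~word, ~speakerTag,
  "been", 1L, "going", 1L, "on", 1L, "and", 1L,
  "what", 1L, "your", 1L, "goals", 1L, "are.", 1L,
  "Yeah,", 2L, "so", 2L, "so", 2L, "John", 2L, "has", 2L, "15", 2L
)
out_path <- tempfile(fileext = ".txt")
tibble_to_txt(dialogue, out_path)
```

Because runs are detected with `rle`, a speaker who talks again later gets a second header rather than being merged with their earlier block, so the order of the conversation is kept.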
This is the desired result after manually correcting errors in the text:
tibble::tribble(
~word, ~speakerTag,
"been", 1L,
"going", 1L,
"on", 1L,
"and", 1L,
"what", 1L,
"your", 1L,
"goals", 1L,
"in", 1L,
"r", 1L,
"Yeah,", 2L,
"so", 2L,
"so", 2L,
"John", 2L,
"hates", 2L,
"50", 2L
)
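For the return trip, a rough sketch of how the edited file might be parsed back, under some assumptions: the file starts with a header line, headers match `###Speaker N###` exactly, and words are separated by whitespace. The names `txt_to_tibble`, `edited_path`, and `result` are hypothetical. The example writes the edited text to a temporary file first so that it runs on its own.

```r
library(tibble)

# Hypothetical helper: read "###Speaker N###" blocks back into one row per word.
# Assumes the first non-blank line of the file is a header.
txt_to_tibble <- function(path) {
  lines <- readLines(path, warn = FALSE)
  lines <- lines[nzchar(trimws(lines))]               # drop blank lines
  is_header <- grepl("^###Speaker [0-9]+###$", lines)
  # Speaker number taken from each header line
  speaker <- as.integer(sub("^###Speaker ([0-9]+)###$", "\\1", lines[is_header]))
  # Carry each header's speaker forward to the text lines that follow it
  tag_per_line <- speaker[cumsum(is_header)][!is_header]
  words <- strsplit(trimws(lines[!is_header]), "[[:space:]]+")
  tibble(
    word = as.character(unlist(words)),
    speakerTag = rep(tag_per_line, lengths(words))
  )
}

# Usage: simulate the manually edited file, then parse it
edited_path <- tempfile(fileext = ".txt")
writeLines(c(
  "###Speaker 1###",
  "been going on and what your goals in r",
  "###Speaker 2###",
  "Yeah, so so John hates 50"
), edited_path)
result <- txt_to_tibble(edited_path)
```

Text lines are tagged individually, so a speaker's block may be wrapped over several lines in the editor without changing the result.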