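I used pdftools in R to extract a table from a PDF. In the PDF, the text in the table's columns wraps across multiple lines. To make it easier to work with, I replaced every run of two or more spaces with "|". The problem is that, because of the multi-line cells and the way the table is laid out in the PDF, the extracted data comes out in the wrong order. Here is what the table looks like in the original PDF:
[screenshot of the PDF table]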
The data I extracted looks like this:
```r
library(stringr)   # str_replace_all()
library(magrittr)  # %>%

scale_definitions <- c("", " to lack passion easily annoyed",
" Excitable", " to lack a sense of urgency emotionally volatile",
"", " naive mistrustful",
" Skeptical", " gullible cynical",
"", " overly confident too conservative",
" Cautious", " to make risky decisions risk averse",
"", " to avoid conflict aloof and remote",
" Reserved", " too sensitive indifferent to others' feelings",
"", " unengaged uncooperative",
" Leisurely", " self-absorbed stubborn",
"", " unduly modest arrogant",
" Bold", " self-doubting entitled and self-promoting",
"", " over controlled charming and fun",
" Mischievous", " inflexible careless about commitments",
"", " repressed dramatic",
" Colorful", " apathetic noisy",
"", " too tactical impractical",
" Imaginative", " to lack vision eccentric",
"", " careless about details perfectionistic",
" Diligent", " easily distracted micromanaging",
"", " possibly insubordinate respectful and deferential",
" Dutiful", " too independent eager to please"
)

# Replace each run of 2+ whitespace characters (the gap between PDF columns) with "|"
scale_definitions <- scale_definitions %>% str_replace_all("\\s{2,}", "|")
```
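The apparent disorder becomes easier to see if each pair of elements is read as one text line of the PDF table: a scale-name cell followed by the descriptor cells. A minimal base R sketch, using only the first two scales of the vector above as a hypothetical excerpt:

```r
# Hypothetical excerpt: the first two scales of `scale_definitions`, as posted
x <- c("", " to lack passion easily annoyed",
       " Excitable", " to lack a sense of urgency emotionally volatile",
       "", " naive mistrustful",
       " Skeptical", " gullible cynical")

# Each pair of elements is one PDF text line:
# column 1 = scale-name cell, column 2 = descriptor cells
rows <- matrix(trimws(x), ncol = 2, byrow = TRUE,
               dimnames = list(NULL, c("name_cell", "descriptors")))
rows
```

The scale name only appears on the second text line of each block, so the first line of every wrapped cell shows up one row before the name it belongs to.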
What is the best way to get this into a data frame?
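One possible approach, sketched under two assumptions: every scale occupies exactly four elements (blank name cell, first text line, scale name, second text line), as it does for every scale in the vector above; and the raw pdftools text still separates the two descriptor columns with runs of 2+ spaces (the spacing appears collapsed in the vector as posted, so the excerpt below restores it by hand). The column names `low` and `high` are hypothetical labels for the left and right descriptor columns.

```r
library(stringr)

# Hypothetical excerpt with the 2+ space column gaps restored (assumption)
raw <- c("", " to lack passion     easily annoyed",
         " Excitable", " to lack a sense of urgency     emotionally volatile",
         "", " naive     mistrustful",
         " Skeptical", " gullible     cynical")

# Mark column gaps with "|" and drop leading/trailing blanks
x <- str_trim(str_replace_all(raw, "\\s{2,}", "|"))

# Assumes every scale occupies exactly 4 elements:
# blank name cell, first text line, scale name, second text line
stopifnot(length(x) %% 4 == 0)
m <- matrix(x, ncol = 4, byrow = TRUE)

# Split each text line into its left (low) and right (high) descriptor
line1 <- str_split_fixed(m[, 2], "\\|", 2)
line2 <- str_split_fixed(m[, 4], "\\|", 2)

# Re-join the wrapped cell text; str_squish() tidies any empty halves
scales_df <- data.frame(
  scale = m[, 3],
  low   = str_squish(paste(line1[, 1], line2[, 1])),
  high  = str_squish(paste(line1[, 2], line2[, 2])),
  stringsAsFactors = FALSE
)
scales_df
```

To run it on the full data, replace `raw` with the unmodified pdftools lines. If the two lines of a cell are really separate phrases rather than one wrapped sentence, `paste(..., sep = "; ")` keeps them distinguishable.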