I have a string that looks like this:
t2 <- "============================================
                       Model 1     Model 2
--------------------------------------------
education               3.66 ***    2.80 ***
                       (0.65)      (0.59)
income                  1.04 ***    0.85 ***
                       (0.26)      (0.23)
type: blue collar      -5.91      -27.55 ***
                       (3.94)      (5.41)
type: white collar     -8.82 **   -24.12 ***
                       (2.79)      (5.35)
income x blue collar                3.01 ***
                                   (0.58)
income x white collar               1.91 *
                                   (0.81)
prop. female            0.01        0.08 *
                       (0.03)      (0.03)
--------------------------------------------
R^2                     0.83        0.87
Adj. R^2                0.83        0.86
Num. obs.               98          98
============================================
*** p < 0.001, ** p < 0.01, * p < 0.05"
I am trying to extract the left-hand column, so that I end up with a vector that looks like this:
education
income
type: blue collar
type: white collar
income x blue collar
income x white collar
prop. female
I am new to regex and stringr.
I have tried extracting the word that follows each line break:
library(stringr)
covariates <- str_extract_all(t2, "\n\\w+")
covariates
This gets me closer:
[1] "\neducation" "\nincome" "\ntype" "\ntype" "\nincome" "\nincome" "\nprop" "\nR"
[9] "\nAdj" "\nNum"
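But I can't figure out how to capture the entire text column, e.g. get the full "type: blue collar" rather than just "\ntype". The pattern \w+ stops at the first character that is not a letter, digit, or underscore, so it cuts off at the colon, the space, or the period in a label.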
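One direction that might work, as a rough sketch in base R rather than a definitive solution: split the string into lines, keep only the rows between the two dashed rules that start with a letter, and strip everything from the first numeric value onward. The name `tbl` and the shortened table are made up for illustration, and the pattern assumes that no covariate label contains a number preceded by whitespace.

```r
# Hypothetical, shortened copy of the table, used only for illustration.
tbl <- "============================================
                       Model 1     Model 2
--------------------------------------------
education               3.66 ***    2.80 ***
                       (0.65)      (0.59)
type: blue collar      -5.91      -27.55 ***
                       (3.94)      (5.41)
income x blue collar                3.01 ***
                                   (0.58)
--------------------------------------------
R^2                     0.83        0.87
============================================"

# Split the single string into one element per line.
lines <- strsplit(tbl, "\n", fixed = TRUE)[[1]]

# Locate the dashed rules; the coefficient block sits between the first two.
rules <- grep("^-+\\s*$", lines)
body  <- lines[(rules[1] + 1):(rules[2] - 1)]

# Coefficient rows start with a letter; standard-error rows do not.
coef_rows <- body[grepl("^[[:alpha:]]", body)]

# Drop everything from the first numeric value to the end of the line.
# Assumption: labels never contain whitespace followed directly by a digit.
covariates <- trimws(sub("\\s+-?[0-9].*$", "", coef_rows))

print(covariates)
# [1] "education"            "type: blue collar"    "income x blue collar"
```

The same steps could presumably be written with stringr's `str_split()`, `str_subset()` and `str_remove()`; base R is used here only so the sketch runs without extra packages.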