I have a messy file that I'm trying to parse into numeric data in R. The data is in a file that isn't XML but does follow a specific format:
"{"metrics":{"skin_temp":{"min":81.5,"max":96.8,"sum":93480.6,
"summary":{"max_skin_temp_per_minute":null,"min_skin_temp_per_minute":null},
"values":[93.2,93.2,93.3,93.3]],"stdev":0.9,"avg":2.1},
"gsr":{"min":0.000149,"max":31.5,"sum":10300.0,
"summary":{"max_gsr_per_minute":null,"min_gsr_per_minute":null},
"values":[1.22,1.23,1.2,1.2],"stdev":9.630000000000001,"avg":10.1},
"steps":{"min":0,"max":104,"sum":4202,
"summary":{"max_steps_per_minute":null,"min_steps_per_minute":null},
"values":[0,0,0,0]],"stdev":13.8,"avg":4}}"
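All I'm interested in is the block of numbers after each "values" label (the rest is included by the website I'm pulling the data from, but I can easily calculate summary statistics in R if I need them).
I know there's an easier way, but so far my code looks like this: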
raw_data <- gsub('\\"', '', raw_data)
analysis_data <- c()
positioner <- 0
for (x in 1:3) {
  # find where the data starts (and add 8 more for the 'values' text)
  data_start <- regexpr("values:[", substring(raw_data, positioner),
                        fixed=TRUE)[[1]] + 8 + positioner
  data_end <- regexpr("]", substring(raw_data, data_start),
                      fixed=TRUE)[[1]] + data_start - 2
  data_col <- as.numeric(strsplit(substring(raw_data, data_start,
                                            data_end), ", ")[[1]])
  analysis_data <- cbind(analysis_data, data_col)
  positioner <- positioner + data_end
}
Sometimes this works, but sometimes the positioner variable gets fooled. Is there an easier way to extract these blocks of values?
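
One direction that avoids tracking offsets by hand is to let a regular expression find every "values" block at once. The excerpt resembles JSON, but as shown it has unbalanced brackets (`]]`), so a strict JSON parser may reject it. The sketch below therefore uses only base R (`gregexpr`, `regmatches`, `strsplit`). The shortened `raw_data` string is an assumption made for illustration, and the pattern is applied before any quotes are stripped. It also assumes every block has the same number of values, since otherwise `cbind` would recycle shorter columns.

```r
# Shortened stand-in for the file contents (an assumption for illustration).
raw_data <- paste0(
  '{"metrics":{"skin_temp":{"min":81.5,"values":[93.2,93.2,93.3,93.3]],"avg":2.1},',
  '"gsr":{"min":0.000149,"values":[1.22,1.23,1.2,1.2],"avg":10.1},',
  '"steps":{"min":0,"values":[0,0,0,0]],"avg":4}}}'
)

# Match every "values":[ ... ] run; [^]]* stops at the first closing bracket.
pattern <- '"values":\\[[^]]*\\]'
blocks <- regmatches(raw_data, gregexpr(pattern, raw_data))[[1]]

# Strip the label and the brackets, leaving only the comma-separated numbers.
inner <- sub('^"values":\\[', '', blocks)
inner <- sub('\\]$', '', inner)

# Split on commas (with optional whitespace) and convert each block to numeric.
value_list <- lapply(strsplit(inner, ",[[:space:]]*"), as.numeric)

# Bind the blocks as columns; assumes all blocks have equal length.
analysis_data <- do.call(cbind, value_list)
```

Because all matches are located in a single pass, there is no running position to keep in sync between iterations, and the number of blocks does not need to be hard-coded.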