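I have many output files with this data structure: median (low, high). I want to split all of the numbers into their own columns, but I'm having trouble because of the parentheses and the comma separating the numbers inside them.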

library(data.table)

# Data structure = median (low, high)
output <- c("9540000 (0,11140000)", 
            "8.81329 (0,8.81329)", 
            "27080000 (0,45290000)", 
            "23.4947 (0,63.2807)") 

desired_out <- data.table(median = c(9540000, 8.81329, 27080000, 23.4947),
                          low = c(0, 0, 0, 0),
                          high = c(11140000, 8.81329, 45290000, 63.2807))

Any help would be greatly appreciated.

2 Answers

A solution using data.table

Create the original data:

output <- c("9540000 (0,11140000)", 
            "8.81329 (0,8.81329)", 
            "27080000 (0,45290000)", 
            "23.4947 (0,63.2807)") 

library(data.table)
df <- data.table(output)

Split the string variable into median, low, and high with data.table's tstrsplit (base gsub strips the parentheses first):

df[, c("median", "low", "high") := tstrsplit(gsub("[()]", "", output), "[ ,]")]

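df is now as follows (the new columns are character strings, because tstrsplit does not convert types by default):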

                  output   median low     high
1:  9540000 (0,11140000)  9540000   0 11140000
2:   8.81329 (0,8.81329)  8.81329   0  8.81329
3: 27080000 (0,45290000) 27080000   0 45290000
4:   23.4947 (0,63.2807)  23.4947   0  63.2807
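
The question's desired_out has numeric columns, while the split above leaves the pieces as character. A minimal sketch of one way to get numbers, assuming every piece parses cleanly with as.numeric (any piece that does not would become NA with a warning):

```r
library(data.table)

output <- c("9540000 (0,11140000)",
            "8.81329 (0,8.81329)",
            "27080000 (0,45290000)",
            "23.4947 (0,63.2807)")

df <- data.table(output)

# Same split as above, with each resulting character vector
# converted to numeric before it is assigned to its column.
df[, c("median", "low", "high") :=
     lapply(tstrsplit(gsub("[()]", "", output), "[ ,]"), as.numeric)]

str(df)
```

tstrsplit also accepts type.convert = TRUE, which lets R pick the type for each column; with this data the low column would likely come back as integer rather than double, which is why the sketch calls as.numeric explicitly.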
Answered 2020-06-04T13:32:03.100

out <- tstrsplit(gsub("\\(|\\)", "", output), " |,")
setnames(setDT(out), c("median", "low", "high"))

out          

     median low     high
1:  9540000   0 11140000
2:  8.81329   0  8.81329
3: 27080000   0 45290000
4:  23.4947   0  63.2807
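
As a variation on the split above, the gsub step can be folded into the split pattern by treating any run of spaces, commas, or parentheses as a single delimiter. This is only a sketch: the last input string deliberately differs from the question's data (it has a space after the comma) to illustrate the tolerance, and the conversion to numeric is an addition to match desired_out.

```r
library(data.table)

output <- c("9540000 (0,11140000)",
            "8.81329 (0,8.81329)",
            "27080000 (0,45290000)",
            "23.4947 (0, 63.2807)")  # extra space added for illustration

# One pass: any run of " ", ",", "(" or ")" separates two fields.
# strsplit() drops the empty piece after the trailing ")".
parts <- tstrsplit(output, "[ ,()]+")

# Convert each piece to numeric and name the columns.
out <- as.data.table(lapply(parts, as.numeric))
setnames(out, c("median", "low", "high"))

out
```

Because the delimiter is a run rather than a single character, stray whitespace around the comma or parentheses does not produce empty fields.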
Answered 2020-06-04T13:51:39.537