
I'm trying to convert my raw data into start-stop format for a Cox regression. My raw dataset looks like this:

df = data.frame(initial = c(25, 25, 20, 21, 21, 17), 
                total = c(4.25, 28, 0.5, 38, 14, 43), 
                age = c(30, 53, 20, 59, 35, 60), 
                ethanol = c(0.04, 0.306, 0.201, 0.222, 0.047, 0.085), 
                status = c(0, 0, 0, 0, 0, 1))

For example, the first observation looks like this in the raw data:

    initial  total  age  ethanol  status
 1  25       4.25   30    0.04    0

The expected format is:

 id  start  stop     ethanol  status
 1   0.00   25.00    0.00     0
 1   25.00  29.25    0.04     0
 1   29.25  30       0        0
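The expected rows come from cutting each observation's follow-up at three break points: 0 to `initial` (unexposed), `initial` to `initial + total` (exposed, with the observed `ethanol` value), and `initial + total` to `age` (unexposed again, carrying the event `status`). A minimal sketch for observation 1 only, assuming `initial + total < age` so that all three intervals exist; the names `obs`, `breaks` and `one` are illustrative:

```r
# Observation 1 from the question's data
obs <- list(initial = 25, total = 4.25, age = 30, ethanol = 0.04, status = 0)

# Break points: time 0, start of exposure, end of exposure, end of follow-up (age)
breaks <- c(0, obs$initial, obs$initial + obs$total, obs$age)

one <- data.frame(id      = 1,
                  start   = head(breaks, -1),          # 0, 25, 29.25
                  stop    = tail(breaks, -1),          # 25, 29.25, 30
                  ethanol = c(0, obs$ethanol, 0),      # exposure only in the middle interval
                  status  = c(0, 0, obs$status))       # event can only occur in the last interval
print(one)
```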

So I wrote the following code:

edf = data.frame(id = integer(), 
                 start = numeric(), 
                 stop = numeric(), 
                 ethanol = numeric(),
                 status = integer())

j = 1

for( i in 1:4){

  if( (df[i, 1] + df[i,2]) >= df[i,3] ){
    edf[j,1] = i
    edf[j,2] = 0
    edf[j,3] = df[i,"initial"]
    edf[j,4] = 0
    edf[j,5] = 0
    j = j+1
    edf[j,1] = i
    edf[j,2] = df[i,"initial"]
    edf[j,3] = df[i,"initial"] + df[i,"total"]
    edf[j,4] = df[i,"ethanol"]
    edf[j,5] = df[i,"status"]
  } else{
    edf[j,1] = i
    edf[j,2] = 0
    edf[j,3] = df[i,"initial"]
    edf[j,4] = 0
    edf[j,5] = 0

    j = j+1
    edf[j,1] = i
    edf[j,2] = df[i,"initial"]
    edf[j,3] = df[i,"initial"] + df[i,"total"]
    edf[j,4] = df[i,"ethanol"]
    edf[j,5] = 0

    j = j+1
    edf[j,1] = i
    edf[j,2] = df[i,"initial"] + df[i,"total"]
    edf[j,3] = df[i,"age"]
    edf[j,4] = 0
    edf[j,5] = df[i,"status"]
  }
}

But the data frame I get is (again for the first observation):

 id     start    stop    ethanol  status
 1      0.00     25.00   0.00     0
 1      25.00    29.25   0.04     0

One row is missing:

id  start    stop    ethanol  status
1   29.25    30      0        0

It seems that the last part of the else branch is never executed:

    j = j+1
    edf[j,1] = i
    edf[j,2] = df[i,"initial"] + df[i,"total"]
    edf[j,3] = df[i,"age"]
    edf[j,4] = 0
    edf[j,5] = df[i,"status"]

I can't figure out what's going on. Any suggestions? I'm using R 3.4.4 on macOS (x86_64-apple-darwin15.6.0). Thanks in advance!


1 Answer


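You never advance the row counter j before writing the first row of each loop iteration, so that first write overwrites the last row written by the previous iteration. For observation 1, its third row (29.25 to 30) is overwritten by the first row of observation 2. Starting j at 0 and incrementing it at the top of the loop fixes this: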

j = 0

for( i in 1:4){
  j = j + 1
  if( (df[i, 1] + df[i,2]) >= df[i,3] ){
    edf[j,1] = i
    edf[j,2] = 0
    edf[j,3] = df[i,"initial"]
    edf[j,4] = 0
    edf[j,5] = 0
    j = j+1
    edf[j,1] = i
    edf[j,2] = df[i,"initial"]
    edf[j,3] = df[i,"initial"] + df[i,"total"]
    edf[j,4] = df[i,"ethanol"]
    edf[j,5] = df[i,"status"]
  } else{
    edf[j,1] = i
    edf[j,2] = 0
    edf[j,3] = df[i,"initial"]
    edf[j,4] = 0
    edf[j,5] = 0

    j = j+1
    edf[j,1] = i
    edf[j,2] = df[i,"initial"]
    edf[j,3] = df[i,"initial"] + df[i,"total"]
    edf[j,4] = df[i,"ethanol"]
    edf[j,5] = 0

    j = j+1
    edf[j,1] = i
    edf[j,2] = df[i,"initial"] + df[i,"total"]
    edf[j,3] = df[i,"age"]
    edf[j,4] = 0
    edf[j,5] = df[i,"status"]
  }
}
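The overwrite can be reproduced in miniature with a character vector in place of the data frame. This is an illustrative sketch (the vectors `rows` and `fixed` are hypothetical), showing the question's counter pattern next to the corrected one:

```r
# Buggy pattern: j is not advanced before the first write of each iteration
rows <- character(0)
j <- 1
for (i in 1:2) {
  rows[j] <- paste(i, "first")
  j <- j + 1
  rows[j] <- paste(i, "last")   # j still points here when the next iteration starts
}
print(rows)   # "1 first" "2 first" "2 last"; "1 last" was overwritten

# Fixed pattern: advance j at the top of every iteration
fixed <- character(0)
j <- 0
for (i in 1:2) {
  j <- j + 1
  fixed[j] <- paste(i, "first")
  j <- j + 1
  fixed[j] <- paste(i, "last")
}
print(fixed)  # "1 first" "1 last" "2 first" "2 last"
```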

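Edit: There are better ways to do this. Some packages may make reshaping the data easier, or you could build the three start/stop pieces as separate data frames and then stack them into one. Failing that, you can at least simplify the loop like this: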

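# Assumes edf is the empty data frame created in the question
df$end = df$initial + df$total
for (i in seq_len(nrow(df))) {   # integer index keeps edf$id an integer column
    r = df[i,]
    # Unexposed interval from 0 to the start of exposure
    edf[nrow(edf) + 1,] = list(i, 0, r$initial, 0, 0)
    if (r$end >= r$age) {
      # Exposure lasts until the end of follow-up, so it is the final interval
      edf[nrow(edf) + 1,] = list(i, r$initial, r$end, r$ethanol, r$status)
    } else {
      # Exposed interval, then an unexposed interval up to age
      edf[nrow(edf) + 1,] = list(i, r$initial, r$end, r$ethanol, 0)
      edf[nrow(edf) + 1,] = list(i, r$end, r$age, 0, r$status)
    }
}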
answered 2018-09-23T17:34:18.870
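A sketch of the loop-free idea mentioned in the edit (build each start/stop piece as its own data frame, then stack them with `rbind`). This is an illustration, not the answerer's code; it rebuilds `df` from the question, processes all six rows (the loops above stop at row 4), and uses the same rule as the loops: a third interval exists only when `initial + total < age`. The names `has_tail`, `p1`, `p2` and `p3` are hypothetical.

```r
df <- data.frame(initial = c(25, 25, 20, 21, 21, 17),
                 total   = c(4.25, 28, 0.5, 38, 14, 43),
                 age     = c(30, 53, 20, 59, 35, 60),
                 ethanol = c(0.04, 0.306, 0.201, 0.222, 0.047, 0.085),
                 status  = c(0, 0, 0, 0, 0, 1))

df$id  <- seq_len(nrow(df))
df$end <- df$initial + df$total
# TRUE where exposure ends before age, i.e. an unexposed tail interval is needed
has_tail <- df$end < df$age

# Piece 1: unexposed, from 0 to the start of exposure (one row per observation)
p1 <- data.frame(id = df$id, start = 0, stop = df$initial,
                 ethanol = 0, status = 0)
# Piece 2: exposed; carries the event status only when it is the last interval
p2 <- data.frame(id = df$id, start = df$initial, stop = df$end,
                 ethanol = df$ethanol,
                 status = ifelse(has_tail, 0, df$status))
# Piece 3: unexposed tail from end of exposure to age (only where has_tail)
p3 <- data.frame(id = df$id[has_tail], start = df$end[has_tail],
                 stop = df$age[has_tail],
                 ethanol = rep(0, sum(has_tail)),   # rep() keeps this safe if no row qualifies
                 status = df$status[has_tail])

# Stack the pieces and order the intervals within each id
edf <- rbind(p1, p2, p3)
edf <- edf[order(edf$id, edf$start), ]
rownames(edf) <- NULL
print(edf)
```

Building whole columns at once avoids the manual row counter entirely, so the overwrite bug from the question cannot occur.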