我正在尝试从以下内容重新排列 R 中的数据:
Patient ID,Episode Number,Admission Date (A),Admission Date (H),Admission Time (A),Admission Time (H)
1,5,20/08/2011,21/08/2011,1200,1300
2,6,21/08/2011,22/08/2011,1300,1400
3,7,22/08/2011,23/08/2011,1400,1500
4,8,23/08/2011,24/08/2011,1500,1600
类似于:
Record Type,Patient ID,Episode Number,Admission Date,Admission Time
H,1,5,20/08/2011,1200
A,1,5,21/08/2011,1300
H,2,6,21/08/2011,1300
A,2,6,22/08/2011,1400
H,3,7,22/08/2011,1400
A,3,7,23/08/2011,1500
H,4,8,23/08/2011,1500
A,4,8,24/08/2011,1600
(我使用了 CSV 格式,因此更容易将它们用作测试数据)。
我尝试了 reshape() 函数,它有点工作:
> reshape(foo, direction = "long", idvar = 1, varying = 3:dim(foo)[2],
> sep = "..", timevar = "dataset")
Patient.ID Episode.Number dataset Admission.Date Admission.Time
1.A. 1 5 A. 20/08/2011 1200
2.A. 2 6 A. 21/08/2011 1300
3.A. 3 7 A. 22/08/2011 1400
4.A. 4 8 A. 23/08/2011 1500
1.H. 1 5 H. 21/08/2011 1300
2.H. 2 6 H. 22/08/2011 1400
3.H. 3 7 H. 23/08/2011 1500
4.H. 4 8 H. 24/08/2011 1600
但它不是我想要的确切格式(我想要每个“患者 ID”,第一行是“H”,第二行是“A”)。
此外,当我将其扩展到读取数据(有 250 多列)时,它失败了:
> reshape(realdata, direction = "long", idvar = 1, varying =
> 6:dim(foo)[2], sep = "..", timevar = "dataset")
Error in reshapeLong(data, idvar = idvar, timevar = timevar, varying = varying, :
'varying' arguments must be the same length
我认为部分是因为 colnames 看起来像:
> colnames(foo)
[1] "Unique.Key"
[2] "Campus.Code"
[3] "UR"
[4] "Terminal.digit"
[5] "Admission.date..A."
[6] "Admission.date..H."
[7] "Admission.time..A."
[8] "Admission.time..H."
.
.
.
[31] "Medicare.Number"
[32] "Payor"
[33] "Doctor.specialty"
[34] "Clinic"
.
.
.
[202] "Admission.Source..A."
[203] "Admission.Source..H."
即有后缀的列之间有“公共列”(没有后缀)(希望这是有道理的)。