
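I'm new to R and have been trying to pivot a data frame read from a CSV file. The original CSV contains 5,000 item numbers; in my example I use the first five. The final result of the pivot should list each item number once for every completed payment, together with the payment type. For example, the original table looks like this: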

ITEM NUMBER  P1    P2    P3    P4   PType1  PType2  PType3  PType4
697884       270   255   170   0    CASH    CA      VI
697885       100   1160  310   580  CASH    AX      VI      CA
697886       1515  1455  1765  970  CASH    AX      VI      CA
697887       0     0     0     0
697888       1755  3610  1950  0    AX      VI      CA

Using a pivot, I would like to get a table like this:

ITEM NUMBER Payment    PaymentType  
697884           270         CASH
697884           255         CA
697884           170         VI

... (next item)

My current data frame has 9 variables: the item number is num, the payment amounts are int, and the payment types are Factor. Thanks!

structure(list(ITEM.NUMBER = 697884:697888, Payment1 = c(270L, 
100L, 1515L, 0L, 1755L), Payment2 = c(255L, 1160L, 1455L, 0L, 
3610L), Payment3 = c(170L, 310L, 1765L, 0L, 1950L), Payment4 = c(0L, 
580L, 970L, 0L, 0L), PaymentType1 = structure(c(3L, 3L, 3L, 1L, 
2L), .Label = c("", "AX", "CASH"), class = "factor"), PaymentType2 = structure(c(3L, 
2L, 2L, 1L, 4L), .Label = c("", "AX", "CA", "VI"), class = "factor"), 
    PaymentType3 = structure(c(3L, 3L, 3L, 1L, 2L), .Label = c("", 
    "CA", "VI"), class = "factor"), PaymentType4 = structure(c(1L, 
    2L, 2L, 1L, 1L), .Label = c("", "CA"), class = "factor")), .Names = c("ITEM.NUMBER", 
"Payment1", "Payment2", "Payment3", "Payment4", "PaymentType1", 
"PaymentType2", "PaymentType3", "PaymentType4"), row.names = c(NA, 
-5L), class = "data.frame")

1 Answer


You can use reshape from base R. Assuming your data is called "mydf":

reshape(mydf, direction = "long", idvar="ITEM.NUMBER", 
        varying=2:ncol(mydf), sep = "")
#          ITEM.NUMBER time Payment PaymentType
# 697884.1      697884    1     270        CASH
# 697885.1      697885    1     100        CASH
# 697886.1      697886    1    1515        CASH
# 697887.1      697887    1       0            
# 697888.1      697888    1    1755          AX
# 697884.2      697884    2     255          CA
# 697885.2      697885    2    1160          AX
# 697886.2      697886    2    1455          AX
# 697887.2      697887    2       0            
# 697888.2      697888    2    3610          VI
# 697884.3      697884    3     170          VI
# 697885.3      697885    3     310          VI
# 697886.3      697886    3    1765          VI
# 697887.3      697887    3       0            
# 697888.3      697888    3    1950          CA
# 697884.4      697884    4       0            
# 697885.4      697885    4     580          CA
# 697886.4      697886    4     970          CA
# 697887.4      697887    4       0            
# 697888.4      697888    4       0
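The call above relies on sep = "" to let reshape guess how names such as "Payment1" and "PaymentType1" split into a measure name and a time index. As a rough sketch of what that guess amounts to, the same reshape can be written with the column groups spelled out. The data frame is rebuilt here from the question's data so the snippet runs on its own; the name long_explicit is only illustrative.

```r
# Rebuild the example data from the question (payment types as factors).
mydf <- data.frame(
  ITEM.NUMBER  = 697884:697888,
  Payment1     = c(270L, 100L, 1515L, 0L, 1755L),
  Payment2     = c(255L, 1160L, 1455L, 0L, 3610L),
  Payment3     = c(170L, 310L, 1765L, 0L, 1950L),
  Payment4     = c(0L, 580L, 970L, 0L, 0L),
  PaymentType1 = factor(c("CASH", "CASH", "CASH", "", "AX")),
  PaymentType2 = factor(c("CA", "AX", "AX", "", "VI")),
  PaymentType3 = factor(c("VI", "VI", "VI", "", "CA")),
  PaymentType4 = factor(c("", "CA", "CA", "", ""))
)

# Name each group of wide columns explicitly instead of relying on sep = "".
long_explicit <- reshape(
  mydf, direction = "long", idvar = "ITEM.NUMBER",
  varying = list(paste0("Payment", 1:4), paste0("PaymentType", 1:4)),
  v.names = c("Payment", "PaymentType"),
  timevar = "time", times = 1:4
)
head(long_explicit)
```

The explicit form is more verbose, but it may be easier to adapt when the column names do not follow a clean letters-then-digits pattern.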

If you want to order by "ITEM.NUMBER", you can use order:

out <- reshape(mydf, direction = "long", idvar="ITEM.NUMBER", 
               varying=2:ncol(mydf), sep = "")
out[order(out$ITEM.NUMBER), ]
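The question asks for one row per completed payment, while the reshaped data still contains the empty slots. A minimal sketch of one way to trim it follows, under the assumption that a payment counts as "completed" when its amount is non-zero (the question does not define this precisely). The data frame is rebuilt so the snippet is self-contained, and the name completed is only illustrative.

```r
# Rebuild the example data from the question (payment types as factors).
mydf <- data.frame(
  ITEM.NUMBER  = 697884:697888,
  Payment1     = c(270L, 100L, 1515L, 0L, 1755L),
  Payment2     = c(255L, 1160L, 1455L, 0L, 3610L),
  Payment3     = c(170L, 310L, 1765L, 0L, 1950L),
  Payment4     = c(0L, 580L, 970L, 0L, 0L),
  PaymentType1 = factor(c("CASH", "CASH", "CASH", "", "AX")),
  PaymentType2 = factor(c("CA", "AX", "AX", "", "VI")),
  PaymentType3 = factor(c("VI", "VI", "VI", "", "CA")),
  PaymentType4 = factor(c("", "CA", "CA", "", ""))
)

out <- reshape(mydf, direction = "long", idvar = "ITEM.NUMBER",
               varying = 2:ncol(mydf), sep = "")

# Assumption: a payment is "completed" when its amount is non-zero.
completed <- out[out$Payment != 0, ]

# Sort by item, then by payment slot, and keep only the requested columns.
completed <- completed[order(completed$ITEM.NUMBER, completed$time),
                       c("ITEM.NUMBER", "Payment", "PaymentType")]
rownames(completed) <- NULL
print(completed)
```

If an empty payment type is a better indicator than a zero amount in the real data, the filter could test completed$PaymentType != "" instead.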

Update

For completeness, here is the reshape2 approach I came up with:

First, melt the data (as shown in the comments):

library(reshape2)  # provides melt() and dcast()
mydfL <- melt(mydf, id.vars="ITEM.NUMBER")
head(mydfL)
#   ITEM.NUMBER variable value
# 1      697884 Payment1   270
# 2      697885 Payment1   100
# 3      697886 Payment1  1515
# 4      697887 Payment1     0
# 5      697888 Payment1  1755
# 6      697884 Payment2   255

Second, split the "variable" column. There may be a better way to do this, but this is what I came up with.

mydfL <- cbind(mydfL, do.call(rbind, strsplit(
  as.character(mydfL$variable), split = "(?<=[a-zA-Z])(?=[0-9])", perl = TRUE)))
head(mydfL)
#   ITEM.NUMBER variable value       1 2
# 1      697884 Payment1   270 Payment 1
# 2      697885 Payment1   100 Payment 1
# 3      697886 Payment1  1515 Payment 1
# 4      697887 Payment1     0 Payment 1
# 5      697888 Payment1  1755 Payment 1
# 6      697884 Payment2   255 Payment 2
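As one possible alternative for this splitting step, the two parts can be pulled out with sub() and stored under descriptive names. This is only a sketch on a small hypothetical vector of names in the same letters-then-digits form; measure and slot are illustrative names, not part of the original answer.

```r
# Hypothetical sample of column names in the same pattern as the melted data.
variable <- c("Payment1", "PaymentType1", "Payment2", "PaymentType4")

# Strip the trailing digits to get the measure name.
measure <- sub("[0-9]+$", "", variable)

# Strip the leading non-digits to get the payment slot as an integer.
slot <- as.integer(sub("^[^0-9]+", "", variable))

data.frame(variable, measure, slot)
```

Columns with names like these would not need backticks when referenced in a formula.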

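Third, use dcast to get your output. Because some of the columns are named "1" and "2", you need to quote them with backticks (`) so that R recognizes them as column names rather than values.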

dcast(mydfL, ITEM.NUMBER + `2` ~ `1`, value.var="value")
#    ITEM.NUMBER 2 Payment PaymentType
# 1       697884 1     270        CASH
# 2       697884 2     255          CA
# 3       697884 3     170          VI
# 4       697884 4       0            
# 5       697885 1     100        CASH
# 6       697885 2    1160          AX
# 7       697885 3     310          VI
# 8       697885 4     580          CA
# 9       697886 1    1515        CASH
# 10      697886 2    1455          AX
# 11      697886 3    1765          VI
# 12      697886 4     970          CA
# 13      697887 1       0            
# 14      697887 2       0            
# 15      697887 3       0            
# 16      697887 4       0            
# 17      697888 1    1755          AX
# 18      697888 2    3610          VI
# 19      697888 3    1950          CA
# 20      697888 4       0            
answered 2014-02-02T05:16:11.697