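I have a data.frame that originally came from a txt file. It arrives in a rather inconvenient form: the observations are spread across columns by year, and the actual variables I need as regressors in my analysis are stored in a single column as a factor. So I need to transform this data.frame as follows: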

         VAR    YEAR.1    YEAR.2    YEAR.3
FIRM.1   VAR.1  FV_11.1   FV_11.2   FV_11.3 
FIRM.1   VAR.2  FV_12.1   FV_12.2   FV_12.3
FIRM.2   VAR.1  FV_21.1   FV_21.2   FV_21.3
FIRM.2   VAR.2  FV_22.1   FV_22.2   FV_22.3

where FV_ij.k is the observation of variable j for firm i in year k. Ideally, the resulting data.frame should look like this:

         YEAR    VAR.1    VAR.2
 FIRM.1  YEAR.1  FV_11.1  FV_12.1
 FIRM.1  YEAR.2  FV_11.2  FV_12.2
 FIRM.1  YEAR.3  FV_11.3  FV_12.3
 FIRM.2  YEAR.1  FV_21.1  FV_22.1
 FIRM.2  YEAR.2  FV_21.2  FV_22.2
 FIRM.2  YEAR.3  FV_21.3  FV_22.3     

I have an idea of how to code this, but it is cumbersome. Is there a package that makes this kind of transformation convenient?

1 Answer

I would suggest melt and dcast from the "reshape2" package. But first, here is some sample data:

mydf <- structure(list(FIRM = c("FIRM.1", "FIRM.1", "FIRM.2", "FIRM.2"),
    VAR = c("VAR.1", "VAR.2", "VAR.1", "VAR.2"), YEAR.1 = c("FV_11.1",
    "FV_12.1", "FV_21.1", "FV_22.1"), YEAR.2 = c("FV_11.2", "FV_12.2",
    "FV_21.2", "FV_22.2"), YEAR.3 = c("FV_11.3", "FV_12.3", "FV_21.3",
    "FV_22.3")), .Names = c("FIRM", "VAR", "YEAR.1", "YEAR.2", "YEAR.3"),
    class = "data.frame", row.names = c(NA, -4L))
mydf
#     FIRM   VAR  YEAR.1  YEAR.2  YEAR.3
# 1 FIRM.1 VAR.1 FV_11.1 FV_11.2 FV_11.3
# 2 FIRM.1 VAR.2 FV_12.1 FV_12.2 FV_12.3
# 3 FIRM.2 VAR.1 FV_21.1 FV_21.2 FV_21.3
# 4 FIRM.2 VAR.2 FV_22.1 FV_22.2 FV_22.3

Step 1: Convert the data to long format. Before doing so, though, strip the "VAR." prefix from the "VAR" column:

library(reshape2)
# fixed = TRUE matches the literal "VAR."; an unescaped "." is a regex wildcard
mydf$VAR <- gsub("VAR.", "", mydf$VAR, fixed = TRUE)
out <- melt(mydf, id.vars=c("FIRM", "VAR"))
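
To see what the long format looks like before reshaping it back to wide, it can help to inspect the intermediate result. The following is a minimal, self-contained sketch; the name long_demo is only illustrative, and the sample data is rebuilt here with the "VAR." prefix already stripped.

```r
library(reshape2)

# Sample data as in the answer, with "VAR." already removed from the VAR column
demo <- data.frame(
  FIRM   = c("FIRM.1", "FIRM.1", "FIRM.2", "FIRM.2"),
  VAR    = c("1", "2", "1", "2"),
  YEAR.1 = c("FV_11.1", "FV_12.1", "FV_21.1", "FV_22.1"),
  YEAR.2 = c("FV_11.2", "FV_12.2", "FV_21.2", "FV_22.2"),
  YEAR.3 = c("FV_11.3", "FV_12.3", "FV_21.3", "FV_22.3"),
  stringsAsFactors = FALSE
)

# Every YEAR.* column is stacked into a "variable"/"value" pair,
# while FIRM and VAR are repeated as identifiers
long_demo <- melt(demo, id.vars = c("FIRM", "VAR"))

names(long_demo)  # "FIRM" "VAR" "variable" "value"
nrow(long_demo)   # 4 rows x 3 year columns = 12 rows
head(long_demo, 4)
```

Each row of the long data now holds one firm/variable/year observation, which is the shape dcast needs in order to spread the variables into columns.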

Step 2: Use dcast to convert the data into the form you want:

dcast(out, FIRM + variable ~ VAR)
#     FIRM variable       1       2
# 1 FIRM.1   YEAR.1 FV_11.1 FV_12.1
# 2 FIRM.1   YEAR.2 FV_11.2 FV_12.2
# 3 FIRM.1   YEAR.3 FV_11.3 FV_12.3
# 4 FIRM.2   YEAR.1 FV_21.1 FV_22.1
# 5 FIRM.2   YEAR.2 FV_21.2 FV_22.2
# 6 FIRM.2   YEAR.3 FV_21.3 FV_22.3
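
If you would rather end up with the exact column names from the question (YEAR, VAR.1, VAR.2), one possible variant is to leave the "VAR" column untouched and name the melted column directly. This is only a sketch under those assumptions; the names long and wide are illustrative, and variable.name and value.var are the reshape2 arguments used to control the naming.

```r
library(reshape2)

# Same sample data, but keeping the original "VAR.1"/"VAR.2" labels
mydf <- data.frame(
  FIRM   = c("FIRM.1", "FIRM.1", "FIRM.2", "FIRM.2"),
  VAR    = c("VAR.1", "VAR.2", "VAR.1", "VAR.2"),
  YEAR.1 = c("FV_11.1", "FV_12.1", "FV_21.1", "FV_22.1"),
  YEAR.2 = c("FV_11.2", "FV_12.2", "FV_21.2", "FV_22.2"),
  YEAR.3 = c("FV_11.3", "FV_12.3", "FV_21.3", "FV_22.3"),
  stringsAsFactors = FALSE
)

# Call the stacked column "YEAR" instead of the default "variable"
long <- melt(mydf, id.vars = c("FIRM", "VAR"), variable.name = "YEAR")

# One row per FIRM/YEAR, one column per value of VAR;
# value.var is given explicitly so dcast does not have to guess it
wide <- dcast(long, FIRM + YEAR ~ VAR, value.var = "value")

names(wide)  # "FIRM" "YEAR" "VAR.1" "VAR.2"
wide
```

Keeping the prefix also means the resulting column names are syntactically valid R names, so they can be used in a model formula without backticks.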
answered 2013-10-01T09:27:12.090