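I have a list of 138 tables (prop.table). Each table can contain up to 20 variables (numeric category codes ranging from 11 to 95, used as column names). I need to convert this list into one master data frame. The first three tables look like this: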

        21         41         42         43         52         71         81         82 
0.02007456 0.58158876 0.22483510 0.09349011 0.05248064 0.01204474 0.00544881 0.01003728 

        21         41         42         43         52         71         90 
0.01175122 0.36973345 0.34107194 0.03066781 0.08655775 0.01633706 0.14388077 

         21          22          23          41          42 
0.043254082 0.008307075 0.016614151 0.930392438 0.001432254 

I need to convert this into a matrix that looks like the following, with NA or 0 wherever a category is not present in a table:

x <- matrix(nrow = 3, ncol = 11)
colnames(x) <- c('21', '22', '23', '41', '42', '43', '52', '71', '81', '82', '90')


df <- data.frame(matrix(unlist(prop.table), nrow=138, byrow=T))



5 Answers



x1 <- c(1, 5, 7)
names(x1) <- 1:3
x2 <- c(1, 2, 7)
names(x2) <- c(1,3,5)
l <- list(x1, x2)

m <- matrix(nrow=length(l), ncol=5)
colnames(m) <- 1:5
for (i in seq_along(l)) {
  # fill row i by column name; columns absent from l[[i]] stay NA
  m[i, names(l[[i]])] <- l[[i]]
}
m

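To make the loop above concrete for the data in the question, here is a minimal sketch. It assumes each table can be treated as a named numeric vector; the names tabs and all_codes are made up for illustration, and the column set is derived from the data rather than typed in by hand.

```r
# The three tables from the question, rebuilt as named numeric vectors.
tabs <- list(
  c("21" = 0.02007456, "41" = 0.58158876, "42" = 0.22483510, "43" = 0.09349011,
    "52" = 0.05248064, "71" = 0.01204474, "81" = 0.00544881, "82" = 0.01003728),
  c("21" = 0.01175122, "41" = 0.36973345, "42" = 0.34107194, "43" = 0.03066781,
    "52" = 0.08655775, "71" = 0.01633706, "90" = 0.14388077),
  c("21" = 0.043254082, "22" = 0.008307075, "23" = 0.016614151,
    "41" = 0.930392438, "42" = 0.001432254)
)

# Derive the column set from the data instead of hard-coding it.
all_codes <- sort(unique(unlist(lapply(tabs, names))))

# Pre-allocate an NA-filled matrix, then fill each row by column name.
m <- matrix(NA_real_, nrow = length(tabs), ncol = length(all_codes),
            dimnames = list(NULL, all_codes))
for (i in seq_along(tabs)) {
  m[i, names(tabs[[i]])] <- tabs[[i]]
}
m
```

Categories that a table does not contain are left as NA in that table's row.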


answered 2013-04-10T00:44:07.617


# make an example `prop.table`:
tbl <- 1:10
names(tbl) <- letters[1:10]
tbl <- as.matrix(tbl)

# make sure some of the columns are missing
prop.table <- list(tbl[sample(10, size=8),], tbl[sample(10, size=7),], tbl[sample(10, size=9),])
# [[1]]
# d b g c h f e i 
# 4 2 7 3 8 6 5 9 
# [[2]]
#  h  g  d  a  j  f  c 
#  8  7  4  1 10  6  3 
# [[3]]
#  c  i  b  d  j  a  h  g  e 
#  3  9  2  4 10  1  8  7  5

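You can use the rbind.fill() function from plyr, which is just rbind() that fills missing columns with NA. It binds a list of data frames together, so I first convert each element of prop.table to a data frame (the t() is needed so that each prop.table[[i]] is treated as a row rather than a column):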

library(plyr)
rbind.fill(lapply(prop.table, function (x) as.data.frame(t(x))))
#   d  b g c h  f  e  i  a  j
# 1 4  2 7 3 8  6  5  9 NA NA
# 2 4 NA 7 3 8  6 NA NA  1 10
# 3 4  2 7 3 8 NA  5  9  1 10

(Note: you can sort the columns of the output data frame x with x[, order(colnames(x))].)
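
If zeros are preferred over NA, which the question allows, a follow-up step could look like the sketch below. The data frame x here is a hypothetical, cut-down stand-in for the rbind.fill() result, built by hand so the snippet runs without plyr.

```r
# Hypothetical stand-in for a few columns of the rbind.fill() output.
x <- data.frame(d = c(4, 4, 4), b = c(2, NA, 2), a = c(NA, 1, 1))

# Sort the columns by name.
x <- x[, order(colnames(x))]

# Replace the NA placeholders with 0.
x[is.na(x)] <- 0
x
```

For the two-digit category codes in the question, sorting the column names as text gives the same order as sorting them as numbers.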

answered 2013-04-10T01:46:54.493


variable1          1          1          1          1          1          1          1          1
variable2         21         41         42         43         52         71         81         82
variable3 0.02007456 0.58158876 0.22483510 0.09349011 0.05248064 0.01204474 0.00544881 0.01003728

variable1          2          2          2          2          2          2          2
variable2         21         41         42         43         52         71         90
variable3 0.01175122 0.36973345 0.34107194 0.03066781 0.08655775 0.01633706 0.14388077

variable1           3           3           3           3           3
variable2          21          22          23          41          42
variable3 0.043254082 0.008307075 0.016614151 0.930392438 0.001432254


library(reshape)
cast(MyDataFrame, variable1 ~ variable2, value = "variable3")
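
For readers without the reshape package, roughly the same long-to-wide step can be sketched in base R with tapply(). This assumes the layout above means one row per (table, category) pair, with variable1 as the table index, variable2 as the category code and variable3 as the proportion. The data frame long below is a hypothetical subset of the question's values.

```r
# Hypothetical long-format data: one row per (table, category) pair,
# using a few of the values from the question.
long <- data.frame(
  variable1 = c(1, 1, 2, 2, 3, 3),
  variable2 = c("21", "41", "21", "90", "21", "22"),
  variable3 = c(0.02007456, 0.58158876, 0.01175122, 0.14388077,
                0.043254082, 0.008307075)
)

# Cross-tabulate: rows are tables, columns are categories.
# Combinations that never occur come back as NA.
wide <- tapply(long$variable3, list(long$variable1, long$variable2), sum)
wide
```

Each (table, category) pair occurs at most once here, so sum simply passes the single value through.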
answered 2013-04-10T00:49:26.163

This is not the most efficient approach, but it works using plyr and reshape2, assuming your list of prop.tables is called foo:


library(plyr); library(reshape2)  # ldply() and dcast()
allData <- dcast(ldply(lapply(seq_along(foo), function(x) data.frame(foo[[x]], id = x))), 
                id ~ x, value.var = 'Freq')
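
The formula id ~ x and value.var = 'Freq' rely on how data.frame() converts a one-dimensional table. Here is a small base-R sketch of that conversion, assuming the tables were built with table() on a vector named x (the input vector below is invented for illustration):

```r
# A small 1-d proportion table; the input values are made up.
x <- c(21, 21, 41, 42)
tb <- prop.table(table(x))

# data.frame() on a table gives one row per category, with the category
# in a column named after the table's dimension and the value in 'Freq'.
long <- data.frame(tb, id = 1)
names(long)
long
```

If the tables in foo carry a different dimension name, the right-hand side of the formula would need to change to match.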


ff <- c('21', '22', '23', '41', '42', '43', '52', '71', '81', '82', '90' )

res <- t(sapply(foo, function(x) x[ff]))
colnames(res) <- ff  # label the columns by category code
answered 2013-04-10T00:50:34.130


## [[1]]
## x
##         21         41         42         43         52         71         81         82 
## 0.02007456 0.58158876 0.22483510 0.09349011 0.05248064 0.01204474 0.00544881 0.01003728 
## [[2]]
## x
##         21         41         42         43         52         71         90 
## 0.01175122 0.36973345 0.34107194 0.03066781 0.08655775 0.01633706 0.14388077 
## [[3]]
## x
##          21          22          23          41          42 
## 0.043254082 0.008307075 0.016614151 0.930392438 0.001432254 
## [[4]]
## x
##         21         22         31         41         42         43         81 
## 0.10028653 0.03123209 0.00487106 0.66103152 0.03037249 0.01604585 0.15616046 
## [[5]]
## x
##           21           41           42           43           81 
## 0.0662080825 0.8291774147 0.0005732302 0.0865577529 0.0174835196 
## [[6]]
## x
##          21          22          31          41          42          43          81 
## 0.081948424 0.002292264 0.006303725 0.825501433 0.029226361 0.020630372 0.034097421 

# Get unique names of all columns in tables in the list
resCol <- unique(unlist(lapply(ptl, names)))

# Get dimensions of desired result
nresCol <- length(resCol)
nresRow <- length(ptl)

# Create 'Template' data.frame row
DF <- as.data.frame(matrix(rep(0, nresCol), nrow = 1, dimnames = list(1, resCol)))

# for every table in list, create copy of DF, fill it appropriately, then rbind result together using do.call

result <- do.call(rbind, lapply(ptl, function(x) {
    retDF <- DF
    retDF[, names(x)] <- x
    retDF  # return the filled one-row data frame
}))

# rename rows(optional)
rownames(result) <- 1:nrow(result)

##           21        41           42         43         52         71         81         82        90          22         23          31
## 1 0.02007456 0.5815888 0.2248351018 0.09349011 0.05248064 0.01204474 0.00544881 0.01003728 0.0000000 0.000000000 0.00000000 0.000000000
## 2 0.01175122 0.3697334 0.3410719404 0.03066781 0.08655775 0.01633706 0.00000000 0.00000000 0.1438808 0.000000000 0.00000000 0.000000000
## 3 0.04325408 0.9303924 0.0014322544 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.0000000 0.008307075 0.01661415 0.000000000
## 4 0.10028653 0.6610315 0.0303724928 0.01604585 0.00000000 0.00000000 0.15616046 0.00000000 0.0000000 0.031232092 0.00000000 0.004871060
## 5 0.06620808 0.8291774 0.0005732302 0.08655775 0.00000000 0.00000000 0.01748352 0.00000000 0.0000000 0.000000000 0.00000000 0.000000000
## 6 0.08194842 0.8255014 0.0292263610 0.02063037 0.00000000 0.00000000 0.03409742 0.00000000 0.0000000 0.002292264 0.00000000 0.006303725
answered 2013-04-10T02:04:04.433