r - Dummy variables to and fro in R

Question

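I have been using R on and off for two years now and have been trying to get my head around the whole idea of vectorization. Since I deal a lot with dummy variables from multiple-response sets in surveys, I thought this would be an interesting case to learn on.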

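The idea is to go from multiple responses to dummy variables (and back), for example: "Out of these 8 different chocolates, which are your favorites (choose at most 3)?"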

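Sometimes we code this as dummy variables (1 if the person likes "Cote d'Or", 0 if they don't), with one variable per option, and sometimes as categorical variables (1 for a person who likes "Cote d'Or", 2 for a person who likes "Lindt", and so on), with 3 variables for the 3 choices.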

So, basically I can end up with either a matrix whose rows look like

1,0,0,1,0,0,1,0

or a matrix whose rows look like

1,4,7

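As mentioned, the idea is to get from one to the other. So far I have a loop solution for each case and a vectorized solution for going from dummy to categorical. I would appreciate any further insight into this, as well as a vectorized solution for the categorical-to-dummy step.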
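To make the two encodings concrete, here is a small illustration for a single respondent. The names `dummy_row` and `cat_row` are made up for this example and do not appear in the code further down.

```r
# One respondent who picked options 1, 4 and 7 out of 8.
dummy_row <- c(1, 0, 0, 1, 0, 0, 1, 0)  # one 0/1 flag per option
cat_row   <- c(1, 4, 7)                 # positions of the chosen options

# Dummy -> categorical: positions of the 1s.
from_dummy <- which(dummy_row == 1)

# Categorical -> dummy: start from all zeros and set the chosen positions to 1.
from_cat <- replace(numeric(8), cat_row, 1)
```

The rest of the question is about doing the same thing for every row of a matrix at once.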

Dummy to non-dummy

vecOrig<-matrix(0,nrow=18,ncol=8)  # From this one
vecDest<-matrix(0,nrow=18,ncol=3)  # To this one

# Populating the original matrix.
# I'm pretty sure this could have been added to the definition of the matrix, 
# but I kept getting repeated numbers.
# How would you vectorize this?
# NOTE: 'vec' is not defined in the post; three 1s and five 0s is assumed here.
vec<-c(1,1,1,0,0,0,0,0)
for (i in 1:length(vecOrig[,1])){
  vecOrig[i,]<-sample(vec)
}

# Now, how would you vectorize this following step... 
for(i in 1:length(vecOrig[,1])){            
  vecDest[i,]<-grep(1,vecOrig[i,])
}

# Vectorized solution, I had to transpose it for some reason.
vecDest2<-t(apply(vecOrig,1,function(x) grep(1,x)))
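One possible way to populate both matrices without an explicit `for` loop is `replicate()`. This is only a sketch: `vec` is not defined in the post, so a template row with three 1s and five 0s is assumed, and the seed is arbitrary. Note that `replicate()` is a wrapper around `sapply()`, so it hides the loop rather than removing it.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Assumed template row: 3 chosen options out of 8 (not defined in the original post).
vec <- c(1, 1, 1, 0, 0, 0, 0, 0)

# replicate() stores each shuffled row as a column (8 x 18),
# so transpose to get one respondent per row (18 x 8).
vecOrig <- t(replicate(18, sample(vec)))

# Same idea for the categorical matrix: 3 distinct options per respondent.
matOrig <- t(replicate(18, sample(1:8, 3)))
```

The transpose is needed for the same reason as in the `apply()` line above: both functions collect each call's result as a column of the output, so a row-wise operation comes back with rows and columns swapped.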

Non-dummy to dummy

matOrig<-matrix(0,nrow=18,ncol=3)  # From this one
matDest<-matrix(0,nrow=18,ncol=8)  # To this one.

# We populate the origin matrix. Same thing as the other case. 
for (i in 1:length(matOrig[,1])){         
  matOrig[i,]<-sample(1:8,3,FALSE)
}

# this works, but how to make it vectorized?
for(i in 1:length(matOrig[,1])){          
  for(j in matOrig[i,]){
    matDest[i,j]<-1
  }
}

# Not a clue of how to vectorize this one. 
# The 'model.matrix' solution doesn't look neat.
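One way the double loop could be replaced is matrix indexing: R accepts a two-column matrix of (row, column) pairs as an index. The sketch below rebuilds the example data with an arbitrary seed; `idx` is a name introduced here for illustration.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
matOrig <- t(replicate(18, sample(1:8, 3)))           # 18 respondents, 3 picks each
matDest <- matrix(0, nrow = nrow(matOrig), ncol = 8)  # all-zero dummy matrix

# Build (row, column) pairs: row() gives the row number of every cell,
# and the cell values themselves are the target columns.
# as.vector() flattens both in the same column-major order, so they line up.
idx <- cbind(as.vector(row(matOrig)), as.vector(matOrig))

# Assign 1 to every indexed cell in one step.
matDest[idx] <- 1
```

This assumes every entry of `matOrig` is a valid column number between 1 and 8; missing picks coded as `NA` or 0 would need to be filtered out of `idx` first.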

score 4 · Accepted Answer

A vectorized solution:

Dummy to non-dummy

vecDest <- t(apply(vecOrig == 1, 1, which))

Non-dummy to dummy (back to the original)

nCol <- 8

vecOrig <- t(apply(vecDest, 1, replace, x = rep(0, nCol), values = 1))
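A quick round-trip check of the two lines above, as a sketch on made-up data (the template row with three 1s and the seed are assumptions; `vecBack` is a name introduced here so the original matrix is not overwritten):

```r
set.seed(1)  # arbitrary seed, only for reproducibility
nCol <- 8
vecOrig <- t(replicate(18, sample(c(1, 1, 1, 0, 0, 0, 0, 0))))

# Dummy -> categorical. This relies on every row having the same number
# of 1s; otherwise apply() returns a list instead of a matrix.
vecDest <- t(apply(vecOrig == 1, 1, which))

# Categorical -> dummy. Each row of vecDest is passed to replace() as the
# 'list' argument, i.e. the positions that receive the value 1.
vecBack <- t(apply(vecDest, 1, replace, x = rep(0, nCol), values = 1))

# TRUE if the round trip reproduces the original matrix.
roundTripOk <- all(vecBack == vecOrig)
```

Because `x` and `values` are passed by name, the row that `apply()` supplies positionally fills the remaining `list` argument of `replace()`, which is what makes the one-liner work.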

score 0 · Accepted Answer

This might give some insight for the first part:

#Create example data
set.seed(42)
vecOrig<-matrix(rbinom(20,1,0.2),nrow=5,ncol=4)

     [,1] [,2] [,3] [,4]
[1,]    1    0    0    1
[2,]    1    0    0    1
[3,]    0    0    1    0
[4,]    1    0    0    0
[5,]    0    0    0    0

Note that this does not assume an equal number of ones in each row (you wrote "choose at most 3", for example).

#use algebra to create position numbers
vecDest <- t(t(vecOrig)*1:ncol(vecOrig))

     [,1] [,2] [,3] [,4]
[1,]    1    0    0    4
[2,]    1    0    0    4
[3,]    0    0    3    0
[4,]    1    0    0    0
[5,]    0    0    0    0

Now we remove the zeros. To do that, we have to turn the object into a list.

vecDest <- split(t(vecDest), rep(1:nrow(vecDest), each = ncol(vecDest)))
lapply(vecDest,function(x) x[x>0])

$`1`
[1] 1 4

$`2`
[1] 1 4

$`3`
[1] 3

$`4`
[1] 1

$`5`
numeric(0)
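As a possible alternative for rows with differing numbers of ones, `which()` with `arr.ind = TRUE` returns all (row, col) positions in one call, and `split()` then groups the column numbers by row. This is a sketch rather than part of the answer above; `pos` and `picks` are names introduced here.

```r
set.seed(42)
vecOrig <- matrix(rbinom(20, 1, 0.2), nrow = 5, ncol = 4)

# All (row, col) positions of the 1s in a single call.
pos <- which(vecOrig == 1, arr.ind = TRUE)

# Group the column numbers by row. Using a factor with one level per row
# keeps rows without any 1s as empty entries in the resulting list.
picks <- split(pos[, "col"],
               factor(pos[, "row"], levels = seq_len(nrow(vecOrig))))
```

The result is a list with one element per respondent, so respondents who chose fewer options simply get shorter vectors.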
