r - Right way to convert a data.frame to a numeric matrix when the df also contains strings?

Question

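I have a data frame read from a .csv file that contains both numeric and character values. I want to convert this data frame into a matrix. All the information it contains is numeric (I removed the non-numeric rows), so it should be possible to convert the data frame into a numeric matrix. However, I get a character matrix instead.

The only way I have found to fix this is to call as.numeric on every single column, but that is very time-consuming. I am pretty sure there is a way to do this with some kind of if(i in 1:n) form, but I cannot figure out how it would work. Or is the only way really to start with numeric values from the beginning, as suggested here (Making matrix numeric and name orders)?

This is probably a very easy one for most of you :P

The matrix is much bigger, these are only the first few rows... Here is the code: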

cbind(
as.numeric(SFI.Matrix[ ,1]),
as.numeric(SFI.Matrix[ ,2]),
as.numeric(SFI.Matrix[ ,3]),
as.numeric(SFI.Matrix[ ,4]),
as.numeric(SFI.Matrix[ ,5]),
as.numeric(SFI.Matrix[ ,6]))  

# to get something like this again:

Social.Assistance Danger.Poverty GINI S80S20 Low.Edu        Unemployment 
0.147             0.125          0.34    5.5   0.149        0.135 0.18683691
0.258             0.229          0.27    3.8   0.211        0.175 0.22329362
0.207             0.119          0.22    3.1   0.139        0.163 0.07170422
0.219             0.166          0.25    3.6   0.114        0.163 0.03638525
0.278             0.218          0.29    4.1   0.270        0.198 0.27407825
0.288             0.204          0.26    3.6   0.303        0.211 0.22372633

Thanks for any help!

score 63 · Accepted Answer

Edit 2: See @flodel's answer. Much better.

Try:

# assuming SFI is your data.frame
as.matrix(sapply(SFI, as.numeric))

Edit: or, as @CarlWitthoft suggested in the comments:

matrix(as.numeric(unlist(SFI)),nrow=nrow(SFI))
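
A minimal sketch of the two calls above on a made-up data frame (the name `SFI_demo` and its values are hypothetical, not the asker's data). It also illustrates one caveat: if a column is a factor rather than a character vector, `as.numeric()` returns the internal level codes, so the column has to go through `as.character()` first.

```r
# Hypothetical stand-in for the asker's data: numbers stored as strings
SFI_demo <- data.frame(
  GINI   = c("0.34", "0.27", "0.22"),
  S80S20 = c("5.5", "3.8", "3.1"),
  stringsAsFactors = FALSE
)

# Column-wise conversion; sapply() simplifies the result to a matrix
m1 <- sapply(SFI_demo, as.numeric)

# Flatten, convert, and reshape (matrix() fills column by column)
m2 <- matrix(as.numeric(unlist(SFI_demo)), nrow = nrow(SFI_demo))

# Caveat: a factor must be converted via as.character() first
f <- factor(c("0.34", "0.27", "0.22"))
codes  <- as.numeric(f)                # level codes, not the values
values <- as.numeric(as.character(f))  # the actual numbers
```

Note that the second form drops the column names, so they would have to be restored with `colnames()` if they are needed.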

score 57 · Answer

data.matrix(SFI)

From ?data.matrix:

Description:

 Return the matrix obtained by converting all the variables in a
 data frame to numeric mode and then binding them together as the
 columns of a matrix.  Factors and ordered factors are replaced by
 their internal codes.
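
A small sketch of what that description means in practice, using a made-up data frame (`df_demo` is hypothetical). The same help page in current R versions adds that character columns are first converted to factors and then to integers, so `data.matrix()` should only be expected to return the original numbers for columns that are already numeric.

```r
# Hypothetical example: one truly numeric column, one stored as text
df_demo <- data.frame(
  num = c(0.5, 0.1, 0.3),
  chr = c("0.5", "0.1", "0.3"),
  stringsAsFactors = FALSE
)

m <- data.matrix(df_demo)

m[, "num"]  # the original values
m[, "chr"]  # integer codes of the strings, not 0.5, 0.1, 0.3

# Converting the text column first gives the intended result
df_demo$chr <- as.numeric(df_demo$chr)
m_fixed <- data.matrix(df_demo)
```

If the columns came in as text because of a few stray non-numeric cells, cleaning those cells (or converting explicitly with `as.numeric()`) before calling `data.matrix()` avoids silently getting codes instead of values.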

score 9 · Answer

If the data frame contains only numbers, here is another way.

apply(as.matrix.noquote(SFI),2,as.numeric)

However, the most reliable way to convert a data frame to a matrix is to use the data.matrix() function.

score 0 · Answer

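Another way is to use the colClasses argument of read.table() to specify the column types, via colClasses=c(*column class types*). If there are 6 columns whose members need to be numeric, repeat the string "numeric" six times, separated by commas, import the data frame, and then call as.matrix() on it. P.S. It looks like you have headers, so I set header=T.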

as.matrix(read.table(SFI.matrix,header=T,
colClasses=c("numeric","numeric","numeric","numeric","numeric","numeric"),
sep=","))
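
A self-contained sketch of the same idea, assuming a three-column file; the inline text `csv_text` is a hypothetical stand-in for the asker's .csv, and `rep()` is used here only to avoid typing "numeric" once per column.

```r
# Hypothetical CSV content standing in for the asker's file
csv_text <- "GINI,S80S20,Low.Edu
0.34,5.5,0.149
0.27,3.8,0.211"

# rep() builds c("numeric", "numeric", "numeric")
SFI_demo <- read.table(text = csv_text, header = TRUE, sep = ",",
                       colClasses = rep("numeric", 3))

# All columns are numeric, so as.matrix() yields a numeric matrix
m <- as.matrix(SFI_demo)
```

With colClasses set to "numeric", reading stops with an error if a column holds something that cannot be parsed as a number, which surfaces stray text at import time instead of later.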

score 0 · Answer

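I ran into the same problem and solved it by taking the original data frame without the row-name column and adding the row names back afterwards.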

SFIo <- as.matrix(apply(SFI[,-1],2,as.numeric))
row.names(SFIo) <- SFI[,1]
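
A sketch of the same two steps on made-up data, assuming the first column holds the labels and the remaining columns hold numbers stored as text (`SFI_demo` and the `Country` column are hypothetical):

```r
# Hypothetical data: label column first, numbers stored as strings after it
SFI_demo <- data.frame(
  Country = c("A", "B", "C"),
  GINI    = c("0.34", "0.27", "0.22"),
  S80S20  = c("5.5", "3.8", "3.1"),
  stringsAsFactors = FALSE
)

# Convert every column except the labels, then restore the labels as row names
SFIo <- apply(SFI_demo[, -1], 2, as.numeric)
rownames(SFIo) <- SFI_demo[, 1]

# Rows can now be looked up by label
SFIo["B", "GINI"]
```

Keeping the label column out of the conversion matters: a matrix holds a single type, so one text column would turn the whole matrix into character.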

score -2 · Answer

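I filled in the NAs manually by exporting to CSV, editing the file, and re-importing it, as shown below.

Perhaps one of you experts can explain why this procedure worked so well (the first file had columns of data types char, INT and num (floating-point numbers), which all became char after STEP 1; yet at the end of STEP 3, R correctly recognized the data type of every column).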

library(stringr)  # provides str_locate(), used in STEP 2
# STEP 1:
MainOptionFile <- read.csv("XLUopt_XLUstk_v3.csv",
                            header=T, stringsAsFactors=FALSE)
#... STEP 2:
TestFrame <- subset(MainOptionFile, str_locate(option_symbol,"120616P00034000") > 0)
write.csv(TestFrame, file = "TestFrame2.csv")
# ...
# STEP 3:
# I made various amendments to `TestFrame2.csv`, including replacing all missing data cells with appropriate numbers. I then read that amended data frame back into R as follows:    
XLU_34P_16Jun12 <- read.csv("TestFrame2_v2.csv",
                            header=T,stringsAsFactors=FALSE)

Once back in R, the correct level of measurement for every column was recognized automatically!
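
One plausible explanation, offered as a sketch rather than a diagnosis of the actual file: read.csv() infers each column's type from its contents, so a single cell that is not a number (and is not listed in na.strings) makes the whole column character. Once those cells are replaced, the same column is read as numeric. The CSV snippets and the "missing" placeholder below are made up for illustration.

```r
# Hypothetical file contents: one cell holds a non-numeric placeholder
raw_text   <- "strike,bid\n34,0.15\n34,missing\n34,0.20"
fixed_text <- "strike,bid\n34,0.15\n34,0.18\n34,0.20"

raw   <- read.csv(text = raw_text,   stringsAsFactors = FALSE)
fixed <- read.csv(text = fixed_text, stringsAsFactors = FALSE)

class(raw$bid)    # "character": one bad cell affects the whole column
class(fixed$bid)  # "numeric"

# Alternative without editing the file: declare the placeholder as NA
raw_na <- read.csv(text = raw_text, na.strings = c("NA", "missing"),
                   stringsAsFactors = FALSE)
class(raw_na$bid) # "numeric", with NA in the second row
```

Under this assumption, the manual edit in STEP 3 worked because it removed the cells that had blocked the type inference, and na.strings would be a way to get the same result without hand-editing the file.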
