Dear StackOverflow community,
I have a dataset from my university project that I am trying to parse and run some calculations on. It looks similar to this:
Month,1,2,3,3,4,4,5,6,7
x.1,0,0,0,0,0,0,0,0,0
x.2,0,0,0,0,0,0,0,0,0
x.3,0,0,0,6,5,5,,,15
x.4,0,0,0,7,7,,,,15
x.5,1,1,1,11,7,5,,,0
x.6,1,1,1,14,6,,,,0
x.7,1,1,1,17,5,,,,15
x.8,1,1,1,21,4,,,,15
x.9,0,0,0,1,1,1,1,1,0
x.10,0,0,0,1,1,1,1,1,0
x.11,1,0,0,1,1,1,1,1,0
x.12,0,0,0,0,0,0,0,0,1
x.13,0,0,0,0,0,0,0,0,0
x.14,0,1,0,0,0,0,0,0,0
x.20,orchid,,,orchid,rose,orchid,orchid,orchid,
x.23,0,0,0,1,1,1,1,1,1
x.24,,,,,buttercup,buttercup,buttercup,buttercup,lilac
x.25,0,0,0,1,1,0,1,1,1
x.26,,,,17,,,,,15
x.27,,,,999,,,,,15
I then try to import it like this:
data <- read.csv("~/data_munging/data.csv", header=F)
my_matrix <- as.matrix(data)
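The problem here is that the first column of the dataset actually holds the variable names, and as.matrix() does not read it as row (variable) names.
(Some of the data also has holes, but I will leave that for another question.)
I am new to R and wondering what I am Doing Wrong™?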
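To make the goal concrete, here is a minimal sketch of the result I am after, assuming the `row.names` argument of `read.csv()` is the right tool for this. The `csv_text` excerpt is a hypothetical cut-down copy of the data above, inlined only so the snippet runs on its own.

```r
# Hypothetical four-line excerpt of the data above (not the full file).
csv_text <- "Month,1,2,3,3,4,4,5,6,7
x.1,0,0,0,0,0,0,0,0,0
x.3,0,0,0,6,5,5,,,15
x.20,orchid,,,orchid,rose,orchid,orchid,orchid,"

# row.names = 1 asks read.csv to use the first column as row names
# instead of keeping it as an ordinary data column (V1).
data <- read.csv(text = csv_text, header = FALSE, row.names = 1,
                 stringsAsFactors = FALSE)

my_matrix <- as.matrix(data)

rownames(my_matrix)  # the variable names, including "Month"
dim(my_matrix)       # the name column is no longer counted as data
```

Because some rows contain text (for example `orchid`), the resulting matrix is a character matrix rather than a numeric one, so this only addresses the row-name part of the problem.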
Update:
As per Justin's comment, this is how the dataset is being imported and what its str() yields:
> sample_data <- read.csv("~/data_munging/sample_data.csv", header=F)
> str(sample_data)
'data.frame': 28 obs. of 10 variables:
$ V1 : Factor w/ 28 levels "Month","x.1","x.10",..: 1 2 13 22 23 24 25 26 27 28 ...
$ V2 : Factor w/ 4 levels "","0","1","orchid": 3 2 2 2 2 3 3 3 3 2 ...
$ V3 : int 2 0 0 0 0 1 1 1 1 0 ...
$ V4 : int 3 0 0 0 0 1 1 1 1 0 ...
$ V5 : Factor w/ 12 levels "","0","1","11",..: 8 2 2 9 10 4 5 6 7 3 ...
$ V6 : Factor w/ 9 levels "","0","1","4",..: 4 2 2 5 7 7 6 5 4 3 ...
$ V7 : Factor w/ 7 levels "","0","1","4",..: 4 2 2 5 1 5 1 1 1 3 ...
$ V8 : Factor w/ 6 levels "","0","1","5",..: 4 2 2 1 1 1 1 1 1 3 ...
$ V9 : Factor w/ 6 levels "","0","1","6",..: 4 2 2 1 1 1 1 1 1 3 ...
$ V10: Factor w/ 6 levels "","0","1","15",..: 5 2 2 4 4 2 2 4 4 2 ...
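The reason I think it should be a matrix is that, read this way, Month comes in as part of a factor whose levels are the row names rather than the months (of the year).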
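Put differently, the layout I seem to want has one row per month and one column per variable. A rough sketch of that reshaping, assuming that transposing with `t()` and re-guessing column types with `type.convert()` is a reasonable approach; `csv_text` is again a hypothetical excerpt rather than the real file:

```r
# Hypothetical excerpt: first four month columns of three variables.
csv_text <- "Month,1,2,3,3
x.1,0,0,0,0
x.5,1,1,1,11
x.20,orchid,,,orchid"

# Read everything as text so nothing is turned into a factor,
# and use the first column as row names.
raw <- read.csv(text = csv_text, header = FALSE, row.names = 1,
                colClasses = "character")

# Transpose: each former row (Month, x.1, ...) becomes a column.
flipped <- as.data.frame(t(as.matrix(raw)), stringsAsFactors = FALSE)

# Let R re-guess the type of each column; as.is = TRUE keeps text as character.
flipped[] <- lapply(flipped, type.convert, as.is = TRUE)

str(flipped)
```

With this layout, `flipped$Month` holds the month numbers as integers, while a text variable such as `x.20` stays character.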
Update 2: Now using the original dataset from the CSV.