r - Transferring categorical means to a new table

Question

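I'm fairly new to R, but I've already solved bigger challenges than my current problem, which makes this one especially frustrating. I searched the forums and found some related topics, but none of them resolves this situation.

I have a dataset of 184 observations of 14 variables: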

> head(diving)
   tagID ddmmyy Hour.GMT. Hour.Local.  X0  X3 X10  X20  X50 X100 X150 X200 X300 X400
1 122097 250912         0           9 0.0 0.0 0.3 12.0 15.3 59.6 12.8  0.0    0    0
2 122097 260912         0           9 0.0 2.4 6.9  5.5 13.7 66.5  5.0  0.0    0    0
3 122097 260912         6          15 0.0 1.9 3.6  4.1 12.7 39.3 34.6  3.8    0    0
4 122097 260912        12          21 0.0 0.2 5.5  8.0 18.1 61.4  6.7  0.0    0    0
5 122097 280912         6          15 2.4 9.3 6.0  3.4  7.6 21.1 50.3  0.0    0    0
6 122097 290912        18           3 0.0 0.2 1.6  6.4 41.4 50.4  0.0  0.0    0    0

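This is tag data, with one or more 6-hour time bins for each date (it is not a continuous dataset because of transmission interruptions). Within each 6-hour bin, the depths the animal dived to are broken down, by percentage, into 10 depth bins. So X0 = percentage of time spent between 0-3m, X3 = percentage of time spent between 3-10m, and so on.

What I want to do for starters is get the average percentage of time spent in each depth bin and plot it. To begin, I did the following: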

avg0<-mean(diving$X0)
avg3<-mean(diving$X3)
avg10<-mean(diving$X10)
avg20<-mean(diving$X20)
avg50<-mean(diving$X50)
avg100<-mean(diving$X100)
avg150<-mean(diving$X150)
avg200<-mean(diving$X200)
avg300<-mean(diving$X300)
avg400<-mean(diving$X400)

At this point I wasn't sure how to plot the resulting means, so I put them in a list:

divingmeans<-list(avg0, avg3, avg10, avg20, avg50, avg100, avg150, avg200, avg300, avg400)

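boxplot(divingmeans) sort of works, giving 1:10 on the x-axis and % 0-30 on the y-axis. However, I'd prefer a histogram, with an x-axis that shows the categorical bin names (e.g. avg3 or X3) rather than just a 1:10 ranking.

hist() and plot() give the following: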

> plot(divingmeans)
Error in xy.coords(x, y, xlabel, ylabel, log) : 
  'x' is a list, but does not have components 'x' and 'y'
> hist(divingmeans)
Error in hist.default(divingmeans) : 'x' must be numeric

I also tried:

> df<-as.data.frame(divingmeans)
> df
  X3.33097826086957 X3.29945652173913 X8.85760869565217 X17.6461956521739 X30.2614130434783
1          3.330978          3.299457          8.857609           17.6462          30.26141
  X29.3565217391304 X6.44510869565217 X0.664130434782609 X0.135869565217391 X0.0016304347826087
1          29.35652          6.445109          0.6641304          0.1358696         0.001630435

and

> df <- data.frame(matrix(unlist(divingmeans), nrow=10, byrow=T))
> df
   matrix.unlist.divingmeans...nrow...10..byrow...T.
1                                        3.330978261
2                                        3.299456522
3                                        8.857608696
4                                       17.646195652
5                                       30.261413043
6                                       29.356521739
7                                        6.445108696
8                                        0.664130435
9                                        0.135869565
10                                       0.001630435

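Neither gives the kind of table I'm looking for.

I know there must be a very basic solution for converting this into a suitable table, but I can't for the life of me figure it out. I'd like to be able to make a basic histogram showing the average percentage of time spent in each dive bin. The best format for this purpose seems to be a table with two columns: col1 = bin (categorical; e.g. avg50) and col2 = % (numeric; average percentage of time spent in that category).

You'll also notice that the data are broken down into different time periods; eventually I'd like to be able to separate the data by time of day, for example to see whether the average dive depth shifts between day and night. I figure that once I have the initial code working, I can do this by selecting e.g. X0[which(Hour.GMT.=="6")]. Tips on this would also be very welcome.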

score 2 · Accepted Answer

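I think you'll find it much easier to work with the data in long format.

You can reshape the data using reshape. I'll use data.table to show how easily you can calculate the means by group.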

library(data.table)
DT <- data.table(diving)

DTlong <- reshape(DT, varying = list(5:14), direction = 'long', 
  times = c(0,3,10,20,50,100,150,200,300,400), 
  v.names = 'time.spent', timevar = 'hours')

timeByHours <- DTlong[,list(mean.time = mean(time.spent)),by=hours]

# you can then plot the two column data.table
# (note: the 'hours' column holds the lower bound of each depth bin, not a time)

plot(timeByHours, type = 'l')

You can now analyse by any combination of date / hour / time at depth.
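
As a rough sketch of that last point, here is the same idea in base R: reshape to long format, then aggregate() over whichever grouping columns you want. The toy data frame is invented for illustration (two depth bins, made-up values), and the names diving_toy, depth and by_hour_depth are my own, not from the question.

```r
# Toy stand-in for `diving` (invented values, two depth bins only)
diving_toy <- data.frame(
  tagID       = 122097,
  ddmmyy      = c(250912, 260912, 260912, 280912),
  Hour.Local. = c(9, 9, 15, 15),
  X0          = c(0, 2, 4, 6),
  X3          = c(10, 20, 30, 40)
)

# Wide -> long: one row per (observation, depth bin)
long <- reshape(diving_toy,
                varying   = list(c("X0", "X3")),
                v.names   = "time.spent",
                timevar   = "depth",
                times     = c(0, 3),
                direction = "long")

# Mean % of time for each combination of local hour and depth bin
by_hour_depth <- aggregate(time.spent ~ Hour.Local. + depth,
                           data = long, FUN = mean)
print(by_hour_depth)
```

Adding ddmmyy (or any other column) to the right-hand side of the formula groups by that as well.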

score 0 · Accepted Answer

How do you want to plot them?

# grab the means of each depth-bin column
# (columns 1:4 are tagID, ddmmyy, Hour.GMT. and Hour.Local.)
diving.means <- colMeans(diving[, -(1:4)])


# plot it
plot(diving.means)

# boxplot
boxplot(diving.means)

If you want to get the lower bound of each interval from the column names, just strip the X:

lowerIntervalBound <- gsub("X", "", names(diving)[-(1:4)])

# you can convert these to numeric and plot against them 
lowInts <- as.numeric(lowerIntervalBound)
plot(x=lowInts, y=diving.means)

# ... or taking log (log(0) is -Inf, so the 0 m bin is not drawn)
plot(x=log(lowInts), y=diving.means)

# ... or as factors (similar to basic plot)
plot(x=factor(lowInts), y=diving.means)

Instead of putting the diving means in a list, try putting them in a vector (using c).
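
A minimal sketch of that suggestion, assuming the means printed in the question's as.data.frame() output: a named numeric vector can go straight into barplot(), which draws one labelled bar per category. That is probably closer to what the question wants than hist(), since hist() bins raw values rather than plotting one bar per named group. The axis labels are my own wording.

```r
# Means copied from the as.data.frame() output in the question,
# named after the depth-bin columns they were computed from
divingmeans <- c(X0 = 3.330978, X3 = 3.299457, X10 = 8.857609,
                 X20 = 17.6462, X50 = 30.26141, X100 = 29.35652,
                 X150 = 6.445109, X200 = 0.6641304, X300 = 0.1358696,
                 X400 = 0.001630435)

# One bar per depth bin, labelled with the bin name
barplot(divingmeans, xlab = "Depth bin", ylab = "Mean % of time", las = 2)
```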

If you want to combine it into a data.frame:

data.frame(lowInts, diving.means)

# or adding a row id if needed. 
data.frame(rowid=seq_along(diving.means), lowInts, diving.means)
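
On the time-of-day part of the question, one possible approach (a sketch, not the only way) is to subset the rows by hour before calling colMeans, or to split the depth columns by hour and take the means of every group at once. The toy data frame is invented for illustration, and diving_toy, depth_cols, means_6 and means_by_hour are hypothetical names.

```r
# Toy stand-in for `diving` (invented values, two depth bins only)
diving_toy <- data.frame(
  Hour.GMT. = c(0, 0, 6, 6),
  X0        = c(0, 2, 4, 6),
  X3        = c(10, 20, 30, 40)
)

depth_cols <- c("X0", "X3")   # in the real data: names(diving)[5:14]

# Means for a single time slot; a logical index is enough, no which() needed
means_6 <- colMeans(diving_toy[diving_toy$Hour.GMT. == 6, depth_cols])

# Means for every time slot at once: rows = depth bins, columns = hours
means_by_hour <- sapply(split(diving_toy[depth_cols], diving_toy$Hour.GMT.),
                        colMeans)
print(means_by_hour)
```

The resulting matrix can be passed to barplot(means_by_hour, beside = TRUE) to compare time slots side by side.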
