Attribute             Time       Value
pmEulRlcUserPacketThp 2013-04-30 12,51,34,17 
pmEulRlcUserPacketThp 2013-04-30 84,28,17,10 
pmEulRlcUserPacketThp 2013-04-30 11,43,28,15
pmEulRlcUserPacketThp 2013-04-30 80,26,17,91 
pmEulRlcUserPacketThp 2013-04-30 10,41,25,13 
pmEulRlcUserPacketThp 2013-04-30 97,35,23,12

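I'm interning at a company, and they have data like this for running KS tests. The Value column is an array of values, but R reads it as character. I want to compute the sum of all values where Attribute is pmEulRlcUserPacketThp and Time is 2013-04-30. How can I do this?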

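Attribute contains various pm... entries, and this is monthly data with Time running from 30-4-2013 to 30-5-2013. So I should end up with one vector per date for each Attribute. Please help me solve this!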

It does not work when the vectors in the rows have different lengths:

df = read.table(text="Attribute             Time       Value
    pmEulRlcUserPacketThp 2013-04-30 12,51,34,17 
    pmEulRlcUserPacketThp 2013-04-30 84,28,17,10 
    pmEulRlcUserPacketThp 2013-04-30 11,43,28,15
    pmEulRlcUserPacketThp 2013-04-30 80,26,17,91 
    pmEulRlcUserPacketThp 2013-04-30 10,41,25,13 
    pmEulRlcUserPacketThp 2013-04-30 97,35,23,12,13", 
                 header = TRUE, fill = TRUE, stringsAsFactors=F)
dfL <- concat.split.multiple(df, "Value", direction = "long")

"Error in data.frame(..., check.names = FALSE) : 
  arguments imply differing number of rows: 6, 7" 

This is the error I get! What can be done with data that contains vectors of different lengths?

For different dates:

df = read.table(text="Attribute Time Value
 pmEulRlcUserPacketThp 2013-04-30 12,51,34,17
 pmEulRlcUserPacketThp 2013-04-29 84,28,17,10
 pmEulRlcUserPacketThp 2013-04-28 11,43,28,15
 pmEulRlcUserPacketThp 2013-04-27 80,26,17,91
 pmEulRlcUserPacketThp 2013-04-26 10,41,25,13
 pmEulRlcUserPacketThp 2013-04-25 97,35,23,12",
                 header = TRUE, fill = TRUE, stringsAsFactors=F) 

Now my data looks like this. I have done all the concat steps, and the data I have now is:

> y
              Attribute       Time V1 V2 V3 V4
1 pmEulRlcUserPacketThp 2013-04-30 12 51 34 17
2 pmEulRlcUserPacketThp 2013-04-29 84 28 17 10
3 pmEulRlcUserPacketThp 2013-04-28 11 43 28 15
4 pmEulRlcUserPacketThp 2013-04-27 80 26 17 91
5 pmEulRlcUserPacketThp 2013-04-26 10 41 25 13
6 pmEulRlcUserPacketThp 2013-04-25 97 35 23 12

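What I want now is the aggregate of V1, V2, V3 and V4 over two time periods: one from the 27th to the 30th, the other from the 25th to the 26th. I am using subsetting, which is not feasible for large data with a huge number of elements.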

> y1<-y[1:4,]
> y1
              Attribute       Time V1 V2 V3 V4
1 pmEulRlcUserPacketThp 2013-04-30 12 51 34 17
2 pmEulRlcUserPacketThp 2013-04-29 84 28 17 10
3 pmEulRlcUserPacketThp 2013-04-28 11 43 28 15
4 pmEulRlcUserPacketThp 2013-04-27 80 26 17 91

> y2<-y[-(1:4),]
> y2
              Attribute       Time V1 V2 V3 V4
5 pmEulRlcUserPacketThp 2013-04-26 10 41 25 13
6 pmEulRlcUserPacketThp 2013-04-25 97 35 23 12

> z1<-aggregate(V1 ~ Attribute, y1, sum)
> z1
              Attribute  V1
1 pmEulRlcUserPacketThp 187
> z2<-aggregate(V1 ~ Attribute, y2, sum)
> z2
              Attribute  V1
1 pmEulRlcUserPacketThp 107

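This only covers V1 for the two time intervals. The same has to be done for the other values (V2, V3, V4), which is time-consuming. Is there a way to select dates using aggregate?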


2 Answers


Your question has changed quite a lot since it was first asked, which is generally not great SO behavior. However, I'm feeling generous...

This solution uses concat.split.multiple from "splitstackshape", together with aggregate and cut from base R, to get the result you want:

Load "splitstackshape" and make sure it is at least version 1.2.0 (the latest version at the time of posting):

library(splitstackshape)
## Make sure you're running at least version 1.2.0
packageVersion("splitstackshape")
# [1] ‘1.2.0’

Here is your sample data:

df <- read.table(text="Attribute Time Value
 pmEulRlcUserPacketThp 2013-04-30 12,51,34,17
 pmEulRlcUserPacketThp 2013-04-29 84,28,17,10
 pmEulRlcUserPacketThp 2013-04-28 11,43,28,15
 pmEulRlcUserPacketThp 2013-04-27 80,26,17,91
 pmEulRlcUserPacketThp 2013-04-26 10,41,25,13
 pmEulRlcUserPacketThp 2013-04-25 97,35,23,12",
 header = TRUE, fill = TRUE, stringsAsFactors = FALSE)

First, split the "Value" column.

y <- concat.split.multiple(df, "Value")

Next, create an "interval" column for the date ranges you want to work with.

y$interval <- cut(as.Date(y$Time), breaks=c(as.Date(
  c("2013-04-25", "2013-04-27", "2013-04-30"))), include.lowest=TRUE)

Finally, aggregate your data. The . ~ notation lets you aggregate all the non-ID columns at once.

aggregate(. ~ Attribute + interval, y[-2], sum)
#               Attribute   interval Value_1 Value_2 Value_3 Value_4
# 1 pmEulRlcUserPacketThp 2013-04-25     107      76      48      25
# 2 pmEulRlcUserPacketThp 2013-04-27     187     148      96     133

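FYI: if you are on version 1.2.0, you should no longer get the error you mention in your post. The error was caused by how read.table decides how many columns to create. It only looks at the first 5 lines, and the example you had trouble with has a longer line in the sixth row. I have implemented count.fields to overcome this. Thanks for bringing it to my attention.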
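
The read.table behaviour described here can be reproduced and worked around in base R. The sketch below is an illustration under stated assumptions, not the package's internal code: it feeds the six Value strings from the question through count.fields to find the widest row, then passes that width to read.table via col.names. The names txt, n_fields, max_cols and wide are made up for this example.

```r
# The Value strings from the question; the sixth row has one extra field
txt <- "12,51,34,17
84,28,17,10
11,43,28,15
80,26,17,91
10,41,25,13
97,35,23,12,13"

# Count the fields on every line, not only the first five
con <- textConnection(txt)
n_fields <- count.fields(con, sep = ",")
close(con)
max_cols <- max(n_fields)

# Supplying col.names of the full width tells read.table how many columns
# to create; fill = TRUE pads the shorter rows with NA
wide <- read.table(text = txt, sep = ",", fill = TRUE,
                   col.names = paste0("V", seq_len(max_cols)))
print(wide)
```

With the width given up front, the sixth row stays on one line and the shorter rows get NA in the last column.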

answered 2013-08-15T10:07:53.200

Is this close to what you want?

df = read.table(text="Attribute             Time       Value
  pmEulRlcUserPacketThp 2013-04-30 12,51,34,17 
  pmEulRlcUserPacketThp 2013-04-30 84,28,17,10 
  pmEulRlcUserPacketThp 2013-04-30 11,43,28,15
  pmEulRlcUserPacketThp 2013-04-30 80,26,17,91 
  pmEulRlcUserPacketThp 2013-04-30 10,41,25,13 
  pmEulRlcUserPacketThp 2013-04-30 97,35,23,12", 
                header = TRUE, fill = TRUE, stringsAsFactors=F)


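# Split each comma-separated string into one row per record
# (assumes every row has the same number of values)
values = data.frame(t(matrix(unlist(strsplit(df$Value, ',')), ncol = nrow(df))))
# Convert every column to numeric
values = mapply(values, FUN = function(col){as.numeric(as.character(col))})
# Attach the numeric columns (X1..X4) to Attribute and Time
df = cbind(df[,1:2], values)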

aggregate(df$X1, by=list(df$Attribute, df$Time), FUN=sum)
aggregate(df$X2, by=list(df$Attribute, df$Time), FUN=sum)
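
If the goal is the total of all values per Attribute and Time, as in the original question, the rows do not need to be the same length. The following is a base R sketch under that assumption; the three rows are illustrative input built from values in the question (one of them with five values), and RowSum and totals are names introduced here.

```r
# Illustrative input: the last row has five values instead of four
df <- data.frame(
  Attribute = "pmEulRlcUserPacketThp",
  Time = c("2013-04-30", "2013-04-30", "2013-04-29"),
  Value = c("12,51,34,17", "84,28,17,10", "97,35,23,12,13"),
  stringsAsFactors = FALSE
)

# Split each string on commas and sum the numbers of that row;
# strsplit returns a list, so differing lengths are not a problem
df$RowSum <- vapply(strsplit(df$Value, ",", fixed = TRUE),
                    function(v) sum(as.numeric(v)), numeric(1))

# Add the per-row sums together for every Attribute/Time combination
totals <- aggregate(RowSum ~ Attribute + Time, data = df, FUN = sum)
print(totals)
```

This gives one total per date rather than one total per position (X1, X2, ...), so it complements the per-column aggregate calls above instead of replacing them.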
answered 2013-08-15T10:00:09.467