I am trying to import some data (below) and check that it has the right number of rows for later analysis.

repexample <- structure(list(QueueName = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L
), .Label = c(" Overall", "CCM4.usci_retention_eng", "usci_helpdesk"
), class = "factor"), X8Tile = structure(c(1L, 2L, 3L, 4L, 5L, 
6L, 7L, 8L, 9L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 
9L), .Label = c(" Average", "1", "2", "3", "4", "5", "6", "7", 
"8"), class = "factor"), Actual = c(508.1821504, 334.6994838, 
404.9048759, 469.4068667, 489.2800416, 516.5744106, 551.7966176, 
601.5103783, 720.9810622, 262.4622533, 250.2777778, 264.8281938, 
272.2807882, 535.2466968, 278.25, 409.9285714, 511.6635101, 553, 
641, 676.1111111, 778.5517241, 886.3666667), Calls = c(54948L, 
6896L, 8831L, 7825L, 5768L, 7943L, 5796L, 8698L, 3191L, 1220L, 
360L, 454L, 406L, 248L, 11L, 9L, 94L, 1L, 65L, 9L, 29L, 30L), 
Pop = c(41L, 6L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 3L, 1L, 1L, 
1L, 11L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L)), .Names = c("QueueName", 
"X8Tile", "Actual", "Calls", "Pop"), class = "data.frame", row.names = c(NA, 
-22L))

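The data has 5 columns and is an example of the kind of data I typically import (via a .csv file). As you can see, the "QueueName" column contains three unique values. For each unique value in "QueueName", I want to check that it has 9 rows, or the corresponding values in the "X8Tile" column (Average, 1, 2, 3, 4, 5, 6, 7, 8). For example, the "Overall" QueueName has all the necessary rows, but usci_helpdesk does not.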

So my first priority is to at least determine whether any unique value in "QueueName" is missing some of the necessary rows.

My second priority is to remove all rows belonging to any unique "QueueName" that does not meet the requirement.

2 Answers

Both priorities can be handled easily with the split-apply-combine paradigm implemented in the plyr package.

Priority 1: Identify the QueueName values that have too few rows

require(plyr)

# Make a short table of the number of rows for each unique value of QueueName
rowSummary <- ddply(repexample, .(QueueName), summarise, numRows=length(QueueName))
print(rowSummary)

If you have many unique values of QueueName, you will want to pick out the ones whose row count is not equal to 9:

rowSummary[rowSummary$numRows != 9, ]

Priority 2: Eliminate the rows whose QueueName has too few rows

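# Attach each group's row count to every row of that group
repexample2 <- ddply(repexample, .(QueueName), transform, numRows=length(QueueName))
# Keep only the rows whose QueueName has exactly 9 rows
repexampleEdit <- repexample2[repexample2$numRows == 9, ]
print(repexampleEdit)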
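If plyr is not available, the same two steps can be sketched in base R with table() and ave(). This is only an illustration, not part of the original answer: the data frame df below is an assumed stand-in that rebuilds just the QueueName and X8Tile columns of repexample so the snippet runs on its own, and the names df, rowCounts, shortQueues, groupSize and dfEdit are hypothetical.

```r
# Assumed stand-in for repexample: only the two key columns are rebuilt
# (leading spaces in " Overall" and " Average" are as in the original data).
tiles <- c(" Average", as.character(1:8))
df <- data.frame(
  QueueName = c(rep(" Overall", 9), rep("usci_helpdesk", 4),
                rep("CCM4.usci_retention_eng", 9)),
  X8Tile = c(tiles, tiles[1:4], tiles),
  stringsAsFactors = FALSE
)

# Priority 1: count rows per QueueName, then list the values without 9 rows
rowCounts <- table(df$QueueName)
shortQueues <- names(rowCounts)[as.vector(rowCounts != 9)]
print(shortQueues)

# Priority 2: ave() repeats each group's size on every row of that group,
# so the result can be used directly as a row filter
groupSize <- ave(seq_along(df$QueueName), df$QueueName, FUN = length)
dfEdit <- df[groupSize == 9, ]
print(dfEdit)
```

Unlike the ddply version, this keeps the rows in their original order and adds no helper column to the result.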

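(I do not quite understand what 'check that it has 9 rows, or the corresponding values in the "X8Tile" column' means.) You can edit the repexampleEdit line as needed.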
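One possible reading of that requirement is that every QueueName must contain each of the nine X8Tile values, whatever the row order. A minimal base R sketch under that assumption follows. The vector required is taken from the list in the question; df is an assumed stand-in for the two key columns of repexample, and the other names are hypothetical. Note that this checks for missing values only and would not flag duplicated rows.

```r
# X8Tile values every QueueName is expected to have (from the question)
required <- c("Average", as.character(1:8))

# Assumed stand-in for repexample: only the two key columns are rebuilt
tiles <- c(" Average", as.character(1:8))
df <- data.frame(
  QueueName = c(rep(" Overall", 9), rep("usci_helpdesk", 4),
                rep("CCM4.usci_retention_eng", 9)),
  X8Tile = c(tiles, tiles[1:4], tiles),
  stringsAsFactors = FALSE
)

# Strip stray leading/trailing whitespace, then group X8Tile by QueueName
tilesByQueue <- split(trimws(df$X8Tile), trimws(df$QueueName))

# For each QueueName, list the required X8Tile values that are absent
missingTiles <- lapply(tilesByQueue, function(x) setdiff(required, x))
print(missingTiles)

# Keep only the rows whose QueueName is missing nothing
complete <- names(missingTiles)[lengths(missingTiles) == 0]
dfComplete <- df[trimws(df$QueueName) %in% complete, ]
```

Here missingTiles reports which values each queue lacks, which answers the first priority in more detail than a bare row count.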

answered 2013-07-16T18:32:38.310

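Here is an approach that makes some assumptions about how your data is ordered. If those assumptions do not hold, the approach can be modified (or your data can be reordered):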

## Paste together the values from your "X8Tile" column
##   If all is in order, you should have "Average12345678"
##   If anything is missing, you won't....
myMatch <- names(
  which(with(repexample, tapply(X8Tile, QueueName, FUN=function(x) 
    gsub("^\\s+|\\s+$", "", paste(x, collapse = "")))) 
        == "Average12345678"))

## Use that to subset...
repexample[repexample$QueueName %in% myMatch, ]
#                  QueueName   X8Tile   Actual Calls Pop
# 1                  Overall  Average 508.1822 54948  41
# 2                  Overall        1 334.6995  6896   6
# 3                  Overall        2 404.9049  8831   5
# 4                  Overall        3 469.4069  7825   5
# 5                  Overall        4 489.2800  5768   5
# 6                  Overall        5 516.5744  7943   5
# 7                  Overall        6 551.7966  5796   5
# 8                  Overall        7 601.5104  8698   5
# 9                  Overall        8 720.9811  3191   5
# 14 CCM4.usci_retention_eng  Average 535.2467   248  11
# 15 CCM4.usci_retention_eng        1 278.2500    11   2
# 16 CCM4.usci_retention_eng        2 409.9286     9   2
# 17 CCM4.usci_retention_eng        3 511.6635    94   2
# 18 CCM4.usci_retention_eng        4 553.0000     1   1
# 19 CCM4.usci_retention_eng        5 641.0000    65   1
# 20 CCM4.usci_retention_eng        6 676.1111     9   1
# 21 CCM4.usci_retention_eng        7 778.5517    29   1
# 22 CCM4.usci_retention_eng        8 886.3667    30   1

A similar approach can be taken with aggregate + merge and similar tools.
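As a rough sketch of what an aggregate + merge variant might look like (an assumption about the intent, not code from the original answer): aggregate() computes one completeness flag per QueueName, and merge() attaches that flag to every row. The data frame df is an assumed stand-in for the two key columns of repexample, and required, flags, merged and dfEdit are hypothetical names. This version tests set membership rather than pasting the values together, so it does not depend on row order.

```r
# X8Tile values every QueueName is expected to have
required <- c("Average", as.character(1:8))

# Assumed stand-in for repexample: only the two key columns are rebuilt
tiles <- c(" Average", as.character(1:8))
df <- data.frame(
  QueueName = c(rep(" Overall", 9), rep("usci_helpdesk", 4),
                rep("CCM4.usci_retention_eng", 9)),
  X8Tile = c(tiles, tiles[1:4], tiles),
  stringsAsFactors = FALSE
)

# aggregate(): one row per QueueName, TRUE if all required values are present
flags <- aggregate(df["X8Tile"], by = df["QueueName"],
                   FUN = function(x) all(required %in% trimws(as.character(x))))
names(flags)[2] <- "complete"

# merge(): attach the flag to every row, then keep only the complete groups
merged <- merge(df, flags, by = "QueueName")
dfEdit <- merged[merged$complete, names(df)]
print(dfEdit)
```

Be aware that merge() sorts the result by the key column, so the rows may come back in a different order than in the input.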

answered 2013-07-16T18:35:38.693