I have a data frame (dtetags.df) whose Date column contains many repeated dates:
dtetags.df$Date
"2016-07-22" "2016-07-22" "2016-07-21" "2016-07-21" "2016-07-20" "2016-07-20" "2016-07-19" "2016-07-19" "2016-07-18" "2016-07-18" "2016-07-15" "2016-07-15" "2016-07-15" "2016-07-14"
"2016-07-14" "2016-07-13" "2016-07-13" "2016-07-13" "2016-07-12" "2016-07-12" "2016-07-12" "2016-07-12" "2016-07-11" "2016-07-11" "2016-07-11" "2016-07-11" "2016-07-08" "2016-07-08"
"2016-07-08" "2016-07-07" "2016-07-07" "2016-07-07" "2016-07-07" "2016-07-06" "2016-07-06" "2016-07-05" "2016-07-05" "2016-07-05" "2016-07-05" "2016-07-01" "2016-07-01" "2016-06-30"
"2016-06-30" "2016-06-29" "2016-06-29" "2016-06-29" "2016-06-29" "2016-06-29" "2016-06-28" "2016-06-28" "2016-06-28" "2016-06-27" "2016-06-27" "2016-06-27" "2016-06-24" "2016-06-24"
"2016-06-23" "2016-06-23" "2016-06-22" "2016-06-22" "2016-06-21" "2016-06-21" "2016-06-20" "2016-06-20" "2016-06-17" "2016-06-17" "2016-06-16" "2016-06-16" "2016-06-15" "2016-06-15"
"2016-06-14" "2016-06-13" "2016-06-13" "2016-06-10" "2016-06-10" "2016-06-09" "2016-06-09" "2016-06-09" "2016-06-09" "2016-06-08" "2016-06-08" "2016-06-07" "2016-06-07" "2016-06-06"
"2016-06-06" "2016-06-06" "2016-06-01" "2016-06-01" "2016-05-29" "2016-05-29" "2016-05-27" "2016-05-27" "2016-05-26" "2016-05-26" "2016-05-25" "2016-05-25" "2016-05-24" "2016-05-23"
"2016-05-23" "2016-05-20"
It also has some binary tag columns that show whether a post with that tag was published on that date, for example:
dtetags.df$Technology
"0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "1" "0" "0" "0" "0" "1" "1" "0" "1" "0" "1"
"0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "1" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "1" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0"
"0" "0" "0" "0" "0" "0" "0" "0" "0" "0"
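I am trying to use ddply(dtetags.df, "Date", numcolwise(sum)), as suggested in this question, but instead of the per-date sums it returns this message: <0 rows> (or 0-length row.names). I have tried many different ways of writing the ddply command, but I cannot get it to work.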
The ideal output would look like this:
Date Technology
1 2016-07-22 0
2 2016-07-21 0
3 2016-07-20 0
4 2016-07-19 0
5 2016-07-18 0
6 2016-07-15 0
7 2016-07-14 0
8 2016-07-13 0
9 2016-07-12 0
10 2016-07-11 0
11 2016-07-08 0
12 2016-07-07 0
13 2016-07-06 1
14 2016-07-05 0
15 2016-07-01 2
16 2016-06-30 1
17 2016-06-29 1
18 2016-06-28 0
19 2016-06-27 0
20 2016-06-24 1
21 2016-06-23 0
22 2016-06-22 0
23 2016-06-21 0
24 2016-06-20 0
25 2016-06-17 0
26 2016-06-16 0
27 2016-06-15 0
28 2016-06-14 1
29 2016-06-13 0
30 2016-06-10 0
31 2016-06-09 0
32 2016-06-08 0
33 2016-06-07 0
34 2016-06-06 0
35 2016-06-01 0
36 2016-05-29 0
37 2016-05-27 0
38 2016-05-26 0
39 2016-05-25 0
40 2016-05-24 0
41 2016-05-23 0
42 2016-05-20 0
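For comparison, the per-date totals shown above can be sketched in base R with aggregate(). This is only an illustration of the target shape, not the ddply approach from the linked question; the small data frame `sample_df` is a hypothetical subset whose six rows are copied from the dput output further down.

```r
# Hypothetical six-row subset of dtetags.df (values copied from the dput output)
sample_df <- data.frame(
  Date = c("2016-07-06", "2016-07-06", "2016-07-01",
           "2016-07-01", "2016-06-30", "2016-06-30"),
  Technology = c(0, 1, 1, 1, 0, 1),
  stringsAsFactors = FALSE
)

# Sum every non-Date column within each date
sums <- aggregate(. ~ Date, data = sample_df, FUN = sum)

# aggregate() returns dates in ascending order; reverse to match the layout above
sums <- sums[order(sums$Date, decreasing = TRUE), ]
rownames(sums) <- NULL

print(sums)
# Technology totals for these rows: 1 (2016-07-06), 2 (2016-07-01), 1 (2016-06-30)
```

The `. ~ Date` formula assumes every column other than Date is numeric, which is the same assumption numcolwise(sum) relies on.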
Is there something obvious that I am doing wrong?
Conversion from factor to numeric
I removed the Date column, applied data.frame(apply(dtetags.df, 2, function(x) as.numeric(as.character(x)))) to the rest of the data frame, and then added the Date column back.
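The conversion step can also be written so that the Date column never has to be removed. The snippet below is a minimal sketch of that variant using lapply() rather than the apply() call above; the data frame `raw_df` and the name `tag_cols` are made up for illustration.

```r
# Hypothetical input: tag column stored as a factor, Date as character
raw_df <- data.frame(
  Date = c("2016-07-06", "2016-07-05", "2016-07-01"),
  Technology = factor(c("1", "0", "1")),
  stringsAsFactors = FALSE
)

# Every column except Date is treated as a tag column (an assumption)
tag_cols <- setdiff(names(raw_df), "Date")

# Convert factor -> character -> numeric so the labels, not the level codes, are used
raw_df[tag_cols] <- lapply(raw_df[tag_cols],
                           function(x) as.numeric(as.character(x)))

str(raw_df)
```

Going through as.character() first matters because as.numeric() on a factor returns the internal level codes instead of the original values.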
dput(dtetags.df)
structure(list(Date = c("2016-07-22", "2016-07-22", "2016-07-21",
"2016-07-21", "2016-07-20", "2016-07-20", "2016-07-19", "2016-07-19",
"2016-07-18", "2016-07-18", "2016-07-15", "2016-07-15", "2016-07-15",
"2016-07-14", "2016-07-14", "2016-07-13", "2016-07-13", "2016-07-13",
"2016-07-12", "2016-07-12", "2016-07-12", "2016-07-12", "2016-07-11",
"2016-07-11", "2016-07-11", "2016-07-11", "2016-07-08", "2016-07-08",
"2016-07-08", "2016-07-07", "2016-07-07", "2016-07-07", "2016-07-07",
"2016-07-06", "2016-07-06", "2016-07-05", "2016-07-05", "2016-07-05",
"2016-07-05", "2016-07-01", "2016-07-01", "2016-06-30", "2016-06-30",
"2016-06-29", "2016-06-29", "2016-06-29", "2016-06-29", "2016-06-29",
"2016-06-28", "2016-06-28", "2016-06-28", "2016-06-27", "2016-06-27",
"2016-06-27", "2016-06-24", "2016-06-24", "2016-06-23", "2016-06-23",
"2016-06-22", "2016-06-22", "2016-06-21", "2016-06-21", "2016-06-20",
"2016-06-20", "2016-06-17", "2016-06-17", "2016-06-16", "2016-06-16",
"2016-06-15", "2016-06-15", "2016-06-14", "2016-06-13", "2016-06-13",
"2016-06-10", "2016-06-10", "2016-06-09", "2016-06-09", "2016-06-09",
"2016-06-09", "2016-06-08", "2016-06-08", "2016-06-07", "2016-06-07",
"2016-06-06", "2016-06-06", "2016-06-06", "2016-06-01", "2016-06-01",
"2016-05-29", "2016-05-29", "2016-05-27", "2016-05-27", "2016-05-26",
"2016-05-26", "2016-05-25", "2016-05-25", "2016-05-24", "2016-05-23",
"2016-05-23", "2016-05-20"), `Technology` = c(0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0,
1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), .Names = c("Date",
"Technology"), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -100L))