I am very new to R, and my problem is as follows:
I have panel data organized as time series like this (only part of it is shown):
Week_Starting  Team A  Team B  Team C  Team D
2010-01-02          1       2       3       4
2010-01-09          2      40       1       5
2010-01-16         15    <NA>       4      11
2010-01-23         25    <NA>       7      18
2010-01-30         38    <NA>       9      29
2010-02-06       <NA>    <NA>      12      34
2010-02-13       <NA>    <NA>      16      40
2010-02-20       <NA>    <NA>      20    <NA>
2010-02-27       <NA>    <NA>      15      28
2010-03-06       <NA>    <NA>      20    <NA>
2010-03-13       <NA>    <NA>      24    <NA>
2010-03-20       <NA>    <NA>      24    <NA>
2010-03-27       <NA>    <NA>      21    <NA>
2010-04-03       <NA>    <NA>      27    <NA>
2010-04-10       <NA>    <NA>      24    <NA>
2010-04-17       <NA>    <NA>      25    <NA>
2010-04-24       <NA>    <NA>      35    <NA>
2010-05-01       <NA>    <NA>      40    <NA>
2010-05-08       <NA>    <NA>      32    <NA>
2010-05-15       <NA>    <NA>    <NA>    <NA>
2010-05-22       <NA>    <NA>      39    <NA>
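For example, it makes no sense to use Team B, because too many observations are missing. The ranking system does not provide data for ranks below 40. So I would like to clean the data by removing every column (variable) that does not have at least 8 consecutive weekly observations (Teams A, B and D in this example). Team D does not qualify because there is a gap in the week starting 2010-02-20, which leaves it with only 7 consecutive weeks. Keep in mind that I have more than 1000 columns.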
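As a rough sketch of that rule, the longest run of consecutive non-NA values per column can be measured with base R's `rle()`. The helper name `longest_run`, the threshold variable `min_weeks`, and the small stand-in data frame `dd` (first 10 weeks of the sample, with underscores in the column names) are assumptions made for illustration only.

```r
# Hypothetical helper: length of the longest run of consecutive non-NA values.
longest_run <- function(x) {
  r <- rle(!is.na(x))
  if (any(r$values)) max(r$lengths[r$values]) else 0L
}

# Stand-in data: the first 10 weeks of the sample shown above.
dd <- data.frame(
  Week_Starting = seq(as.Date("2010-01-02"), by = "week", length.out = 10),
  Team_A = c(1, 2, 15, 25, 38, NA, NA, NA, NA, NA),
  Team_B = c(2, 40, NA, NA, NA, NA, NA, NA, NA, NA),
  Team_C = c(3, 1, 4, 7, 9, 12, 16, 20, 15, 20),
  Team_D = c(4, 5, 11, 18, 29, 34, 40, NA, 28, NA)
)

min_weeks <- 8  # assumed threshold, taken from the 8-week requirement
team_cols <- setdiff(names(dd), "Week_Starting")

# Longest run per team column, then keep only the columns that reach the threshold.
runs <- vapply(dd[team_cols], longest_run, integer(1))
keep <- team_cols[runs >= min_weeks]
cleaned <- dd[c("Week_Starting", keep)]
```

Because the check is applied column by column with `vapply()`, the same code should carry over to 1000+ columns without changes.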
I tried this before, but it did not give me what I wanted, and unfortunately I am not skilled enough to modify the code to fit my needs.
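Some possible solutions I can think of:
For each variable, subset the part that has 8 or more consecutive observations.
If a run of 8 consecutive observations contains an NA, set those observations to NA, and then drop the columns that contain only NA, since columns that do not meet the minimum 8-week requirement would be left with NA values only (I hope you get what I mean).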
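The second idea could be sketched roughly as follows: mask every value that is not part of a long enough run, then drop the columns that end up entirely NA. The helper name `mask_short_runs` and the data frame `dd` (a hand-typed copy of the sample, with underscores in the column names) are assumptions, not part of the original data.

```r
# Hypothetical helper: set to NA every value that is not inside a run of at
# least `min_len` consecutive non-NA observations.
mask_short_runs <- function(x, min_len = 8) {
  r <- rle(!is.na(x))
  long_enough <- r$values & r$lengths >= min_len
  x[!rep(long_enough, r$lengths)] <- NA
  x
}

# Stand-in data: the 21 weeks of the sample shown above.
dd <- data.frame(
  Week_Starting = seq(as.Date("2010-01-02"), by = "week", length.out = 21),
  Team_A = c(1, 2, 15, 25, 38, rep(NA, 16)),
  Team_B = c(2, 40, rep(NA, 19)),
  Team_C = c(3, 1, 4, 7, 9, 12, 16, 20, 15, 20, 24, 24, 21, 27, 24, 25,
             35, 40, 32, NA, 39),
  Team_D = c(4, 5, 11, 18, 29, 34, 40, NA, 28, rep(NA, 12))
)

team_cols <- setdiff(names(dd), "Week_Starting")

# Step 1: mask short runs in every team column.
masked <- dd
masked[team_cols] <- lapply(dd[team_cols], mask_short_runs, min_len = 8)

# Step 2: drop the columns that are now NA everywhere.
all_na <- vapply(masked[team_cols], function(x) all(is.na(x)), logical(1))
masked <- masked[c("Week_Starting", team_cols[!all_na])]
```

Note that this variant also blanks isolated observations inside a kept column (here the single Team_C value after the gap), which corresponds to the first idea of keeping only the parts with 8 or more consecutive observations.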
Just out of interest, would it be harder to do the same thing if the data were organized in long format?
# Using MrFlick's data frame `dd`
library(reshape2)  # provides melt()
melt(dd, id = "Week_Starting")
Week_Starting variable value
1 2010-01-02 Team_A 1
2 2010-01-09 Team_A 2
3 2010-01-16 Team_A 15
4 2010-01-23 Team_A 25
5 2010-01-30 Team_A 38
6 2010-02-06 Team_A NA
7 2010-02-13 Team_A NA
8 2010-02-20 Team_A NA
9 2010-02-27 Team_A NA
10 2010-03-06 Team_A NA
11 2010-03-13 Team_A NA
12 2010-03-20 Team_A NA
13 2010-03-27 Team_A NA
14 2010-04-03 Team_A NA
15 2010-04-10 Team_A NA
16 2010-04-17 Team_A NA
17 2010-04-24 Team_A NA
18 2010-05-01 Team_A NA
19 2010-05-08 Team_A NA
20 2010-05-15 Team_A NA
21 2010-05-22 Team_A NA
22 2010-01-02 Team_B 2
23 2010-01-09 Team_B 40
24 2010-01-16 Team_B NA
25 2010-01-23 Team_B NA
26 2010-01-30 Team_B NA
27 2010-02-06 Team_B NA
28 2010-02-13 Team_B NA
29 2010-02-20 Team_B NA
30 2010-02-27 Team_B NA
31 2010-03-06 Team_B NA
32 2010-03-13 Team_B NA
33 2010-03-20 Team_B NA
34 2010-03-27 Team_B NA
35 2010-04-03 Team_B NA
36 2010-04-10 Team_B NA
37 2010-04-17 Team_B NA
38 2010-04-24 Team_B NA
39 2010-05-01 Team_B NA
40 2010-05-08 Team_B NA
41 2010-05-15 Team_B NA
42 2010-05-22 Team_B NA
43 2010-01-02 Team_C 3
44 2010-01-09 Team_C 1
45 2010-01-16 Team_C 4
46 2010-01-23 Team_C 7
47 2010-01-30 Team_C 9
48 2010-02-06 Team_C 12
49 2010-02-13 Team_C 16
50 2010-02-20 Team_C 20
51 2010-02-27 Team_C 15
52 2010-03-06 Team_C 20
53 2010-03-13 Team_C 24
54 2010-03-20 Team_C 24
55 2010-03-27 Team_C 21
56 2010-04-03 Team_C 27
57 2010-04-10 Team_C 24
58 2010-04-17 Team_C 25
59 2010-04-24 Team_C 35
60 2010-05-01 Team_C 40
61 2010-05-08 Team_C 32
62 2010-05-15 Team_C NA
63 2010-05-22 Team_C 39
64 2010-01-02 Team_D 4
65 2010-01-09 Team_D 5
66 2010-01-16 Team_D 11
67 2010-01-23 Team_D 18
68 2010-01-30 Team_D 29
69 2010-02-06 Team_D 34
70 2010-02-13 Team_D 40
71 2010-02-20 Team_D NA
72 2010-02-27 Team_D 28
73 2010-03-06 Team_D NA
74 2010-03-13 Team_D NA
75 2010-03-20 Team_D NA
76 2010-03-27 Team_D NA
77 2010-04-03 Team_D NA
78 2010-04-10 Team_D NA
79 2010-04-17 Team_D NA
80 2010-04-24 Team_D NA
81 2010-05-01 Team_D NA
82 2010-05-08 Team_D NA
83 2010-05-15 Team_D NA
84 2010-05-22 Team_D NA
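For the long layout, a minimal sketch along the same lines is to compute the longest run per team with `tapply()` and then filter the rows. It assumes the column names that `melt()` produces (`variable`, `value`), reuses a hypothetical `longest_run` helper, and builds a small stand-in data frame `long` by hand (first 10 weeks only) so that it runs without reshape2.

```r
# Hypothetical helper: length of the longest run of consecutive non-NA values.
longest_run <- function(x) {
  r <- rle(!is.na(x))
  if (any(r$values)) max(r$lengths[r$values]) else 0L
}

# Stand-in long-format data with the column names that melt() produces.
weeks <- seq(as.Date("2010-01-02"), by = "week", length.out = 10)
long <- data.frame(
  Week_Starting = rep(weeks, times = 4),
  variable = rep(c("Team_A", "Team_B", "Team_C", "Team_D"), each = 10),
  value = c(
    1, 2, 15, 25, 38, NA, NA, NA, NA, NA,   # Team_A
    2, 40, NA, NA, NA, NA, NA, NA, NA, NA,  # Team_B
    3, 1, 4, 7, 9, 12, 16, 20, 15, 20,      # Team_C
    4, 5, 11, 18, 29, 34, 40, NA, 28, NA    # Team_D
  )
)

# Runs only make sense if each team's rows are in chronological order.
long <- long[order(long$variable, long$Week_Starting), ]

# Longest run per team, then keep only the rows of qualifying teams.
runs <- tapply(long$value, long$variable, longest_run)
keep <- names(runs)[runs >= 8]
long_clean <- long[long$variable %in% keep, ]
```

Under these assumptions the long layout is not much harder. The main extra step is making sure the rows are sorted by week within each team before counting runs.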
Any suggestions?