
Suppose I have this data frame:

> df1
      date               count
1  2012-07-01           2.867133
2  2012-08-01           2.018745
3  2012-09-01           5.237515
4  2012-10-01           8.320493
5  2012-11-01           4.119850
6  2012-12-01           3.648649
7  2013-01-01           3.172867
8  2013-02-01           4.065041
9  2013-03-01           2.914798
10 2013-04-01           4.735683
11 2013-05-01           3.775411
12 2013-06-01           3.825717
13 2013-07-01           3.273427
14 2013-08-01           2.716469
15 2013-09-01           2.687296
16 2013-10-01           3.674121
17 2013-11-01           3.325942
18 2013-12-01           2.524038

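I now want to split df1$count so that I get the groups/ranges that carry the most information. My first idea was information gain (IG), but as far as I know IG applies to attributes, not to the values of a single column. If you plot the data, you can clearly see the sharp rises and falls, so my goal is to reliably find these significant increases/decreases, which should be where the information gain is high.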

Any ideas on how I could do this?
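Information gain is defined on class labels through entropy; for a numeric column such as `count`, the usual regression-tree (CART) analogue is the reduction in the sum of squared errors (SSE). The following is a minimal sketch of that swapped-in criterion, not the answer's method: it finds the single cut point, in time order, that maximizes the SSE reduction. The function name `best_split` is hypothetical.

```r
# Hypothetical helper: best single cut of a numeric series, scored by the
# reduction in sum of squared errors (SSE), the regression-tree analogue of
# information gain. Returns the index after which to cut and its gain.
best_split <- function(x) {
  n <- length(x)
  stopifnot(n >= 2)
  sse <- function(v) sum((v - mean(v))^2)
  total <- sse(x)
  # Gain for cutting after position k: SSE of the whole series minus the
  # SSE of the left part and of the right part
  gains <- vapply(seq_len(n - 1), function(k) {
    total - sse(x[1:k]) - sse(x[(k + 1):n])
  }, numeric(1))
  k <- which.max(gains)
  list(cut_after = k, gain = gains[k])
}

best_split(c(1, 1, 1, 10, 10, 10))  # cut_after = 3, gain = 121.5
```

Calling it on `df1$count` gives one cut; applying it again to each half (binary segmentation) would give several ranges, and where to stop splitting is a judgment call.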


1 Answer


Something like this?

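library(dplyr)  # provides %>%, mutate() and lag(); without it stats::lag() is used and gives wrong results
df1 %>%
  mutate(dif = ifelse((lag(count) - count) > 0, 0, 1)) %>%  # 0 = fell since last month, 1 = rose or stayed flat
  mutate(group = rle(dif) %>% magrittr::extract2("lengths") %>% rep(seq_along(.), .))  # number each run of equal flags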
         date    count dif group
1  2012-07-01 2.867133  NA     1
2  2012-08-01 2.018745   0     2
3  2012-09-01 5.237515   1     3
4  2012-10-01 8.320493   1     3
5  2012-11-01 4.119850   0     4
6  2012-12-01 3.648649   0     4
7  2013-01-01 3.172867   0     4
8  2013-02-01 4.065041   1     5
9  2013-03-01 2.914798   0     6
10 2013-04-01 4.735683   1     7
11 2013-05-01 3.775411   0     8
12 2013-06-01 3.825717   1     9
13 2013-07-01 3.273427   0    10
14 2013-08-01 2.716469   0    10
15 2013-09-01 2.687296   0    10
16 2013-10-01 3.674121   1    11
17 2013-11-01 3.325942   0    12
18 2013-12-01 2.524038   0    12
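The `rle()`/`rep()` step is what turns the 0/1 direction flags into run numbers. A minimal base-R sketch of the same idea, assuming dplyr and magrittr are not available (the helper name `run_id` is hypothetical), reproduces the `dif` and `group` columns above:

```r
# Monthly values from df1$count
count <- c(2.867133, 2.018745, 5.237515, 8.320493, 4.119850, 3.648649,
           3.172867, 4.065041, 2.914798, 4.735683, 3.775411, 3.825717,
           3.273427, 2.716469, 2.687296, 3.674121, 3.325942, 2.524038)

# Direction flag: 0 = fell since last month, 1 = rose or stayed flat.
# The first month has no predecessor, so its flag is NA (as with lag()).
dif <- c(NA, ifelse(diff(count) < 0, 0, 1))

# Hypothetical helper: number consecutive runs of identical values.
# rle() treats NA as a run of its own, so the first month gets group 1.
run_id <- function(x) {
  r <- rle(x)
  rep(seq_along(r$lengths), r$lengths)
}

group <- run_id(dif)
print(data.frame(count, dif, group))
```

Here `diff(count) < 0` is the same test as the answer's `lag(count) - count > 0`, so each group is a stretch of consecutive rises or consecutive falls.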

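Update: to group only the significant jumps, flag a month when its absolute change exceeds 2 or its ratio to the previous month exceeds 3 in either direction: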

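df1 %>%
  mutate(nxt = lag(count),  # despite its name, nxt holds the previous month's value
         dif = ifelse(abs(count - lag(count)) > 2 |
                        count / lag(count) > 3 |
                        lag(count) / count > 3, 1, 0)) %>%  # 1 = large jump, 0 = ordinary change
  mutate(group = rle(dif) %>% magrittr::extract2("lengths") %>% rep(seq_along(.), .))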
         date    count      nxt dif group
1  2012-07-01 2.867133       NA  NA     1
2  2012-08-01 2.018745 2.867133   0     2
3  2012-09-01 5.237515 2.018745   1     3
4  2012-10-01 8.320493 5.237515   1     3
5  2012-11-01 4.119850 8.320493   1     3
6  2012-12-01 3.648649 4.119850   0     4
7  2013-01-01 3.172867 3.648649   0     4
8  2013-02-01 4.065041 3.172867   0     4
9  2013-03-01 2.914798 4.065041   0     4
10 2013-04-01 4.735683 2.914798   0     4
11 2013-05-01 3.775411 4.735683   0     4
12 2013-06-01 3.825717 3.775411   0     4
13 2013-07-01 3.273427 3.825717   0     4
14 2013-08-01 2.716469 3.273427   0     4
15 2013-09-01 2.687296 2.716469   0     4
16 2013-10-01 3.674121 2.687296   0     4
17 2013-11-01 3.325942 3.674121   0     4
18 2013-12-01 2.524038 3.325942   0     4
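To experiment with the cut-offs, the update's rule can be wrapped in a function. This is a base-R sketch; the name `flag_jumps` and the parameters `abs_thresh` and `ratio_thresh` are hypothetical, with defaults set to the answer's 2 and 3. It reproduces the `dif` and `group` columns above:

```r
# Monthly values from df1$count
count <- c(2.867133, 2.018745, 5.237515, 8.320493, 4.119850, 3.648649,
           3.172867, 4.065041, 2.914798, 4.735683, 3.775411, 3.825717,
           3.273427, 2.716469, 2.687296, 3.674121, 3.325942, 2.524038)

# Hypothetical helper: flag month-to-month jumps that are large in absolute
# terms (> abs_thresh) or in relative terms (ratio > ratio_thresh either way).
flag_jumps <- function(x, abs_thresh = 2, ratio_thresh = 3) {
  prev <- c(NA, head(x, -1))  # previous month's value, like dplyr::lag()
  big <- abs(x - prev) > abs_thresh |
    x / prev > ratio_thresh |
    prev / x > ratio_thresh
  ifelse(big, 1, 0)  # NA for the first month, which has no predecessor
}

dif <- flag_jumps(count)
r <- rle(dif)
group <- rep(seq_along(r$lengths), r$lengths)  # number each run of equal flags
print(data.frame(count, dif, group))
```

Lowering `abs_thresh` or `ratio_thresh` flags more months as jumps and so splits the series into more groups; the values 2 and 3 come from the answer and are not tuned.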
answered 2018-10-23T13:56:58.280