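This question is related to a similarly titled post (Replace NA in an R vector with adjacent values). I want to scan a column in a data frame and replace each NA with the value in the adjacent cell of the same row. In that post, the solution did not replace NA with values from an adjacent vector (e.g. the neighbouring element in a data matrix); it was a conditional replace with a fixed value. Here is a reproducible example of my problem: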

UNIT <- c(NA,NA, 200, 200, 200, 200, 200, 300, 300, 300,300)
STATUS <-c('ACTIVE','INACTIVE','ACTIVE','ACTIVE','INACTIVE','ACTIVE','INACTIVE','ACTIVE','ACTIVE',
                    'ACTIVE','INACTIVE') 
TERMINATED <- c('1999-07-06' , '2008-12-05' , '2000-08-18' , '2000-08-18' ,'2000-08-18' ,'2008-08-18',
                        '2008-08-18','2006-09-19','2006-09-19' ,'2006-09-19' ,'1999-03-15') 
START <- c('2007-04-23','2008-12-06','2004-06-01','2007-02-01','2008-04-19','2010-11-29','2010-12-30',
                   '2007-10-29','2008-02-05','2008-06-30','2009-02-07')
STOP <- c('2008-12-05','4712-12-31','2007-01-31','2008-04-18','2010-11-28','2010-12-29','4712-12-31',
                  '2008-02-04','2008-06-29','2009-02-06','4712-12-31')
#creating dataframe
TEST <- data.frame(UNIT,STATUS,TERMINATED,START,STOP); TEST                   

  UNIT   STATUS TERMINATED      START       STOP
1    NA   ACTIVE 1999-07-06 2007-04-23 2008-12-05
2    NA INACTIVE 2008-12-05 2008-12-06 4712-12-31
3   200   ACTIVE 2000-08-18 2004-06-01 2007-01-31
4   200   ACTIVE 2000-08-18 2007-02-01 2008-04-18
5   200 INACTIVE 2000-08-18 2008-04-19 2010-11-28
6   200   ACTIVE 2008-08-18 2010-11-29 2010-12-29
7   200 INACTIVE 2008-08-18 2010-12-30 4712-12-31
8   300   ACTIVE 2006-09-19 2007-10-29 2008-02-04
9   300   ACTIVE 2006-09-19 2008-02-05 2008-06-29
10  300   ACTIVE 2006-09-19 2008-06-30 2009-02-06
11  300 INACTIVE 1999-03-15 2009-02-07 4712-12-31

#using the syntax for a conditional replace and hoping it works :/          
TEST$UNIT[is.na(TEST$UNIT)] <- TEST$STATUS; TEST 

   UNIT   STATUS TERMINATED      START       STOP
1     1   ACTIVE 1999-07-06 2007-04-23 2008-12-05
2     2 INACTIVE 2008-12-05 2008-12-06 4712-12-31
3   200   ACTIVE 2000-08-18 2004-06-01 2007-01-31
4   200   ACTIVE 2000-08-18 2007-02-01 2008-04-18
5   200 INACTIVE 2000-08-18 2008-04-19 2010-11-28
6   200   ACTIVE 2008-08-18 2010-11-29 2010-12-29
7   200 INACTIVE 2008-08-18 2010-12-30 4712-12-31
8   300   ACTIVE 2006-09-19 2007-10-29 2008-02-04
9   300   ACTIVE 2006-09-19 2008-02-05 2008-06-29
10  300   ACTIVE 2006-09-19 2008-06-30 2009-02-06
11  300 INACTIVE 1999-03-15 2009-02-07 4712-12-31

The result should be:

      UNIT   STATUS TERMINATED      START       STOP
1   ACTIVE   ACTIVE 1999-07-06 2007-04-23 2008-12-05
2 INACTIVE INACTIVE 2008-12-05 2008-12-06 4712-12-31
3      200   ACTIVE 2000-08-18 2004-06-01 2007-01-31
4      200   ACTIVE 2000-08-18 2007-02-01 2008-04-18
5      200 INACTIVE 2000-08-18 2008-04-19 2010-11-28
6      200   ACTIVE 2008-08-18 2010-11-29 2010-12-29
7      200 INACTIVE 2008-08-18 2010-12-30 4712-12-31
8      300   ACTIVE 2006-09-19 2007-10-29 2008-02-04
9      300   ACTIVE 2006-09-19 2008-02-05 2008-06-29
10     300   ACTIVE 2006-09-19 2008-06-30 2009-02-06
11     300 INACTIVE 1999-03-15 2009-02-07 4712-12-31
3 Answers

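It didn't work because STATUS is a factor. When a factor is assigned into a numeric vector, R stores the factor's underlying integer codes rather than its labels, which is why 1 and 2 appeared instead of ACTIVE and INACTIVE. Coercing STATUS to character gives the result you are after, and the UNIT column then becomes a character vector: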

TEST$UNIT[is.na(TEST$UNIT)] <- as.character(TEST$STATUS[is.na(TEST$UNIT)])

##        UNIT   STATUS TERMINATED      START       STOP
## 1    ACTIVE   ACTIVE 1999-07-06 2007-04-23 2008-12-05
## 2  INACTIVE INACTIVE 2008-12-05 2008-12-06 4712-12-31
## 3       200   ACTIVE 2000-08-18 2004-06-01 2007-01-31
## 4       200   ACTIVE 2000-08-18 2007-02-01 2008-04-18
## 5       200 INACTIVE 2000-08-18 2008-04-19 2010-11-28
## 6       200   ACTIVE 2008-08-18 2010-11-29 2010-12-29
## 7       200 INACTIVE 2008-08-18 2010-12-30 4712-12-31
## 8       300   ACTIVE 2006-09-19 2007-10-29 2008-02-04
## 9       300   ACTIVE 2006-09-19 2008-02-05 2008-06-29
## 10      300   ACTIVE 2006-09-19 2008-06-30 2009-02-06
## 11      300 INACTIVE 1999-03-15 2009-02-07 4712-12-31
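
To see the coercion in isolation, here is a minimal sketch on toy vectors. The names `unit`, `status`, `by_code` and `by_label` are made up for illustration and are not from the question. The factor is created explicitly, because `data.frame()` only converted strings to factors by default before R 4.0.0.

```r
# Toy vectors; all names here are illustrative only.
unit   <- c(NA, NA, 200, 300)
status <- factor(c("ACTIVE", "INACTIVE", "ACTIVE", "INACTIVE"))
gaps   <- is.na(unit)

# Assigning a factor into a numeric vector stores the integer codes.
by_code <- unit
by_code[gaps] <- status[gaps]
print(by_code)

# Converting to character first keeps the labels; the whole vector
# becomes character as a result.
by_label <- unit
by_label[gaps] <- as.character(status[gaps])
print(by_label)
```

The second form is the one that matches the desired output, at the cost of UNIT no longer being numeric.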
answered 2013-03-26T05:24:03.643

All you have to do is:

TEST$UNIT[is.na(TEST$UNIT)] <- TEST$STATUS[is.na(TEST$UNIT)]

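This way each NA is replaced by the value next to it in the same row. Without the subset on the right-hand side, the number of values to be replaced does not match the number of replacement values, so R fills the gaps with the first values of TEST$STATUS in row order. That only appears to work here because the two values being replaced are in the first two rows.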
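
The misalignment is easier to see when the NAs are not in the first rows. The following is a minimal sketch on toy data, with hypothetical names (`unit`, `status`, `misaligned`, `aligned`) and `status` assumed to be a plain character vector:

```r
# Toy vectors; all names here are illustrative only.
unit   <- c(200, NA, 300, NA)
status <- c("ACTIVE", "INACTIVE", "ACTIVE", "ACTIVE")
gaps   <- is.na(unit)

# Unsubsetted right-hand side: the two gaps are filled with status[1]
# and status[2], not with the values from rows 2 and 4.
misaligned <- unit
suppressWarnings(misaligned[gaps] <- status)
print(misaligned)

# Subsetting both sides with the same index keeps the rows aligned.
aligned <- unit
aligned[gaps] <- status[gaps]
print(aligned)
```

Here row 2 ends up with ACTIVE and row 4 with INACTIVE in the first version, which is the opposite of their actual status.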

answered 2016-08-31T15:28:31.417
TEST$UNIT = ifelse(is.na(TEST$UNIT), paste(TEST$STATUS),paste(TEST$UNIT));TEST
       UNIT   STATUS TERMINATED      START       STOP
1    ACTIVE   ACTIVE 1999-07-06 2007-04-23 2008-12-05
2  INACTIVE INACTIVE 2008-12-05 2008-12-06 4712-12-31
3       200   ACTIVE 2000-08-18 2004-06-01 2007-01-31
4       200   ACTIVE 2000-08-18 2007-02-01 2008-04-18
5       200 INACTIVE 2000-08-18 2008-04-19 2010-11-28
6       200   ACTIVE 2008-08-18 2010-11-29 2010-12-29
7       200 INACTIVE 2008-08-18 2010-12-30 4712-12-31
8       300   ACTIVE 2006-09-19 2007-10-29 2008-02-04
9       300   ACTIVE 2006-09-19 2008-02-05 2008-06-29
10      300   ACTIVE 2006-09-19 2008-06-30 2009-02-06
11      300 INACTIVE 1999-03-15 2009-02-07 4712-12-31
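
As a variation on the same idea, `as.character()` can stand in for `paste()` to make the conversion explicit. This is only a sketch on toy vectors with made-up names (`unit`, `status`, `filled`), assuming `status` may be a factor. Because `ifelse()` picks element by element, it does not matter where the NAs sit:

```r
# Toy vectors; all names here are illustrative only.
unit   <- c(200, NA, 300, NA)
status <- factor(c("ACTIVE", "INACTIVE", "ACTIVE", "ACTIVE"))

# Both branches are full-length character vectors, so each row takes
# either its own status label or its own unit value.
filled <- ifelse(is.na(unit), as.character(status), as.character(unit))
print(filled)
```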
answered 2019-09-13T07:41:20.583