V1 <- c("Name", "Paul", "Name", "Sarah", NA, NA, NA, NA, "Name", "Carl", NA, NA, "Name", "Alice", "Name", "Rita")
V2 <- c("Name", "Paul", "Name", "Sarah", "Name", "Sarah", "Name", "Sarah", "Name", "Carl", "Name", "Carl", "Name", "Alice", "Name", "Rita")
df <- data.frame(V1, V2)
df

I would like V1 to look like V2. Edit: V2 does not exist in the original dataset; I created it here only to show the desired result.

      V1    V2
1   Name  Name
2   Paul  Paul
3   Name  Name
4  Sarah Sarah
5   <NA>  Name
6   <NA> Sarah
7   <NA>  Name
8   <NA> Sarah
9   Name  Name
10  Carl  Carl
11  <NA>  Name
12  <NA>  Carl
13  Name  Name
14 Alice Alice
15  Name  Name
16  Rita  Rita 

I tried the following:

#find the positions of missings in V1 
m <- which(is.na(df$V1) == TRUE)
m
[1]  5  6  7  8 11 12

#go to every position and copy the value from the field 2 rows above the missing one
for (i in m) {
  df$V1[m[i]] <- df$V1[m[i]-2]
}

The output is mostly right, but there is a glitch:

      V1    V2
1   Name  Name
2   Paul  Paul
3   Name  Name
4  Sarah Sarah
5   <NA>  Name
6   <NA> Sarah
7   <NA>  Name
8   <NA> Sarah
9   Name  Name
10  Carl  Carl
11  Name  Name
12  Carl  Carl
13  Name  Name
14 Alice Alice
15  Name  Name
16  Rita  Rita

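Why does it work for the later cells but not for the first run of missing values? Also, I am trying to avoid for loops, so if there is a more elegant way to do this, I would love to see it!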


3 Answers


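Your for loop iterates over the values of m, so i is already a row number (5, 6, 7, ...). Indexing m[i] a second time looks up the 5th, 6th, 7th, ... element of m, and m only has 6 elements. Use i directly: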

m <- which(is.na(df$V1))
for (i in m) df$V1[i] <- df$V1[i-2]
df

#      V1    V2
#1   Name  Name
#2   Paul  Paul
#3   Name  Name
#4  Sarah Sarah
#5   Name  Name
#6  Sarah Sarah
#7   Name  Name
#8  Sarah Sarah
#9   Name  Name
#10  Carl  Carl
#11  Name  Name
#12  Carl  Carl
#13  Name  Name
#14 Alice Alice
#15  Name  Name
#16  Rita  Rita
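
To see why the original loop only repaired rows 11 and 12, it can help to look at what m[i] evaluates to on each pass. This is a minimal sketch that uses only the values of m printed in the question; the variable name looked_up is made up for illustration.

```r
# Positions of the missing values, as printed in the question
m <- c(5, 6, 7, 8, 11, 12)

# In `for (i in m)`, i takes the VALUES of m (5, 6, 7, ...), so m[i]
# asks for the 5th, 6th, 7th, ... element of a vector of length 6.
looked_up <- m[m]
print(looked_up)
# looked_up is 11 12 NA NA NA NA:
# only the first two passes point at real rows (11 and 12).
```

The remaining passes index with NA. In R, an assignment with an NA subscript and a length-one replacement value selects nothing, so those passes change nothing without raising an error, which is why rows 5 to 8 stayed NA.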
answered 2019-12-22T14:15:11.157

One option involving dplyr and tidyr could be:

library(dplyr)  # %>%, group_by(), mutate(), ungroup(), select()
library(tidyr)  # fill()

df %>%
 fill(V1) %>%
 group_by(rleid = with(rle(V1), rep(seq_along(lengths), lengths))) %>%
 mutate(V1 = ifelse(row_number() %% 2 == 0 , "Name", V1)) %>%
 ungroup() %>%
 select(-rleid)

   V1    V2   
   <chr> <chr>
 1 Name  Name 
 2 Paul  Paul 
 3 Name  Name 
 4 Sarah Sarah
 5 Name  Name 
 6 Sarah Sarah
 7 Name  Name 
 8 Sarah Sarah
 9 Name  Name 
10 Carl  Carl 
11 Name  Name 
12 Carl  Carl 
13 Name  Name 
14 Alice Alice
15 Name  Name 
16 Rita  Rita 
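
The grouping step in that pipeline can be hard to read, so here is a base R sketch of the same idea, written out step by step. It assumes V1 is a character vector and starts from the column as it looks after fill(); the names filled, pos and result are made up for illustration.

```r
# V1 after fill(): each NA replaced by the last non-missing value above it
filled <- c("Name", "Paul", "Name", "Sarah", "Sarah", "Sarah", "Sarah", "Sarah",
            "Name", "Carl", "Carl", "Carl", "Name", "Alice", "Name", "Rita")

# Run-length ids: consecutive equal values share one id
r <- rle(filled)
rleid <- rep(seq_along(r$lengths), r$lengths)
# rleid is 1 2 3 4 4 4 4 4 5 6 6 6 7 8 9 10

# Position of each row inside its run (the role row_number() plays per group)
pos <- ave(seq_along(filled), rleid, FUN = seq_along)

# Even positions inside a run are the slots that should read "Name"
result <- ifelse(pos %% 2 == 0, "Name", filled)
```

Inside the run of five "Sarah" rows, positions 2 and 4 are overwritten with "Name", which restores the alternating Name/value pattern.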
answered 2019-12-22T14:13:40.570

Here is a base R solution in which the problem is reformulated using a matrix:

df$V2 <- as.vector(t(apply(matrix(df$V1,nrow = 2), 1, function(x) x[!is.na(x)][cumsum(!is.na(x))])))

which gives

> df
      V1    V2
1   Name  Name
2   Paul  Paul
3   Name  Name
4  Sarah Sarah
5   <NA>  Name
6   <NA> Sarah
7   <NA>  Name
8   <NA> Sarah
9   Name  Name
10  Carl  Carl
11  <NA>  Name
12  <NA>  Carl
13  Name  Name
14 Alice Alice
15  Name  Name
16  Rita  Rita
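
The one-liner packs three steps into a single expression. The sketch below unpacks them on the question's V1 as a plain character vector; the names mat, carry_forward, filled and result are made up for illustration. It assumes, as in the example data, that the vector has even length and that the first label and the first value are not missing.

```r
V1 <- c("Name", "Paul", "Name", "Sarah", NA, NA, NA, NA,
        "Name", "Carl", NA, NA, "Name", "Alice", "Name", "Rita")

# Step 1: fold into 2 rows; row 1 holds the labels, row 2 the values
mat <- matrix(V1, nrow = 2)

# Step 2: per row, carry the last non-missing entry forward.
# cumsum(!is.na(x)) counts how many non-missing entries have been seen so far,
# which is exactly the index into the vector of non-missing entries.
carry_forward <- function(x) x[!is.na(x)][cumsum(!is.na(x))]
filled <- apply(mat, 1, carry_forward)  # apply() returns an 8 x 2 matrix here

# Step 3: transpose back to 2 x 8 and unfold column by column
result <- as.vector(t(filled))
```

If a row started with NA, cumsum() would begin at 0 and that entry would be dropped, so the lengths would no longer match; the example data avoids this case.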
answered 2019-12-22T14:33:53.647