I have a data frame in long (longitudinal) format, as follows:
df = structure(list(oslaua = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L), .Label = c("E06000001", "E06000002",
"E06000003", "E06000004"), class = "factor"), wave = structure(c(1L,
2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L), .Label = c("0",
"1", "2", "3"), class = "factor"), old.la = structure(c(1L, 1L,
NA, 1L, 2L, 2L, 2L, NA, 3L, 3L, 3L, 3L, 4L, 4L, NA), .Label = c("00EB",
"00EC", "00EE", "00EF"), class = "factor"), la = structure(c(1L,
1L, NA, 1L, 2L, 2L, 2L, NA, 3L, 3L, 3L, 3L, 4L, 4L, NA), .Label = c("Hartlepool UA",
"Middlesbrough UA", "Redcar and Cleveland UA", "Stockton-on-Tees UA"
), class = "factor"), dclg.code = structure(c(1L, 1L, NA, 1L,
4L, 4L, 4L, NA, 3L, 3L, 3L, 3L, 2L, 2L, NA), .Label = c("H0724",
"H0738", "V0728", "W0734"), class = "factor"), novo_entries = c(24L,
4L, 0L, 1L, 35L, 15L, 1L, 0L, 49L, 7L, 2L, 2L, 40L, 14L, 0L)), .Names = c("oslaua",
"wave", "old.la", "la", "dclg.code", "novo_entries"), row.names = c(NA,
15L), class = "data.frame")
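My identifier variable is oslaua and my time variable is wave. old.la, la and dclg.code are factor variables that contain NAs. My goal is to recode my NAs using, for each of these variables, the level associated with each identifier (oslaua).
I tried to do this for old.la using the following: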
library(dplyr)
df = df %>% group_by(oslaua) %>% mutate(old.la.1 = ifelse(is.na(old.la), unique(old.la), old.la)) %>% as.data.frame()
This partially achieves what I want, but as you can see there are some problems:
> df
      oslaua wave old.la                      la dclg.code novo_entries old.la.1
1  E06000001    0   00EB           Hartlepool UA     H0724           24        1
2  E06000001    1   00EB           Hartlepool UA     H0724            4        1
3  E06000001    2   <NA>                    <NA>      <NA>            0        2
4  E06000001    3   00EB           Hartlepool UA     H0724            1        1
5  E06000002    0   00EC        Middlesbrough UA     W0734           35        2
6  E06000002    1   00EC        Middlesbrough UA     W0734           15        2
7  E06000002    2   00EC        Middlesbrough UA     W0734            1        2
8  E06000002    3   <NA>                    <NA>      <NA>            0        2
9  E06000003    0   00EE Redcar and Cleveland UA     V0728           49        3
10 E06000003    1   00EE Redcar and Cleveland UA     V0728            7        3
11 E06000003    2   00EE Redcar and Cleveland UA     V0728            2        3
12 E06000003    3   00EE Redcar and Cleveland UA     V0728            2        3
13 E06000004    0   00EF     Stockton-on-Tees UA     H0738           40        4
14 E06000004    1   00EF     Stockton-on-Tees UA     H0738           14        4
15 E06000004    2   <NA>                    <NA>      <NA>            0        4
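Specifically, the factor levels have changed their format, and in some cases observations have been recoded incorrectly. For example, row 3 (oslaua = E06000001) gets 2, while the other rows for that identifier get 1.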
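Here is a minimal base-R sketch of what is probably going on. The vectors f and grp are small made-up examples, not taken from df.

ifelse() builds its result from the logical test. So when yes or no is a factor, the factor class and levels are dropped and only the integer codes survive. Converting with as.character() first keeps the labels.

Separately, unique(old.la) keeps NA. Within a group such as E06000001, the replacement therefore has two elements rather than one, and ifelse() recycles it along the group's rows. That is a likely cause of the wrongly recoded rows.

```r
# Hypothetical toy factor; levels are sorted, so "00EB" has code 1 and "00EC" code 2
f <- factor(c("00EB", NA, "00EC"))

# ifelse() takes its result shape from the logical test, so the factor
# class is dropped and only the integer codes are kept: 1, NA, 2
res <- ifelse(is.na(f), NA, f)
class(res)  # "integer"

# Working on character labels keeps the alphanumeric values: "00EB", NA, "00EC"
res_chr <- ifelse(is.na(f), NA, as.character(f))

# unique() keeps NA, so for a group like E06000001 the replacement passed
# to ifelse() has length 2 and is recycled along the group's rows
grp <- factor(c("00EB", "00EB", NA, "00EB"))
u <- unique(grp)
length(u)  # 2: "00EB" and NA
```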
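I don't understand why the levels change their format, or how to keep them in their original (alphanumeric) format. Also, why are some observations not recoded correctly?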
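One possible way to keep the original levels, sketched in base R only, works in two steps:

1. For each oslaua, take the first non-NA label of the variable and use it wherever that variable is NA.
2. Rebuild the factor with its original levels.

The helper fill_by_group() and the toy data frame are hypothetical. The sketch assumes each identifier has a single non-missing level per variable, as in the data above.

```r
# Hypothetical helper: replace NAs in factor x with the first non-NA label
# observed in the same group g, keeping the original factor levels.
fill_by_group <- function(x, g) {
  g <- as.character(g)
  labels <- as.character(x)
  # First non-NA label per group (NA if a group has no observed label)
  first_label <- tapply(labels, g, function(v) v[!is.na(v)][1])
  # Look up each row's group label; work on characters so no codes leak in
  filled <- ifelse(is.na(labels), first_label[g], labels)
  factor(filled, levels = levels(x))
}

# Small hypothetical data frame with the same shape as df
toy <- data.frame(
  oslaua = factor(c("E06000001", "E06000001", "E06000001", "E06000002", "E06000002")),
  old.la = factor(c("00EB", NA, "00EB", "00EC", NA))
)
toy$old.la <- fill_by_group(toy$old.la, toy$oslaua)
```

Applied to the df above, the same helper could be looped over the three columns, e.g. for (v in c("old.la", "la", "dclg.code")) df[[v]] <- fill_by_group(df[[v]], df$oslaua).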
Any suggestions for solving these problems are much appreciated.
Thanks!