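I want to identify the unique people who received apples within a specified time frame. I do this by creating a binary indicator, "apples", as follows.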
names<-c("tom", "mary", "tom", "john", "mary", "tom", "john", "mary", "john", "mary", "tom", "mary", "john", "john")
dates<-as.Date(c("2010-02-01", "2010-05-01", "2010-03-01", "2010-07-01", "2010-07-01", "2010-06-01", "2010-09-01", "2010-07-01", "2010-11-01", "2010-09-01", "2010-08-01", "2010-11-01", "2010-12-01", "2011-01-01"))
fruit<-as.character(c("apple", "orange", "banana", "kiwi", "apple", "apple", "apple", "orange", "banana", "apple", "kiwi", "apple", "orange", "apple"))
age<-as.numeric(c(60,55,60,57,55,60,57,55,57,55,60,55, 57,57))
sex<-as.character(c("m","f","m","m","f","m","m", "f","m","f","m","f","m", "m"))
df<-data.frame(names,dates, age, sex, fruit)
df
df$apples<-ifelse(df$fruit=='apple' & df$dates>="2010-04-01" & df$dates<"2010-10-01",1,0)
df
   names      dates age sex  fruit apples
1    tom 2010-02-01  60   m  apple      0
2   mary 2010-05-01  55   f orange      0
3    tom 2010-03-01  60   m banana      0
4   john 2010-07-01  57   m   kiwi      0
5   mary 2010-07-01  55   f  apple      1
6    tom 2010-06-01  60   m  apple      1
7   john 2010-09-01  57   m  apple      1
8   mary 2010-07-01  55   f orange      0
9   john 2010-11-01  57   m banana      0
10  mary 2010-09-01  55   f  apple      1
11   tom 2010-08-01  60   m   kiwi      0
12  mary 2010-11-01  55   f  apple      0
13  john 2010-12-01  57   m orange      0
14  john 2011-01-01  57   m  apple      0
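My problem is that mary appears twice. I only want the first date on which she got an apple within the specified time frame (and likewise the first date for everyone else in my real data). I want a second column, called "apples1", that flags the initial date on which each person got an apple within the defined time frame.
Desired output: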
   names      dates age sex  fruit apples apples1
1    tom 2010-02-01  60   m  apple      0       0
2   mary 2010-05-01  55   f orange      0       0
3    tom 2010-03-01  60   m banana      0       0
4   john 2010-07-01  57   m   kiwi      0       0
5   mary 2010-07-01  55   f  apple      1       1
6    tom 2010-06-01  60   m  apple      1       1
7   john 2010-09-01  57   m  apple      1       1
8   mary 2010-07-01  55   f orange      0       0
9   john 2010-11-01  57   m banana      0       0
10  mary 2010-09-01  55   f  apple      1       0
11   tom 2010-08-01  60   m   kiwi      0       0
12  mary 2010-11-01  55   f  apple      0       0
13  john 2010-12-01  57   m orange      0       0
14  john 2011-01-01  57   m  apple      0       0
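I've been searching, and the closest thing I've found is "Select only the first rows for each unique value of a column in R", but that doesn't account for the unique ID. I've also come across !duplicated, but I don't want to drop mary's data, because I need her dates to follow up with her. I'm probably missing something very basic here, so apologies in advance.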
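To make the intended logic concrete, here is a rough base R sketch of what I think "apples1" should do. It is only one possible approach, not necessarily the idiomatic one: it uses duplicated() to pick row positions to flag instead of to drop rows, so every row stays in the data frame. The helper names in_window, hits and first_hits are made up for this illustration, and I am assuming that when a person has two qualifying rows on the same date, only the first of them should be flagged.

```r
# Reduced copy of the example data (only the columns the logic needs).
names <- c("tom", "mary", "tom", "john", "mary", "tom", "john",
           "mary", "john", "mary", "tom", "mary", "john", "john")
dates <- as.Date(c("2010-02-01", "2010-05-01", "2010-03-01", "2010-07-01",
                   "2010-07-01", "2010-06-01", "2010-09-01", "2010-07-01",
                   "2010-11-01", "2010-09-01", "2010-08-01", "2010-11-01",
                   "2010-12-01", "2011-01-01"))
fruit <- c("apple", "orange", "banana", "kiwi", "apple", "apple", "apple",
           "orange", "banana", "apple", "kiwi", "apple", "orange", "apple")
df <- data.frame(names, dates, fruit, stringsAsFactors = FALSE)

# TRUE for rows where an apple was received inside the time frame.
in_window <- df$fruit == "apple" &
  df$dates >= as.Date("2010-04-01") &
  df$dates < as.Date("2010-10-01")
df$apples <- as.integer(in_window)

# Row positions of the qualifying rows, sorted by date so that the
# earliest qualifying row for each person comes first.
hits <- which(in_window)
hits <- hits[order(df$dates[hits])]

# Keep only the first qualifying row per person; no rows are removed from df.
first_hits <- hits[!duplicated(df$names[hits])]

# Flag those rows and leave everything else at 0.
df$apples1 <- 0L
df$apples1[first_hits] <- 1L
```

On the example data this should flag one row each for mary, tom and john, and leave mary's second in-window apple with apples = 1 but apples1 = 0.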