I have a sample data frame called "data", as shown below:
X Y Month Year income
2281205 228120 3 2011 1000
2281212 228121 9 2010 1100
2281213 228121 12 2010 900
2281214 228121 3 2011 9000
2281222 228122 6 2010 1111
2281223 228122 9 2010 3000
2281224 228122 12 2010 1889
2281225 228122 3 2011 778
2281243 228124 12 2010 1111
2281244 228124 3 2011 200
2281282 228128 9 2010 7889
2281283 228128 12 2010 2900
2281284 228128 3 2011 3400
2281302 228130 9 2010 1200
2281303 228130 12 2010 2000
2281304 228130 3 2011 1900
2281352 228135 9 2010 2300
2281353 228135 12 2010 1333
2281354 228135 3 2011 2340
I use ddply to compute the number of observations and the total income for each Y:
library(plyr)
x <- ddply(data, .(Y), summarize, freq=length(Y), tot=sum(income))
Now I also need to find the X for each Y, according to the following conditions:
a. If Y has observations for months 9 (2010), 12 (2010), and 3 (2011), then X is the one for month 9 (2010), e.g. for Y = 228121, X = 2281212.
b. If Y has observations for months 6 (2010), 9 (2010), 12 (2010), and 3 (2011), then X is the one for month 6 (2010), e.g. for Y = 228122, X = 2281222.
c. If Y has observations for months 12 (2010) and 3 (2011), then X is the one for month 12 (2010), e.g. for Y = 228124, X = 2281243.
d. If Y has only one observation, then X is the one for that observation, e.g. for Y = 228120, X = 2281205.
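The key point is that when a Y has more than one observation, I choose the X for month 6 (2010) if it is available; if it is not, I choose the month closest to June 2010 (e.g. 9 (2010)). If a Y has only one observation, I choose the X of that observation.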
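The rule above amounts to picking, within each Y, the observation whose date is nearest to June 2010. A minimal base-R sketch of that rule as a standalone function, checked against the example Y values from the conditions; the name pick_x and the "distance in whole months" measure are assumptions for illustration, not part of plyr.

```r
# Hypothetical helper (not part of plyr): given the X, Month and Year values
# of one Y group, return the X whose date is nearest to the reference month.
# Assumption: distance is counted in whole months; on a tie, which.min()
# returns the first such row in the group.
pick_x <- function(X, Month, Year, ref_year = 2010, ref_month = 6) {
  # Turn each (Year, Month) pair into a single month count so that
  # dates in different years can be compared by subtraction.
  dist <- abs((Year * 12 + Month) - (ref_year * 12 + ref_month))
  X[which.min(dist)]
}

# Y = 228121 (months 9, 12, 3): nearest to June 2010 is 9/2010
pick_x(c(2281212, 2281213, 2281214), c(9, 12, 3), c(2010, 2010, 2011))
# Y = 228122 (months 6, 9, 12, 3): June 2010 itself is present
pick_x(c(2281222, 2281223, 2281224, 2281225), c(6, 9, 12, 3),
       c(2010, 2010, 2010, 2011))
# Y = 228124 (months 12, 3): nearest is 12/2010
pick_x(c(2281243, 2281244), c(12, 3), c(2010, 2011))
# Y = 228120 (a single observation): that observation is returned
pick_x(2281205, 3, 2011)
```

A single observation needs no special case: which.min() on a length-one vector returns 1.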
How can I incorporate these conditions into ddply?
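One possible way to fold the rule into ddply is to call a helper inside summarize, next to freq and tot. The sketch below assumes plyr is installed, rebuilds the sample data with read.table, and defines a hypothetical helper pick_x (nearest month to June 2010, ties going to the first row of the group); it is an illustration under those assumptions, not the only approach.

```r
library(plyr)

# Sample data from the question
data <- read.table(header = TRUE, text = "
X       Y      Month Year income
2281205 228120 3     2011 1000
2281212 228121 9     2010 1100
2281213 228121 12    2010 900
2281214 228121 3     2011 9000
2281222 228122 6     2010 1111
2281223 228122 9     2010 3000
2281224 228122 12    2010 1889
2281225 228122 3     2011 778
2281243 228124 12    2010 1111
2281244 228124 3     2011 200
2281282 228128 9     2010 7889
2281283 228128 12    2010 2900
2281284 228128 3     2011 3400
2281302 228130 9     2010 1200
2281303 228130 12    2010 2000
2281304 228130 3     2011 1900
2281352 228135 9     2010 2300
2281353 228135 12    2010 1333
2281354 228135 3     2011 2340
")

# Hypothetical helper: X of the observation nearest to June 2010
# (distance in whole months; ties go to the first row of the group).
pick_x <- function(X, Month, Year, ref_year = 2010, ref_month = 6) {
  dist <- abs((Year * 12 + Month) - (ref_year * 12 + ref_month))
  X[which.min(dist)]
}

# summarize evaluates each argument on the rows of one Y group, so the
# helper only sees that group's X, Month and Year columns.
x <- ddply(data, .(Y), summarize,
           freq = length(Y),
           tot  = sum(income),
           X    = pick_x(X, Month, Year))
x
```

In this sample every observation falls in or after June 2010, so "nearest to June 2010" coincides with "earliest observation". If earlier months could occur, the two rules would differ, and the distance-based version follows the wording of the question.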