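All,

I have a complicated question about merging two different types of data in R. I am working with a directed dyad-year data frame (A with B, B with A), and I would like to read in or merge data from a country-year data set in the following way.

Assume x in the country-year data set (CY) is the variable of interest that I am trying to merge into the directed dyad-year data set (DDY). In a simplified version with only four cross-sectional units (A, B, C, D) over three years (1990-1992), it looks like this.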

country     year      x
  A         1990    6.2352
  A         1991    7.2342
  A         1992    8.3902
  B         1990    2.2342
  B         1991    5.1292
  B         1992    1.0001
  C         1990    4.1202
  C         1991    9.1202
  C         1992    1.2011
  D         1990    1.2910
  D         1991    5.0001
  D         1992    2.1111

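I am working with a directed dyad-year data set (DDY) that already has a number of other variables of interest. Basically, I want to take x from the country-year data CY and create x1 and x2 in DDY, where x1 holds the value of x for country1 in the given year and x2 holds the value of x for country2 in the same year.

In short, I would like DDY to look like this.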

country1     country2     year     x1          x2
   A           B          1990    6.2352     2.2342
   A           B          1991    7.2342     5.1292
   A           B          1992    8.3902     1.0001
   A           C          1990    6.2352     4.1202
   A           C          1991    7.2342     9.1202
   A           C          1992    8.3902     1.2011
   A           D          1990    6.2352     1.2910
   A           D          1991    7.2342     5.0001
   A           D          1992    8.3902     2.1111
   B           A          1990    2.2342     6.2352
   B           A          1991    5.1292     7.2342
   B           A          1992    1.0001     8.3902
   ...

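The data continue from there for every directed dyad-year pairing.

I don't know whether this is a delicate job for the merge command or whether some other route is more appropriate. Any input would be appreciated, and I will provide any clarification about the data I am using if it helps to find a solution.

This previously asked question is clearly related. However, since no reproducible code was provided when it was asked, the answers seem a bit obtuse for what I want to do. If that solution is the way to go, clarifying what it is doing might help.

Thanks.

Reproducible code follows.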

country <- c("A", "A", "A", "B", "B", "B", "C", "C", "C", "D", "D", "D")
year <- c(1990, 1991, 1992, 1990, 1991, 1992, 1990, 1991, 1992, 1990, 1991, 1992)
x <- c(6.2352, 7.2342, 8.3902, 2.2342, 5.1292, 1.0001, 4.1202, 9.1202, 1.2011, 1.2910, 5.0001, 2.1111)
CY <- data.frame(country=country, year=year, x=x)
CY

country1 <- c("A", "A", "A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B", "B", "B", "B", "C", "C", "C", "C", "C", "C", "C", "C", "C", "D", "D", "D", "D", "D", "D", "D", "D", "D")
country2 <- c("B", "B", "B", "C", "C", "C", "D", "D", "D", "A", "A", "A", "C", "C", "C", "D", "D", "D", "A", "A", "A", "B", "B", "B", "D", "D", "D", "A", "A", "A", "B", "B", "B", "C", "C", "C")
year <- c(1990, 1991, 1992, 1990, 1991, 1992, 1990, 1991, 1992, 1990, 1991, 1992, 1990, 1991, 1992, 1990, 1991, 1992, 1990, 1991, 1992, 1990, 1991, 1992, 1990, 1991, 1992, 1990, 1991, 1992, 1990, 1991, 1992, 1990, 1991, 1992)
DDY <- data.frame(country1=country1, country2=country2, year=year)
DDY

2 Answers

Here is an alternative way to create DDY from CY without resorting to SQL syntax.

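# Pair every row of CY with every row of CY (nrow(CY)^2 index pairs)
ind  <- expand.grid(seq_len(nrow(CY)), seq_len(nrow(CY)))
CY.1 <- CY[ind[, 1], ]
CY.2 <- CY[ind[, 2], ]
# Keep pairs from the same year that involve two different countries
bool <- (CY.1$year == CY.2$year) & (CY.1$country != CY.2$country)
DDY  <- data.frame(country1 = CY.1$country[bool],
                   country2 = CY.2$country[bool],
                   year     = CY.1$year[bool],
                   x1       = CY.1$x[bool],
                   x2       = CY.2$x[bool])
# Sort on the data frame's own columns, not on same-named global vectors
DDY  <- DDY[order(DDY$country1, DDY$country2, DDY$year), ]
DDY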
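
The index grid above holds nrow(CY)^2 row pairs before filtering, which can grow quickly on a real panel. As a rough sketch of the same idea (not part of the original answer), base merge() can join CY to itself on year so that only within-year pairs are ever built. The names pairs and DDY2 are made up for illustration, and CY is rebuilt here only so the snippet runs on its own.

```r
# Rebuild the example country-year data so the snippet is self-contained
CY <- data.frame(
  country = rep(c("A", "B", "C", "D"), each = 3),
  year    = rep(c(1990, 1991, 1992), times = 4),
  x       = c(6.2352, 7.2342, 8.3902, 2.2342, 5.1292, 1.0001,
              4.1202, 9.1202, 1.2011, 1.2910, 5.0001, 2.1111),
  stringsAsFactors = FALSE
)

# Self-join on year: every country is paired with every country in that year.
# The suffixes turn the duplicated columns into country1/country2 and x1/x2.
pairs <- merge(CY, CY, by = "year", suffixes = c("1", "2"))

# Drop self-pairs (A with A) so only directed dyads remain
pairs <- pairs[pairs$country1 != pairs$country2, ]

# Order rows and columns like the desired DDY layout
DDY2 <- pairs[order(pairs$country1, pairs$country2, pairs$year),
              c("country1", "country2", "year", "x1", "x2")]
rownames(DDY2) <- NULL
DDY2
```

With four countries and three years this should leave 4 * 3 * 3 = 36 directed dyad-years, the same count as the grid approach.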
answered 2013-06-25T01:19:12.950

1. CY only

This can be done using only CY, like this:

library(sqldf)

sqldf("select A.country country1, B.country country2, year, A.x x1, B.x x2 
   from CY A join CY B using (year) 
   where A.country != B.country 
   order by A.country, B.country")

This gives:

   country1 country2 year     x1     x2
1         A        B 1990 6.2352 2.2342
2         A        B 1991 7.2342 5.1292
3         A        B 1992 8.3902 1.0001
4         A        C 1990 6.2352 4.1202
5         A        C 1991 7.2342 9.1202
6         A        C 1992 8.3902 1.2011
7         A        D 1990 6.2352 1.2910
8         A        D 1991 7.2342 5.0001
9         A        D 1992 8.3902 2.1111
10        B        A 1990 2.2342 6.2352
11        B        A 1991 5.1292 7.2342
12        B        A 1992 1.0001 8.3902
13        B        C 1990 2.2342 4.1202
14        B        C 1991 5.1292 9.1202
15        B        C 1992 1.0001 1.2011
16        B        D 1990 2.2342 1.2910
17        B        D 1991 5.1292 5.0001
18        B        D 1992 1.0001 2.1111
19        C        A 1990 4.1202 6.2352
20        C        A 1991 9.1202 7.2342
21        C        A 1992 1.2011 8.3902
22        C        B 1990 4.1202 2.2342
23        C        B 1991 9.1202 5.1292
24        C        B 1992 1.2011 1.0001
25        C        D 1990 4.1202 1.2910
26        C        D 1991 9.1202 5.0001
27        C        D 1992 1.2011 2.1111
28        D        A 1990 1.2910 6.2352
29        D        A 1991 5.0001 7.2342
30        D        A 1992 2.1111 8.3902
31        D        B 1990 1.2910 2.2342
32        D        B 1991 5.0001 5.1292
33        D        B 1992 2.1111 1.0001
34        D        C 1990 1.2910 4.1202
35        D        C 1991 5.0001 9.1202
36        D        C 1992 2.1111 1.2011

2. CY and DDY

Alternatively, to merge CY into DDY, try this:

sqldf("select A.country country1, B.country country2, A.year, A.x x1, B.x x2 
   from DDY join CY A join CY B 
   on DDY.country1 = A.country and DDY.year = A.year 
   and DDY.country2 = B.country and DDY.year = B.year
   order by A.country, B.country")

This gives:

   country1 country2 year     x1     x2
1         A        B 1990 6.2352 2.2342
2         A        B 1991 7.2342 5.1292
3         A        B 1992 8.3902 1.0001
4         A        C 1990 6.2352 4.1202
5         A        C 1991 7.2342 9.1202
6         A        C 1992 8.3902 1.2011
7         A        D 1990 6.2352 1.2910
8         A        D 1991 7.2342 5.0001
9         A        D 1992 8.3902 2.1111
10        B        A 1990 2.2342 6.2352
11        B        A 1991 5.1292 7.2342
12        B        A 1992 1.0001 8.3902
13        B        C 1990 2.2342 4.1202
14        B        C 1991 5.1292 9.1202
15        B        C 1992 1.0001 1.2011
16        B        D 1990 2.2342 1.2910
17        B        D 1991 5.1292 5.0001
18        B        D 1992 1.0001 2.1111
19        C        A 1990 4.1202 6.2352
20        C        A 1991 9.1202 7.2342
21        C        A 1992 1.2011 8.3902
22        C        B 1990 4.1202 2.2342
23        C        B 1991 9.1202 5.1292
24        C        B 1992 1.2011 1.0001
25        C        D 1990 4.1202 1.2910
26        C        D 1991 9.1202 5.0001
27        C        D 1992 1.2011 2.1111
28        D        A 1990 1.2910 6.2352
29        D        A 1991 5.0001 7.2342
30        D        A 1992 2.1111 8.3902
31        D        B 1990 1.2910 2.2342
32        D        B 1991 5.0001 5.1292
33        D        B 1992 2.1111 1.0001
34        D        C 1990 1.2910 4.1202
35        D        C 1991 5.0001 9.1202
36        D        C 1992 2.1111 1.2011
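
If DDY already carries other variables and its row order should stay as it is, a lookup with base match() is one possible route that avoids a join altogether. The following is only a sketch under that assumption, not part of the original answer. The names countries and key are invented for illustration, and CY and DDY are rebuilt so the snippet runs on its own.

```r
# Rebuild the example data so the snippet is self-contained
countries <- c("A", "B", "C", "D")
CY <- data.frame(
  country = rep(countries, each = 3),
  year    = rep(c(1990, 1991, 1992), times = 4),
  x       = c(6.2352, 7.2342, 8.3902, 2.2342, 5.1292, 1.0001,
              4.1202, 9.1202, 1.2011, 1.2910, 5.0001, 2.1111),
  stringsAsFactors = FALSE
)

# Directed dyad-years in the same row order as the question's DDY
# (expand.grid varies its first argument fastest, so year cycles within a dyad)
DDY <- expand.grid(year = c(1990, 1991, 1992), country2 = countries,
                   country1 = countries, stringsAsFactors = FALSE)
DDY <- DDY[DDY$country1 != DDY$country2, c("country1", "country2", "year")]
rownames(DDY) <- NULL

# One lookup key per country-year, then one match() per side of the dyad
key    <- paste(CY$country, CY$year)
DDY$x1 <- CY$x[match(paste(DDY$country1, DDY$year), key)]
DDY$x2 <- CY$x[match(paste(DDY$country2, DDY$year), key)]
DDY
```

Rows of DDY keep their original order and any existing columns are left untouched. A dyad-year with no matching country-year in CY would get NA rather than being dropped, which differs from the inner joins used in the SQL versions.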

Update: added a solution that uses both CY and DDY.

answered 2013-06-25T00:43:24.390