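I have biographical data on more than 1,600 people. It includes their gender, birth year, hometown, and so on, plus their career trajectory from the year they started working. I am trying to convert it into panel data so that I can see how their workplaces have changed since they started working. I have the following questions about this dataset: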

1) How can I convert this into a panel dataset? The format I would ideally like for each person (id) is:

  id gender hometown year job
1  1      1       NY 1990   3
1  1      1       NY 1991   3
1  1      1       NY 1992   3
1  1      1       NY 1993   3
1  1      1       NY 1994   5

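2) If a person's positions overlap, how can I preserve that information? For example, a person may hold job 3 and job 5 at the same time. Later on I want to use only the higher of the two jobs, but I also want to preserve as much information as possible.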

1 Answer

Okay, try this.

First, select a subset of the data.

> (D = head(origin[, c("id", "name1", "gender", "job1", "job1s", "job1e",
            "job2", "job10")]))
  id                name1 gender job1 job1s job1e job2 job10
1  1 Abulaiti Abureduxiti      1 2305  1980  1991 2303    NA
2  2  Aisihaiti Kelimubai      1 2307  1972  1987 2307    NA
3  3          Ai Zhisheng      1 4509  1996  1997 1075 10103
4  4         An Pingsheng      1 3555  1975  1977 3561  2191
5  5            An Zhiwen      1 2063  1977  1979 1127  2507
6  6             An Ziwen      1 4509  1954  1966 4007  2517

Next, we reorganize the data into long format, which I think is what you are after.

> library(reshape2)
> (D = melt(D, id.vars = c("id", "name1", "gender")))
   id                name1 gender variable value
1   1 Abulaiti Abureduxiti      1     job1  2305
2   2  Aisihaiti Kelimubai      1     job1  2307
3   3          Ai Zhisheng      1     job1  4509
4   4         An Pingsheng      1     job1  3555
5   5            An Zhiwen      1     job1  2063
6   6             An Ziwen      1     job1  4509
7   1 Abulaiti Abureduxiti      1    job1s  1980
8   2  Aisihaiti Kelimubai      1    job1s  1972
9   3          Ai Zhisheng      1    job1s  1996
10  4         An Pingsheng      1    job1s  1975
11  5            An Zhiwen      1    job1s  1977
12  6             An Ziwen      1    job1s  1954
13  1 Abulaiti Abureduxiti      1    job1e  1991
14  2  Aisihaiti Kelimubai      1    job1e  1987
15  3          Ai Zhisheng      1    job1e  1997
16  4         An Pingsheng      1    job1e  1977
17  5            An Zhiwen      1    job1e  1979
18  6             An Ziwen      1    job1e  1966
19  1 Abulaiti Abureduxiti      1     job2  2303
20  2  Aisihaiti Kelimubai      1     job2  2307
21  3          Ai Zhisheng      1     job2  1075
22  4         An Pingsheng      1     job2  3561
23  5            An Zhiwen      1     job2  1127
24  6             An Ziwen      1     job2  4007
25  1 Abulaiti Abureduxiti      1    job10    NA
26  2  Aisihaiti Kelimubai      1    job10    NA
27  3          Ai Zhisheng      1    job10 10103
28  4         An Pingsheng      1    job10  2191
29  5            An Zhiwen      1    job10  2507
30  6             An Ziwen      1    job10  2517

We can see that some of these records have an empty job field (NA), so we exclude them.

> (D = D[complete.cases(D),])
   id                name1 gender variable value
1   1 Abulaiti Abureduxiti      1     job1  2305
2   2  Aisihaiti Kelimubai      1     job1  2307
3   3          Ai Zhisheng      1     job1  4509
4   4         An Pingsheng      1     job1  3555
5   5            An Zhiwen      1     job1  2063
6   6             An Ziwen      1     job1  4509
7   1 Abulaiti Abureduxiti      1    job1s  1980
8   2  Aisihaiti Kelimubai      1    job1s  1972
9   3          Ai Zhisheng      1    job1s  1996
10  4         An Pingsheng      1    job1s  1975
11  5            An Zhiwen      1    job1s  1977
12  6             An Ziwen      1    job1s  1954
13  1 Abulaiti Abureduxiti      1    job1e  1991
14  2  Aisihaiti Kelimubai      1    job1e  1987
15  3          Ai Zhisheng      1    job1e  1997
16  4         An Pingsheng      1    job1e  1977
17  5            An Zhiwen      1    job1e  1979
18  6             An Ziwen      1    job1e  1966
19  1 Abulaiti Abureduxiti      1     job2  2303
20  2  Aisihaiti Kelimubai      1     job2  2307
21  3          Ai Zhisheng      1     job2  1075
22  4         An Pingsheng      1     job2  3561
23  5            An Zhiwen      1     job2  1127
24  6             An Ziwen      1     job2  4007
27  3          Ai Zhisheng      1    job10 10103
28  4         An Pingsheng      1    job10  2191
29  5            An Zhiwen      1    job10  2507
30  6             An Ziwen      1    job10  2517
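
For what it's worth, here is a rough sketch of how the start and end years could then be expanded into one row per person-year, which is the layout shown in the question. It uses base R only and a made-up toy data frame (origin_toy). The column names job2s and job2e are my assumption, following the job1s/job1e pattern; I have not seen whether the real data has them.

```r
# Toy data in the same wide layout as `origin`.
# ASSUMPTION: every job k has start/end columns named job<k>s / job<k>e.
origin_toy <- data.frame(
  id     = c(1, 2),
  gender = c(1, 1),
  job1   = c(2305, 2307),
  job1s  = c(1980, 1972),
  job1e  = c(1982, 1973),
  job2   = c(2303, NA),
  job2s  = c(1982, NA),
  job2e  = c(1983, NA)
)

# Step 1: stack the job slots so there is one row per (person, job spell).
spells <- do.call(rbind, lapply(1:2, function(k) {
  data.frame(
    id     = origin_toy$id,
    gender = origin_toy$gender,
    job    = origin_toy[[paste0("job", k)]],
    start  = origin_toy[[paste0("job", k, "s")]],
    end    = origin_toy[[paste0("job", k, "e")]]
  )
}))

# Drop job slots that are empty (NA) for a given person.
spells <- spells[complete.cases(spells), ]

# Step 2: expand each spell into one row per year it covers.
panel <- do.call(rbind, lapply(seq_len(nrow(spells)), function(i) {
  s <- spells[i, ]
  data.frame(id = s$id, gender = s$gender,
             year = s$start:s$end, job = s$job)
}))

# Sort into panel order. Overlapping spells simply appear as
# several rows for the same id and year, so nothing is lost.
panel <- panel[order(panel$id, panel$year, panel$job), ]
rownames(panel) <- NULL
print(panel)
```

With the real data you would replace 1:2 with however many job slots exist and carry along any other fixed columns (hometown and so on) the same way as gender.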

Sorting out the overlapping positions is a secondary issue. Once I know that the above is basically what you are after, we can tackle that next.
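
In case it helps in the meantime, one possible way to handle overlaps is to keep every person-year-job row and just flag the highest job, so you can filter later without discarding anything. This is only a sketch on made-up data (panel_toy), and it assumes that a larger job code means a higher position, as in the job 3 versus job 5 example in the question. If the real codes are not ordered like that, you would need a separate rank lookup.

```r
# Toy person-year data: in 1994 person 1 holds jobs 3 and 5 at once.
panel_toy <- data.frame(
  id   = c(1, 1, 1, 1),
  year = c(1993, 1994, 1994, 1995),
  job  = c(3, 3, 5, 5)
)

# One grouping key per person-year.
key <- paste(panel_toy$id, panel_toy$year)

# Keep all rows; add the number of concurrent jobs and a "highest" flag.
# ASSUMPTION: a numerically larger job code is the higher position.
panel_toy$n_jobs <- ave(panel_toy$job, key, FUN = length)
panel_toy$is_top <- panel_toy$job == ave(panel_toy$job, key, FUN = max)

# When you want exactly one row per person-year, filter on the flag.
panel_top <- panel_toy[panel_toy$is_top, c("id", "year", "job")]
print(panel_top)
```

The full panel_toy still holds both 1994 rows, so the overlap information stays available for later analysis.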

Answered 2013-12-17T03:46:52.533