I have tried without success to accomplish the task described below, so any help would be greatly appreciated.

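The largest table below contains data on fishers' quota ownership (plus another variable, "cpue"). I classify fishers according to the amount of quota they own ("category"). Fishers can increase or decrease the amount of quota they own, so their ownership category can change as well. Every time a fisher changes ownership, I need to extract information: the row for the year just before the amount of quota increased or decreased. For example, if the quota amounts in 2000 and 2001 are 20 and 45, respectively, I need the information (row) for 2000. In addition, I need a new column with categories indicating the fisher's movement between ownership levels. The second table below shows the new data frame I need to create from the extracted rows.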

My data:

ID  fisher  year    qtty    category    cpue
1   1   1998    13  1   0.5994452
2   1   1999    13  1   0.6176183
3   1   2000    13  1   0.6871764
4   1   2001    20  2   0.3228005
5   1   2002    20  2   0.6505336
6   1   2003    20  2   0.8615834
7   1   2004    20  2   0.6871764
8   1   2005    20  2   0.7469739
9   1   2006    20  2   0.7380952
10  1   2007    45  3   0.7516396
11  1   2008    45  3   0.6808454
12  1   2009    45  3   0.6734158
13  1   2010    45  3   0.70367
14  1   2011    45  3   0.5434572
15  1   2012    45  3   0.6181238
16  2   2000    50  3   0.5191856
17  2   2001    50  3   0.6098226
18  2   2002    50  3   1.0018519
19  2   2003    50  3   1.2049724
20  2   2004    50  3   0.5857708
21  2   2005    10  1   0.6744186
22  2   2006    10  1   0.8123333
23  2   2007    10  1   0.3228005
24  2   2008    10  1   0.6505336
25  2   2009    10  1   0.8615834
26  2   2010    0   4   0
27  3   1998    25  2   0.7469739
28  3   1999    25  2   0.7380952
29  3   2000    25  2   0.7516396
30  3   2001    25  2   0.6808454
31  3   2002    10  1   0.6734158
32  3   2003    10  1   0.70367
33  3   2004    10  1   0.5434572
34  3   2005    10  1   0.6181238
35  3   2006    45  3   0.4698849
36  3   2007    45  3   1.0714286
37  3   2008    45  3   1.242439
38  3   2009    45  3   1.0614261
39  3   2010    45  3   0.9761391
40  3   2011    45  3   1.0041898
41  3   2012    45  3   0.9429851
42  4   2005    45  3   0.9310958
43  4   2006    50  3   0.8932985
44  4   2007    50  3   0.7867613
45  4   2008    20  2   0.7994713
46  4   2009    20  2   0.9368927
47  4   2010    10  1   0.8123333
48  4   2011    0   4   0
49  5   1998    20  2   0.4698849
50  5   1999    20  2   1.0714286
51  5   2000    20  2   1.242439
52  5   2001    20  2   1.0614261
53  5   2002    20  2   0.9761391
54  5   2003    20  2   1.0041898
55  5   2004    20  2   0.7469739
56  5   2005    0   4   0.7380952
57  6   2000    55  3   0.7516396
58  6   2001    55  3   0.6808454
59  6   2002    55  3   0.6734158
60  6   2003    55  3   0.6505336
61  6   2004    55  3   0.8615834
62  6   2005    55  3   0.6871764
63  6   2006    55  3   0.6181238
64  6   2007    0   4   0

This is what I need:

ID  fisher  year    qtty    category    cpue    category2
3   1   2000    13  1   0.6871764   1
25  2   2009    10  1   0.8615834   1
34  3   2005    10  1   0.6181238   1
47  4   2010    10  1   0.8123333   1
9   1   2006    20  2   0.7380952   2
30  3   2001    25  2   0.6808454   3
46  4   2009    20  2   0.9368927   3
44  4   2007    50  3   0.7867613   4
20  2   2004    50  3   0.5857708   5
25  2   2009    10  1   0.8615834   6
47  4   2010    10  1   0.8123333   6
55  5   2004    20  2   0.7469739   7
63  6   2006    55  3   0.6181238   8

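The ownership categories are 1 (1-15 quotas), 2 (16-40 quotas), 3 (>40 quotas), and 4 (0 quotas, people who exited the fishery). The new category I need should show the transitions between ownership categories (e.g., category 1 is the transition from ownership level 1 to ownership level 2). Full details are in the table below: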

From    to  category2
1   2   1
2   3   2
2   1   3
3   2   4
3   1   5
1   0   6
2   0   7
3   0   8
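
The ownership categories described above can be written as a small helper. This is only a sketch: quota_category is a made-up name, and it assumes whole-number quota amounts with the boundaries exactly as stated (0 for exit, 1-15, 16-40, and more than 40).

```r
# Hypothetical helper (not part of the original data set): derive the
# ownership category from the quota amount, assuming the stated boundaries.
quota_category <- function(qtty) {
  ifelse(qtty == 0, 4L,          # 0 quotas: exited the fishery
  ifelse(qtty <= 15, 1L,         # 1-15 quotas
  ifelse(qtty <= 40, 2L, 3L)))   # 16-40 quotas, otherwise more than 40
}

quota_category(c(13, 20, 45, 0))
```

Applied to the qtty column, this should reproduce the category column of the data above.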

Thanks!!

3 Answers

With data as your first data frame and cats as the table of categories:

> w <- which(diff(data$fisher) == 0 & diff(data$category) != 0)
> merge(data.frame(data[w,], From = data$category[w], to = data$category[w+1]), cats, all.x = TRUE)[, -(1:2)]
   ID fisher year qtty category      cpue category2
1   3      1 2000   13        1 0.6871764         1
2  34      3 2005   10        1 0.6181238        NA
3  25      2 2009   10        1 0.8615834         6
4  47      4 2010   10        1 0.8123333         6
5  46      4 2009   20        2 0.9368927         3
6  30      3 2001   25        2 0.6808454         3
7   9      1 2006   20        2 0.7380952         2
8  55      5 2004   20        2 0.7469739         7
9  20      2 2004   50        3 0.5857708         5
10 44      4 2007   50        3 0.7867613         4
11 63      6 2006   55        3 0.6181238         8
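
For reference, here is the same idea on a small made-up data frame. toy and its values are illustrative and not part of the question's data; cats is assumed to be the transition table from the question with the exit category written as 4 (as it is coded in data) rather than 0, which is presumably what produced the 6, 7 and 8 in the output above.

```r
# Toy data (made up for illustration): two fishers whose category changes.
toy <- data.frame(
  fisher   = c(1, 1, 1, 2, 2, 2),
  year     = c(2000, 2001, 2002, 2000, 2001, 2002),
  category = c(1, 2, 2, 3, 3, 4)
)

# Assumed lookup table: the question's transitions, with exit coded as 4.
cats <- data.frame(
  From      = c(1, 2, 2, 3, 3, 1, 2, 3),
  to        = c(2, 3, 1, 2, 1, 4, 4, 4),
  category2 = 1:8
)

# Rows whose successor belongs to the same fisher but has another category.
w <- which(diff(toy$fisher) == 0 & diff(toy$category) != 0)

# Attach the (From, to) pair to each selected row and look up category2;
# all.x = TRUE keeps transitions that are missing from cats (as NA).
changes <- merge(
  data.frame(toy[w, ], From = toy$category[w], to = toy$category[w + 1]),
  cats, all.x = TRUE
)
changes[, c("fisher", "year", "category", "category2")]
```

The diff(toy$fisher) == 0 condition is what stops the last row of one fisher from being compared with the first row of the next.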
answered 2013-11-02T18:11:54.013

If I understood your question correctly, this should work for you. df is the large data set shown in your question -

library(data.table)
dt <- data.table(df)

# change in quota from this row to the fisher's next row; diff() returns one
# value fewer than the group size, so pad with NA for each fisher's last row
dt[, qttychange := c(diff(qtty), NA), by = "fisher"]
categorychanges <- dt[qttychange != 0]

# category in the fisher's next row, kept only where the quota changes
dt[, nextcategory := c(tail(category, -1), NA), by = "fisher"]
dt[qttychange == 0, nextcategory := NA]
categorytable <- dt[!is.na(nextcategory), list(freq = .N), by = c("category", "nextcategory")]

Output -

> categorychanges
    ID fisher year qtty category      cpue qttychange
 1:  3      1 2000   13        1 0.6871764          7
 2:  9      1 2006   20        2 0.7380952         25
 3: 20      2 2004   50        3 0.5857708        -40
 4: 25      2 2009   10        1 0.8615834        -10
 5: 30      3 2001   25        2 0.6808454        -15
 6: 34      3 2005   10        1 0.6181238         35
 7: 42      4 2005   45        3 0.9310958          5
 8: 44      4 2007   50        3 0.7867613        -30
 9: 46      4 2009   20        2 0.9368927        -10
10: 47      4 2010   10        1 0.8123333        -10
11: 55      5 2004   20        2 0.7469739        -20
12: 63      6 2006   55        3 0.6181238        -55
> categorytable
    category nextcategory freq
 1:        1            2    1
 2:        2            3    1
 3:        3            1    1
 4:        1            4    2
 5:        2            1    2
 6:        1            3    1
 7:        3            3    1
 8:        3            2    1
 9:        2            4    1
10:        3            4    1
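
As a cross-check that does not depend on data.table, the from/to counts can also be tabulated in base R. This is a sketch on made-up values; toy, moved and counts are illustrative names, and unlike the code above it compares categories rather than quota amounts.

```r
# Toy data (made up): categories per fisher, ordered by year within fisher.
toy <- data.frame(
  fisher   = c(1, 1, 1, 1, 2, 2, 2),
  category = c(1, 1, 2, 3, 2, 1, 4)
)

# Next category within each fisher; the last row of each fisher gets NA.
toy$nextcategory <- ave(toy$category, toy$fisher,
                        FUN = function(x) c(x[-1], NA))

# Keep rows that precede a category change and count each (from, to) pair.
moved <- toy[!is.na(toy$nextcategory) & toy$category != toy$nextcategory, ]
counts <- as.data.frame(table(from = moved$category, to = moved$nextcategory),
                        responseName = "freq")
counts[counts$freq > 0, ]
```

Because ave() works group by group, no transition is ever computed across two different fishers.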
answered 2013-11-02T16:39:59.413

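The output you provided is a bit inconsistent: there are some duplicate rows, and some mismatches between the category2 table you provided and the category2 column in your expected output.

Also, the last data frame, the one defining category2, (i) has a quota category 0 that you haven't mentioned, and (ii) does not provide a category2 for the transition from 1 to 3. So I changed 0 to 4 and added a category2 for the transition from 1 to 3.

I hope I haven't misunderstood, but the result looks similar to what you expect: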

library(zoo)

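newDF <- do.call(rbind, lapply(split(DF, DF$fisher), function(x) {
    # last row before each category change; pad with FALSE so the logical
    # index has one entry per row and is not recycled
    res <- x[c(diff(x$category) != 0, FALSE), ]
    # consecutive pairs of this fisher's categories become columns "1" and "2"
    cbind(res, rollapply(unique(x$category), width = 2, c))
}))

# look up category2 for each "from to" pair in trans
newDF$category2 <- unlist(apply(newDF[, c("1", "2")], 1,
    function(x) trans$category2[grep(paste(x, collapse = " to "),
        paste(trans$From, trans$to, sep = " to "))]), use.names = FALSE)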

newDF
#     ID fisher year qtty category      cpue 1 2 category2
#1.3   3      1 2000   13        1 0.6871764 1 2         1
#1.9   9      1 2006   20        2 0.7380952 2 3         2
#2.20 20      2 2004   50        3 0.5857708 3 1         5
#2.25 25      2 2009   10        1 0.8615834 1 4         6
#3.30 30      3 2001   25        2 0.6808454 2 1         3
#3.34 34      3 2005   10        1 0.6181238 1 3 not given
#4.44 44      4 2007   50        3 0.7867613 3 2         4
#4.46 46      4 2009   20        2 0.9368927 2 1         3
#4.47 47      4 2010   10        1 0.8123333 1 4         6
#5    55      5 2004   20        2 0.7469739 2 4         7
#6    63      6 2006   55        3 0.6181238 3 4         8

Columns 1 and 2 of newDF are the "from - to" transitions.
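
One caveat worth noting: unique() drops repeated values, so a fisher who returns to an earlier category (say 1, 2, 1) would yield fewer pairs than changes. A possible alternative, sketched here on a made-up vector, takes the run values from rle() instead; transitions is a hypothetical helper and not part of the code above.

```r
# Hypothetical helper: consecutive (from, to) category pairs for one fisher.
# rle() collapses runs of equal values but keeps later repeats,
# so 1, 1, 2, 2, 1 gives the run values 1, 2, 1.
transitions <- function(category) {
  runs <- rle(category)$values
  cbind(from = head(runs, -1), to = tail(runs, -1))
}

# Made-up example: the fisher moves 1 -> 2, back 2 -> 1, then 1 -> 4.
m <- transitions(c(1, 1, 2, 2, 1, 4))
m
```

For the data in the question no fisher revisits a category, so both approaches should give the same pairs.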

DF is your large data frame, and trans is your last data frame with the transitions (as I changed it):

DF <- structure(list(ID = 1:64, fisher = c(1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L), year = c(1998L, 1999L, 
2000L, 2001L, 2002L, 2003L, 2004L, 2005L, 2006L, 2007L, 2008L, 
2009L, 2010L, 2011L, 2012L, 2000L, 2001L, 2002L, 2003L, 2004L, 
2005L, 2006L, 2007L, 2008L, 2009L, 2010L, 1998L, 1999L, 2000L, 
2001L, 2002L, 2003L, 2004L, 2005L, 2006L, 2007L, 2008L, 2009L, 
2010L, 2011L, 2012L, 2005L, 2006L, 2007L, 2008L, 2009L, 2010L, 
2011L, 1998L, 1999L, 2000L, 2001L, 2002L, 2003L, 2004L, 2005L, 
2000L, 2001L, 2002L, 2003L, 2004L, 2005L, 2006L, 2007L), qtty = c(13L, 
13L, 13L, 20L, 20L, 20L, 20L, 20L, 20L, 45L, 45L, 45L, 45L, 45L, 
45L, 50L, 50L, 50L, 50L, 50L, 10L, 10L, 10L, 10L, 10L, 0L, 25L, 
25L, 25L, 25L, 10L, 10L, 10L, 10L, 45L, 45L, 45L, 45L, 45L, 45L, 
45L, 45L, 50L, 50L, 20L, 20L, 10L, 0L, 20L, 20L, 20L, 20L, 20L, 
20L, 20L, 0L, 55L, 55L, 55L, 55L, 55L, 55L, 55L, 0L), category = c(1L, 
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 4L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 
1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 1L, 4L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 4L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L), 
    cpue = c(0.5994452, 0.6176183, 0.6871764, 0.3228005, 0.6505336, 
    0.8615834, 0.6871764, 0.7469739, 0.7380952, 0.7516396, 0.6808454, 
    0.6734158, 0.70367, 0.5434572, 0.6181238, 0.5191856, 0.6098226, 
    1.0018519, 1.2049724, 0.5857708, 0.6744186, 0.8123333, 0.3228005, 
    0.6505336, 0.8615834, 0, 0.7469739, 0.7380952, 0.7516396, 
    0.6808454, 0.6734158, 0.70367, 0.5434572, 0.6181238, 0.4698849, 
    1.0714286, 1.242439, 1.0614261, 0.9761391, 1.0041898, 0.9429851, 
    0.9310958, 0.8932985, 0.7867613, 0.7994713, 0.9368927, 0.8123333, 
    0, 0.4698849, 1.0714286, 1.242439, 1.0614261, 0.9761391, 
    1.0041898, 0.7469739, 0.7380952, 0.7516396, 0.6808454, 0.6734158, 
    0.6505336, 0.8615834, 0.6871764, 0.6181238, 0)), .Names = c("ID", 
"fisher", "year", "qtty", "category", "cpue"), class = "data.frame", row.names = c(NA, 
-64L))

trans <- structure(list(From = c("1", "2", "2", "3", "3", "1", "2", "3", 
"1"), to = c("2", "3", "1", "2", "1", "4", "4", "4", "3"), category2 = c("1", 
"2", "3", "4", "5", "6", "7", "8", "not given")), .Names = c("From", 
"to", "category2"), row.names = c(NA, 9L), class = "data.frame")
answered 2013-11-02T17:50:37.240