I have a sample data frame "z", shown below:

deaths  sex race    smokes  pyears
10  Female  White   0   1410
14  Male    White   1   1974
14  Female  Black   0   1974
16  Male    Black   1   2256
17  Male    Black   0   2397
18  Female  NA  1   2538
19  NA  Black   0   2679
20  Female  White   1   2820
20  Female  Black   0   2820
21  Male    Black   1   2961

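I would like to create a new variable, "group", that combines the variables race and sex. This new variable uniquely identifies the groups of observations in the data frame "z". The expected output is: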

 group
    1
    2
    3
    4
    4
    6
    5
    1
    3
    4

How can I code this in R?

2 Answers

This is the sort of thing I was thinking:

dat <- read.table(text = "deaths  sex race    smokes  pyears
10  Female  White   0   1410
14  Male    White   1   1974
14  Female  Black   0   1974
16  Male    Black   1   2256
17  Male    Black   0   2397
18  Female  NA  1   2538
19  NA  Black   0   2679
20  Female  White   1   2820
20  Female  Black   0   2820
21  Male    Black   1   2961",header = TRUE,sep = "")

dat$sex <- factor(dat$sex,exclude = NULL)
dat$race <- factor(dat$race,exclude = NULL)

with(dat,interaction(sex,race))

 [1] Female.White Male.White   Female.Black Male.Black   Male.Black   Female.NA    NA.Black     Female.White Female.Black
[10] Male.Black  
Levels: Female.Black Male.Black NA.Black Female.White Male.White NA.White Female.NA Male.NA NA.NA

It looks like you wanted to include the NAs rather than drop them, hence the explicit factor calls with exclude = NULL. The resulting factor can be converted to integers with as.integer, although the numbers are unlikely to come out in the order you specified: they follow the order of the factor's levels, which R sorts alphabetically by default, rather than the order in which the groups appear in your data frame.
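To make the point about ordering concrete, here is a minimal sketch in base R. It assumes the same example data as above; the names grp and key are illustrative and do not come from the original answer. It shows the integer codes taken from the factor, and an alternative that numbers the groups by first appearance. Note that numbering by first appearance still swaps the two NA groups (rows 6 and 7) relative to the expected output in the question.

```r
# Rebuild the example data; the string NA is read as a missing value.
dat <- read.table(text = "deaths sex race smokes pyears
10 Female White 0 1410
14 Male White 1 1974
14 Female Black 0 1974
16 Male Black 1 2256
17 Male Black 0 2397
18 Female NA 1 2538
19 NA Black 0 2679
20 Female White 1 2820
20 Female Black 0 2820
21 Male Black 1 2961", header = TRUE)

# Keep NA as its own level, as in the answer above.
sex  <- factor(dat$sex,  exclude = NULL)
race <- factor(dat$race, exclude = NULL)

# drop = TRUE discards combinations that never occur in the data (e.g. NA.NA).
grp <- interaction(sex, race, drop = TRUE)

# Integer codes follow the level order of the factor, not the row order.
as.integer(grp)

# Alternative: number the groups by first appearance in the data.
# paste() turns a missing value into the string "NA".
key <- paste(dat$sex, dat$race, sep = ".")
match(key, unique(key))
```

Either vector identifies the same six groups; only the labels attached to them differ.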

answered 2013-02-01T22:58:01.287

You can use:

dat <- read.table(text="deaths  sex race    smokes  pyears
10  Female  White   0   1410
14  Male    White   1   1974
14  Female  Black   0   1974
16  Male    Black   1   2256
17  Male    Black   0   2397
18  Female  NA  1   2538
19  NA  Black   0   2679
20  Female  White   1   2820
20  Female  Black   0   2820
21  Male    Black   1   2961", header=TRUE)

library(qdap)
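# handle.na = FALSE pastes a missing value as text instead of returning NA for the row
factor(paste2(dat[, 2:3], handle.na = FALSE))

# for numeric codes:
as.numeric(factor(paste2(dat[, 2:3], handle.na = FALSE)))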

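But as Joran points out, the numbers you expect differ from the ones R will assign. You will have to use the levels argument inside factor to order the levels as needed.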
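One way to do that, sketched in base R without qdap: build a text key for each row and pass a hand-written level order to factor. The vector lev below is an assumption, chosen only so that the codes match the expected output in the question. The sketch also relies on paste turning a missing value into the string "NA", so it would not distinguish a real category that is literally named "NA".

```r
# Rebuild the example data; the string NA is read as a missing value.
dat <- read.table(text = "deaths sex race smokes pyears
10 Female White 0 1410
14 Male White 1 1974
14 Female Black 0 1974
16 Male Black 1 2256
17 Male Black 0 2397
18 Female NA 1 2538
19 NA Black 0 2679
20 Female White 1 2820
20 Female Black 0 2820
21 Male Black 1 2961", header = TRUE)

# One text key per row, e.g. "Female.White" or "NA.Black".
key <- paste(dat$sex, dat$race, sep = ".")

# Level order written by hand to match the numbering asked for in the question.
lev <- c("Female.White", "Male.White", "Female.Black",
         "Male.Black", "NA.Black", "Female.NA")

# The integer code of each row is the position of its key in lev.
dat$group <- as.integer(factor(key, levels = lev))
dat$group
```

Any key missing from lev would get the code NA, so the level vector has to list every combination that occurs in the data.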

answered 2013-02-01T23:08:11.533