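I have a long sequence dataset. Every column (from t1 to t...n) has the same levels or categories. There are more than 200 categories or levels and 144 columns (variables).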

 id    t1        t2        t3             t...n
"1"   "eating"  "tv"      "conversation" "..."
"2"   "sleep"   "driving" "relaxing"     "..."
"3"   "drawing" "kissing" "knitting"     "..."
"..." "..."     "..."     "..."          "..."

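Variable t1 has the same levels as t2, and so on. What I need is a loop-style recode applied to every column (but without actually writing a loop).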

I would like to avoid the usual

seq$t1[seq$t1== "drawing"] <- 'leisure'
seq$t1[seq$t1== "eating"] <- 'meal'
seq$t1[seq$t1== "sleep"] <- 'personal care' 
seq$t1[seq$t1== "..."] <- ... 

The most convenient recoding style would be something like

c('leisure') = c('drawing', 'tv', ...) 

This would help me cluster the variables into larger categories.

Are there any newer, simpler recoding methods in R that have appeared recently? What would you advise me to use?

This is a sample of my real dataset: 6 repeated observations (columns) for 10 respondents (rows).

dtaSeq = structure(c("Wash and dress", "Eating", "Various arrangements", "Cleaning dwelling", "Ironing", "Activities related to sports", 
 "Eating", "Eating", "Other specified construction and repairs", 
"Other specified physical care & supervision of a child", "Wash and dress", 
"Filling in the time use diary", "Food preparation", "Wash and dress", 
"Ironing", "Travel related to physical exercise", "Eating", "Eating", 
"Other specified construction and repairs", "Other specified physical care & supervision of a child", 
"Wash and dress", "Filling in the time use diary", "Food preparation", 
"Wash and dress", "Food preparation", "Wash and dress", "Eating", 
"Eating", "Other specified construction and repairs", "Other specified physical care & supervision of a child", 
"Wash and dress", "Filling in the time use diary", "Baking", 
"Teaching the child", "Food preparation", "Wash and dress", "Eating", 
"Eating", "Other specified construction and repairs", "Other specified physical care & supervision of a child", 
"Dish washing", "Unspecified TV watching", "Reading periodicals", 
"Teaching the child", "Food preparation", "Reading periodicals", 
"Eating", "Eating", "Other specified construction and repairs", 
"Feeding the child", "Laundry", "Unspecified TV watching", "Cleaning dwelling", 
"Teaching the child", "Eating", "Eating", "Eating", "Eating", 
"Other specified construction and repairs", "Feeding the child"), 
.Dim = c(10L, 6L), .Dimnames = list(c("1", "2", "3", "4", 
"5", "6", "7", "8", "9", "10"), c("act1.050", "act1.051", "act1.052", 
"act1.053", "act1.054", "act1.055")))

2 Answers

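As far as I know, the car package can handle strings or characters in its recode function, but I'm not sure. Another approach could be the sjmisc package, taking a detour by converting the strings to numeric values and setting value labels afterwards: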

library(sjmisc)
dtaSeq <- as.data.frame(dtaSeq)
# convert to values
dtaSeq.values <- to_value(dtaSeq)
# random recode example, use your own values for clustering here
dtaSeq.values <- rec(dtaSeq.values, "1:3=1; 4:6=2; else=3")
# set value labels, these will be added as attributes
dtaSeq.values <- set_val_labels(dtaSeq.values, c("meal", "leisure", "personal care"))
# replace numeric values with associated label attributes
dtaSeq.values <- to_label(dtaSeq.values)

Result:

> head(dtaSeq.values)
       act1.050      act1.051 act1.052      act1.053      act1.054      act1.055
1 personal care personal care  leisure personal care          meal       leisure
2          meal          meal     meal          meal personal care personal care
3 personal care          meal     meal          meal       leisure          meal
4          meal personal care  leisure personal care personal care       leisure
5       leisure       leisure     meal       leisure       leisure          meal
6          meal personal care  leisure personal care       leisure          meal

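One advantage of the sjmisc recode function is that, if you have a data frame whose variables share a similar "structure", you can recode the complete data frame with a single call to rec.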
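For comparison, the same "recode every column in one go" idea can be sketched in base R, without sjmisc. This is only a minimal sketch under assumptions: the toy data frame df, the groups list and the helper recode_col are hypothetical names made up for illustration, not part of the question's data. It relies on the documented behaviour that assigning a named list to levels() merges factor levels many-to-one.

```r
# Toy data frame standing in for the real data (hypothetical values)
df <- data.frame(
  t1 = c("Eating", "Ironing", "Baking"),
  t2 = c("Dish washing", "Eating", "Eating"),
  stringsAsFactors = FALSE
)

# Hypothetical grouping, written as: new category = c(old categories)
groups <- list(
  meals        = c("Eating"),
  `house care` = c("Ironing", "Dish washing")
)

# Recode one column: assigning a named list to levels() merges levels.
# Any level that is not listed in `groups` becomes NA.
recode_col <- function(x) {
  f <- factor(x)
  levels(f) <- groups
  f
}

# df[] keeps the data frame shape while every column is replaced
df[] <- lapply(df, recode_col)
```

Note that values missing from the list (here "Baking") end up as NA, so with 200+ categories it is worth checking that every category has been assigned to a group.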

Does this help you?

answered 2015-06-09T12:38:38.537

You don't seem to have fully specified the recoding rules for your real data, so I made some up:

recodes <- list("meals"=c("Eating"),
                "leisure"=c("Reading periodicals",
                             "Unspecified TV watching"),
                "child care"=c("Feeding the child","Teaching the child"),
                "house care"=c("Food preparation","Dish washing",
                                "Cleaning dwelling","Ironing"))

Here is a general-purpose recoding function. car::recode does work, but I find it a little clunky. There is also plyr::revalue, but it is one-to-one, not many-to-one.

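# Replace every value listed in recodes[[i]] with the group name names(recodes)[i];
# values that appear in no group are left unchanged.
recodeFun <- function(x) {
    for (i in seq_along(recodes)) {
        x[x %in% recodes[[i]]] <- names(recodes)[i]
    }
    return(x)
}
# dtaSeq is a character matrix, so one call recodes all columns at once
d2 <- recodeFun(dtaSeq)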
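If you want to drop the explicit for loop as well, one possible variation is to flatten the rules into a named lookup vector and index into it. This is a rough sketch in base R, not a tested drop-in replacement: rules, lookup, m, hit and out are hypothetical names, and the small matrix m only stands in for dtaSeq.

```r
# Hypothetical recoding rules, written as: new category = c(old categories)
rules <- list(
  meals        = c("Eating"),
  `child care` = c("Feeding the child", "Teaching the child")
)

# Flatten the list into a named lookup vector:
# names are the old values, elements are the new categories.
lookup <- setNames(rep(names(rules), lengths(rules)),
                   unlist(rules, use.names = FALSE))

# Small character matrix standing in for dtaSeq (filled column by column)
m <- matrix(c("Eating", "Laundry", "Teaching the child", "Eating"),
            nrow = 2, dimnames = list(c("1", "2"), c("a1", "a2")))

# One vectorised lookup over the whole matrix;
# values without a rule (here "Laundry") stay as they are.
hit <- m %in% names(lookup)
out <- m
out[hit] <- lookup[m[hit]]
```

The lookup vector does the same many-to-one mapping as the loop, and because only the matched cells are overwritten, the matrix keeps its dimensions and dimnames.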
answered 2015-06-09T14:14:44.430