I have a CSV whose columns are laid out like this:

Section | ID | Totaltime | Item1/Word | Item1/Cat | Item1/Time...Item235/Time  

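I want to reshape it so that, instead of each ID having all 235 items on a single row, there is one row per item, sorted/grouped by ID, so that it looks something like this: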

Section | ID0 | Totaltime | Item1/Word | Item1/Cat | Item1/Time 
                            Item2/Word | Item2/Cat | Item2/Time
                            Item3/Word | Item3/Cat | Item3/Time
                           ...Item235/Word | Item235/Cat | Item235/Time
Section | ID1 | Totaltime | Item1/Word | Item1/Cat | Item1/Time...

I tried melting it with ID as the id.vars argument and pulling the various item columns into the measure.vars argument with grepl, but that produces something like this:

Section | ID0 | Totaltime
Section | ID0 | Item1/Word 
Section | ID0 | Item1/Cat 
Section | ID0 | Item1/Time 
             ...
Section | ID0 | Item235/Word 
Section | ID0 | Item235/Cat 
Section | ID0 | Item235/Time

I have also tried recasting it, without much luck.

I am new to R as of this week, so I am sure I am missing something really obvious, but I am stuck on this.

3 Answers

melt from data.table v1.9.5+ can operate on multiple columns at once. (Using @rawr's data.)

require(data.table) # v1.9.5+
vals = unique(gsub("Item[0-9]+/", "", tail(names(dd), -3L)))
melt(setDT(dd), id=1:3, measure=lapply(vals, grep, names(dd)), value.name=vals)
#     Section   ID0 Totaltime variable   Word   Cat   Time
#  1:       1 10001       100        1 1/word 1/cat 1/time
#  2:       2 10002       200        1 1/word 1/cat 1/time
#  3:       3 10003       300        1 1/word 1/cat 1/time
#  4:       4 10004       400        1 1/word 1/cat 1/time
#  5:       5 10005       500        1 1/word 1/cat 1/time
#  6:       1 10001       100        2 2/word 2/cat 2/time
#  7:       2 10002       200        2 2/word 2/cat 2/time
#  8:       3 10003       300        2 2/word 2/cat 2/time
#  9:       4 10004       400        2 2/word 2/cat 2/time
# 10:       5 10005       500        2 2/word 2/cat 2/time
# 11:       1 10001       100        3 3/word 3/cat 3/time
# 12:       2 10002       200        3 3/word 3/cat 3/time
# 13:       3 10003       300        3 3/word 3/cat 3/time
# 14:       4 10004       400        3 3/word 3/cat 3/time
# 15:       5 10005       500        3 3/word 3/cat 3/time
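
To see what the measure argument actually receives, the grouping step can be reproduced in base R alone. This is a minimal sketch, assuming a small stand-in data frame with the same naming scheme; dd_demo, vals_demo and measure_demo are hypothetical names, not part of the answer above.

```r
# Hypothetical stand-in for the real data: 3 id columns, then 2 items x 3 fields
dd_demo <- data.frame(Section = 1:2, ID0 = 10001:10002, Totaltime = c(100, 200),
                      'Item1/Word' = 'a', 'Item1/Cat' = 'b', 'Item1/Time' = 'c',
                      'Item2/Word' = 'd', 'Item2/Cat' = 'e', 'Item2/Time' = 'f',
                      stringsAsFactors = FALSE, check.names = FALSE)

# Strip the "ItemN/" prefix from every non-id column and keep the distinct suffixes
vals_demo <- unique(gsub("Item[0-9]+/", "", tail(names(dd_demo), -3L)))

# One integer vector of column positions per suffix; melt() stacks each
# vector into its own value column
measure_demo <- lapply(vals_demo, grep, names(dd_demo))
names(measure_demo) <- vals_demo
print(measure_demo)
```

Each list element holds the positions of all columns sharing one suffix, which is why the melted result has one Word, one Cat and one Time column rather than a single value column.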
answered 2015-04-07T20:14:29.650

Try this:

library(reshape2)
library(plyr)
df.melt <- melt(df, id.vars=c("Section", "ID0", "Totaltime"), variable.name="item.type", value.name="item.value")
# split "ItemN/Field" into the item number and the field name
df.mutate <- mutate(df.melt, item.no=gsub("(Item[0-9]+).*", "\\1", item.type), item.type=gsub("Item[0-9]+/", "", item.type))
# within each ID/item group, spread the fields back out into columns
df.final <- ddply(df.mutate, .(Section, ID0, Totaltime, item.no), function(d) dcast(d, Section + ID0 + Totaltime ~ item.type, value.var="item.value", fun.aggregate=function(x) x[1]))
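
The two gsub calls do the real work here: they split each melted column name into an item label and a field name. A minimal base R sketch of that step on a few illustrative names (item_names, item_no, item_field and item_index are hypothetical variables introduced only for this example):

```r
# Column names as they would appear in the melted item.type column
item_names <- c("Item1/Word", "Item1/Cat", "Item12/Time", "Item235/Word")

# Keep only the "ItemN" prefix: the capture group is echoed back via \\1
item_no <- gsub("(Item[0-9]+).*", "\\1", item_names)

# Drop the "ItemN/" prefix, leaving just the field name
item_field <- gsub("Item[0-9]+/", "", item_names)

# Optional: a numeric index, in case items should sort as 1, 2, ..., 235
item_index <- as.integer(gsub("Item([0-9]+).*", "\\1", item_names))

print(data.frame(item_names, item_no, item_field, item_index))
```

Because item_no is a character string, sorting on it is lexicographic ("Item10" comes before "Item2"), so a numeric index like item_index may be preferable when the row order matters.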
answered 2015-04-07T18:34:32.183

I think this gets the format you need:

dd <- data.frame(Section = 1:5, ID0 = 10001:10005, Totaltime = 1:5 * 100,
                 'Item1/Word' = '1/word', 'Item1/Cat' = '1/cat',
                 'Item1/Time' = '1/time',
                 'Item2/Word' = '2/word', 'Item2/Cat' = '2/cat',
                 'Item2/Time' = '2/time',
                 'Item3/Word' = '3/word', 'Item3/Cat' = '3/cat',
                 'Item3/Time' = '3/time', stringsAsFactors = FALSE,
                 check.names = FALSE)


#   Section   ID0 Totaltime Item1/Word Item1/Cat Item1/Time Item2/Word Item2/Cat Item2/Time Item3/Word Item3/Cat Item3/Time
# 1       1 10001       100     1/word     1/cat     1/time     2/word     2/cat     2/time     3/word     3/cat     3/time
# 2       2 10002       200     1/word     1/cat     1/time     2/word     2/cat     2/time     3/word     3/cat     3/time
# 3       3 10003       300     1/word     1/cat     1/time     2/word     2/cat     2/time     3/word     3/cat     3/time
# 4       4 10004       400     1/word     1/cat     1/time     2/word     2/cat     2/time     3/word     3/cat     3/time
# 5       5 10005       500     1/word     1/cat     1/time     2/word     2/cat     2/time     3/word     3/cat     3/time

## define the varying columns:
keys <- c('Word','Cat','Time')
l <- lapply(keys, function(x) grep(x, names(dd)))

rr <- reshape(dd, direction = 'long', varying = l)
rr <- rr[with(rr, order(Section, ID0, Totaltime)),
         ## `reshape` adds two extra variables, time and id, that we don't want
         -which(names(rr) %in% c('id','time'))]
rr[, 1:3] <- lapply(rr[, 1:3], function(x) ifelse(duplicated(x), '', x))
`rownames<-`(rr, NULL)

#    Section   ID0 Totaltime Item1/Word Item1/Cat Item1/Time
# 1        1 10001       100     1/word     1/cat     1/time
# 2                              2/word     2/cat     2/time
# 3                              3/word     3/cat     3/time
# 4        2 10002       200     1/word     1/cat     1/time
# 5                              2/word     2/cat     2/time
# 6                              3/word     3/cat     3/time
# 7        3 10003       300     1/word     1/cat     1/time
# 8                              2/word     2/cat     2/time
# 9                              3/word     3/cat     3/time
# 10       4 10004       400     1/word     1/cat     1/time
# 11                             2/word     2/cat     2/time
# 12                             3/word     3/cat     3/time
# 13       5 10005       500     1/word     1/cat     1/time
# 14                             2/word     2/cat     2/time
# 15                             3/word     3/cat     3/time
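
One thing to watch with the real 235-item file: grep(x, names(dd)) matches the key anywhere in a column name. On the sample data that is harmless, since 'Totaltime' has a lowercase t, but other id columns could be caught by accident. A minimal sketch of an anchored pattern, assuming made-up column names ("TotalTime" and "Category" are hypothetical and chosen only to show the pitfall):

```r
# Hypothetical column names; "TotalTime" and "Category" are invented for illustration
cols <- c("Section", "ID0", "TotalTime", "Category",
          "Item1/Word", "Item1/Cat", "Item1/Time",
          "Item2/Word", "Item2/Cat", "Item2/Time")
keys <- c("Word", "Cat", "Time")

# Unanchored: "Cat" also hits "Category", "Time" also hits "TotalTime"
loose <- lapply(keys, function(x) grep(x, cols))

# Anchored: only names of the exact form "Item<digits>/<key>"
strict <- lapply(keys, function(x) grep(paste0("^Item[0-9]+/", x, "$"), cols))

print(loose)
print(strict)
```

With the anchored version every list element has the same length, which reshape needs in order to line the items up.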
answered 2015-04-07T18:42:12.963