
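I have a dataset in the format shown below. I tried to reshape it with the reshape2 package in R, but the result was not what I wanted: it produced one binary variable per page. Is there a way to reshape the dataset into the desired output format below?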

Input format:
User    Pages
1   index.html
1   search.html
1   help.html
1   contact.html
2   help.html
2   contact.html
3   index.html
3   search.html
3   feedback.html

Output format:
User    page1       page2         page3         page4         page5
1       index.html  search.html   help.html     contact.html  NA
2       help.html   contact.html  NA            NA            NA
3       index.html  search.html   feedback.html NA            NA

2 Answers


Use the dcast function from the reshape2 package:

library(reshape2)

txt <- "User    Pages
1   index.html
1   search.html
1   help.html
1   contact.html
2   help.html
2   contact.html
3   index.html
3   search.html
3   feedback.html"

mydf <- read.table(text=txt, header=TRUE)

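# Create a new column that numbers each user's pages.
# This assumes the rows are already grouped and sorted by User,
# because table() returns its counts in sorted User order.
mydf$page <- paste("Page", unlist(sapply(table(mydf$User), seq)))

# Cast to wide format: one row per User, one column per page number.
new.df <- dcast(mydf, User ~ page, value.var = "Pages")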

> print(new.df)
   User     Page 1       Page 2        Page 3       Page 4
1    1 index.html  search.html     help.html contact.html
2    2  help.html contact.html          <NA>         <NA>
3    3 index.html  search.html feedback.html         <NA>
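
The numbering step above relies on the rows being sorted by User. As a hedged alternative (not part of the original answer), the per-user counter can be built with base R's ave(), which works group by group regardless of row order. The data frame below is rebuilt inline so the sketch is self-contained, and the column prefix "page" is chosen only to match the names in the question.

```r
library(reshape2)

# Same sample data as above, built directly instead of parsed from text.
mydf <- data.frame(
  User  = c(1, 1, 1, 1, 2, 2, 3, 3, 3),
  Pages = c("index.html", "search.html", "help.html", "contact.html",
            "help.html", "contact.html",
            "index.html", "search.html", "feedback.html"),
  stringsAsFactors = FALSE
)

# Number each user's pages in order of appearance.
# ave() applies seq_along() within each User group, so sorting is not required.
mydf$page <- paste0("page", ave(seq_along(mydf$User), mydf$User, FUN = seq_along))

# One row per User, one column per page number; missing pages become NA.
new.df <- dcast(mydf, User ~ page, value.var = "Pages")
print(new.df)
```

Note that dcast orders the new columns alphabetically, so with ten or more pages per user "page10" would sort before "page2". Zero-padding the counter or converting page to a factor with explicit levels would be one way around that.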
answered 2013-04-26T12:13:43.380

Combining @zelite's amazing unlist trick with the reshape function:

x <- read.table( text = "User    Pages
1   index.html
1   search.html
1   help.html
1   contact.html
2   help.html
2   contact.html
3   index.html
3   search.html
3   feedback.html", header = TRUE)

# reshape() is part of base R (the stats package), so no extra package is needed.

x$tv <- unlist((sapply(table(x$User), seq)))

reshape( x , idvar = 'User' , timevar = 'tv' , direction = 'wide' )
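
With direction = 'wide', reshape() names the new columns by joining the value column and the time variable with a dot (Pages.1, Pages.2, and so on), and it keeps the row names of each user's first row. A minimal sketch of tidying that result toward the layout in the question follows. The name wide and the renaming step are illustrative additions, and the counter is built with ave() rather than the unlist trick so that row order does not matter.

```r
# Same sample data, built inline so the sketch is self-contained.
x <- data.frame(
  User  = c(1, 1, 1, 1, 2, 2, 3, 3, 3),
  Pages = c("index.html", "search.html", "help.html", "contact.html",
            "help.html", "contact.html",
            "index.html", "search.html", "feedback.html"),
  stringsAsFactors = FALSE
)

# Per-user running counter, used as the "time" variable for reshape().
x$tv <- ave(seq_along(x$User), x$User, FUN = seq_along)

# Base R reshape: one row per User, one Pages.<n> column per counter value.
wide <- reshape(x, idvar = "User", timevar = "tv", direction = "wide")

# Rename Pages.1, Pages.2, ... to page1, page2, ... and reset the row names.
names(wide) <- sub("^Pages\\.", "page", names(wide))
rownames(wide) <- NULL
print(wide)
```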
answered 2013-04-26T12:11:59.033