
I am trying to create a new data frame in R from an existing data frame of items purchased in transactions, shown below:

dput output of the data:

structure(list(Transaction = c(1L, 2L, 2L, 3L, 3L, 3L), Item = c("Bread", 
"Scandinavian", "Scandinavian", "Hot chocolate", "Jam", "Cookies"
), date_time = c("30/10/2016 09:58", "30/10/2016 10:05", "30/10/2016 10:05", 
"30/10/2016 10:07", "30/10/2016 10:07", "30/10/2016 10:07"), 
    period_day = c("morning", "morning", "morning", "morning", 
    "morning", "morning"), weekday_weekend = c("weekend", "weekend", 
    "weekend", "weekend", "weekend", "weekend"), Year = c("2016", 
    "2016", "2016", "2016", "2016", "2016"), Month = c("October", 
    "October", "October", "October", "October", "October")), row.names = c(NA, 
6L), class = "data.frame")

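As you can see in the example, each row corresponds to an individual product purchased, not to the transaction itself (so transaction 2 spans rows 2 and 3).

I would like to create a new table in which the rows are the distinct transactions (1, 2, 3, etc.) and the columns are binary indicators, one per item (for example Bread = 0 or 1), so that I can run an apriori analysis.

Any idea how to group the rows by transaction and then create these new columns?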


2 Answers


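Try dummy_cols from the fastDummies package. The first line turns the Item column into one 0/1 column per item. The second line sums those columns within each transaction, so an item bought twice in the same transaction shows 2 rather than 1.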

library(fastDummies)
d <- dummy_cols(data[1:2], remove_selected_columns = TRUE)  # one 0/1 column per item
d <- aggregate(d[-1], by = list(Transaction = d$Transaction), FUN = sum)  # per-transaction totals
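
If strict 0/1 indicators are needed, the summed counts have to be capped at 1. The following is a minimal base R sketch of that idea, not part of the approach above: it rebuilds only the Transaction and Item columns from the question's sample data, cross-tabulates them with table(), and turns every positive count into 1. The names counts and indicators are illustrative.

```r
# Sample data reduced to the two columns that matter here.
data <- data.frame(
  Transaction = c(1L, 2L, 2L, 3L, 3L, 3L),
  Item = c("Bread", "Scandinavian", "Scandinavian",
           "Hot chocolate", "Jam", "Cookies")
)

# Cross-tabulate transactions against items: each cell is a purchase count.
counts <- table(data$Transaction, data$Item)

# Cap the counts at 1 so every cell becomes a 0/1 indicator.
indicators <- as.data.frame.matrix((counts > 0) * 1L)

# Move the transaction ids from the row names into a regular column.
indicators <- cbind(Transaction = as.integer(rownames(indicators)), indicators)
rownames(indicators) <- NULL

indicators
```

With the sample data, transaction 2 contains Scandinavian twice, so its cell in counts is 2 while the corresponding cell in indicators is 1.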
answered 2021-01-05T14:35:31.353

Assuming your data frame is called df, you can use tidyr's pivot_wider:

df1 <- tidyr::pivot_wider(df, names_from = Item, values_from = Item, 
                          values_fn = dplyr::n_distinct, values_fill = 0)

df1

#  Transaction date_time      period_day weekday_weekend Year  Month  Bread Scandinavian `Hot chocolate`   Jam Cookies
#        <int> <chr>          <chr>      <chr>           <chr> <chr>  <int>        <int>           <int> <int>   <int>
#1           1 30/10/2016 09… morning    weekend         2016  Octob…     1            0               0     0       0
#2           2 30/10/2016 10… morning    weekend         2016  Octob…     0            1               0     0       0
#3           3 30/10/2016 10… morning    weekend         2016  Octob…     0            0               1     1       1

Or use dcast from data.table:

library(data.table)
dcast(setDT(df), Transaction+date_time+period_day + weekday_weekend + 
      Year + Month ~ Item, value.var = 'Item', fun.aggregate = uniqueN)
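
Since the goal is an apriori analysis, it may also help to have the data as one set of distinct items per transaction, the "basket" form that association-rule tools commonly accept (the arules package, for instance, can coerce a list of item vectors to its transactions class). The base R sketch below assumes only the Transaction and Item columns from the question; baskets is an illustrative name.

```r
# Sample data reduced to the two columns that matter here.
df <- data.frame(
  Transaction = c(1L, 2L, 2L, 3L, 3L, 3L),
  Item = c("Bread", "Scandinavian", "Scandinavian",
           "Hot chocolate", "Jam", "Cookies")
)

# Group the items by transaction id, then drop repeats within each group.
baskets <- lapply(split(df$Item, df$Transaction), unique)

baskets
```

Each list element is named after its transaction id, so baskets[["3"]] holds the three distinct items bought in transaction 3.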
answered 2021-01-05T15:04:35.107