
I have a data frame like this:

  message.id sender recipient
1          1      A         B
2          1      A         C
3          2      A         B
4          3      B         C
5          3      B         D
6          3      B         Q

I'd like to summarize it by counting the values in the sender and recipient columns, to get this:

  address messages.sent messages.received
1       A             3                 0
2       B             3                 2
3       C             0                 2
4       D             0                 1
5       Q             0                 1

I have working code, but it's messy. I'm hoping there is a way to do all of this in a single magrittr chain instead of what I have below:

library(dplyr)

df <- data.frame(message.id = c(1,1,2,3,3,3),
                 sender = c("A","A","A","B","B","B"),
                 recipient = c("B","C","B","C","D","Q"))
sent <- df %>% 
  group_by(sender) %>%
  summarise(messages.sent = n()) %>%
  mutate(address = sender) %>%
  select(address, messages.sent)

received <- df %>% 
  group_by(recipient) %>%
  summarise(messages.received = n()) %>%
  mutate(address = recipient) %>%
  select(address, messages.received)

df_summary <- merge(sent, received, all = TRUE) %>%
  replace(is.na(.), 0)
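A possible single-expression version of the same idea, sketched under the assumption that dplyr >= 1.0 (for the `name` argument of `count()`) and tidyr are installed: count each column separately, full-join the two counts on the address, and turn the resulting NAs into 0.

```r
library(dplyr)
library(tidyr)

# Example data from the question (character columns, not factors)
df <- data.frame(message.id = c(1, 1, 2, 3, 3, 3),
                 sender = c("A", "A", "A", "B", "B", "B"),
                 recipient = c("B", "C", "B", "C", "D", "Q"),
                 stringsAsFactors = FALSE)

df_summary <- df %>%
  # messages sent per address
  count(address = sender, name = "messages.sent") %>%
  # add messages received per address, keeping addresses from both sides
  full_join(count(df, address = recipient, name = "messages.received"),
            by = "address") %>%
  # an address missing from one side has a count of 0 there
  replace_na(list(messages.sent = 0L, messages.received = 0L)) %>%
  arrange(address)
df_summary
```

`full_join()` plays the role of `merge(..., all = TRUE)`, and `replace_na()` replaces the `replace(is.na(.), 0)` step.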

4 Answers


We can use melt/dcast from reshape2:

library(reshape2)
dcast(melt(df, id.vars = 'message.id'), value ~ variable,
      value.var = 'message.id', length)

or use the wrapper recast:

recast(df, id.var = 'message.id', value ~ variable, length)
#    value sender recipient
#1     A      3         0
#2     B      3         2
#3     C      0         2
#4     D      0         1
#5     Q      0         1

If we need to use dplyr/tidyr:

library(dplyr)
library(tidyr)
gather(df, messages, address, 2:3) %>%
          group_by(messages, address) %>%
          summarise(n=n()) %>% 
          spread(messages, n, fill=0)
#     address sender recipient
#     (chr)  (dbl)     (dbl)
#1       A      3         0
#2       B      3         2
#3       C      0         2
#4       D      0         1
#5       Q      0         1
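`gather()` and `spread()` have since been superseded in tidyr by `pivot_longer()` and `pivot_wider()`. A sketch of the same reshape with the newer verbs, assuming tidyr >= 1.1 (where `values_fill` accepts a plain 0):

```r
library(dplyr)
library(tidyr)

# Example data from the question (character columns, not factors)
df <- data.frame(message.id = c(1, 1, 2, 3, 3, 3),
                 sender = c("A", "A", "A", "B", "B", "B"),
                 recipient = c("B", "C", "B", "C", "D", "Q"),
                 stringsAsFactors = FALSE)

df_summary <- df %>%
  # long format: one row per (message, role), with the address in its own column
  pivot_longer(c(sender, recipient), names_to = "role", values_to = "address") %>%
  # number of rows for every address/role pair that occurs
  count(address, role) %>%
  # back to wide; pairs that never occur (e.g. A as recipient) become 0
  pivot_wider(names_from = role, values_from = n, values_fill = 0) %>%
  select(address, messages.sent = sender, messages.received = recipient)
df_summary
```

The final `select()` both orders and renames the columns to match the layout asked for in the question.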
answered 2016-01-01T03:54:54.430

If you are doing some kind of network analysis, the igraph package may be useful:

library(igraph)

g <- graph_from_data_frame(df[c(2:3)])

data.frame(address = V(g)$name,
           sent    = degree(g, mode="out"),
           rec     = degree(g, mode="in"))

#   address sent rec
# A       A    3   0
# B       B    3   2
# C       C    0   2
# D       D    0   1
# Q       Q    0   1

igraph also supports pipes, if you like that sort of thing.

Here is also a base R effort (I know it's not what you asked for):

lvs <- unique(unlist(df[2:3]))
sapply(df[2:3], function(x) table(factor(x, levels = lvs)))
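The `sapply()` call returns a matrix with one row per address and one column per role. A small base R sketch of turning that matrix into a data frame with the column names the question asks for:

```r
# Example data from the question (character columns, not factors)
df <- data.frame(message.id = c(1, 1, 2, 3, 3, 3),
                 sender = c("A", "A", "A", "B", "B", "B"),
                 recipient = c("B", "C", "B", "C", "D", "Q"),
                 stringsAsFactors = FALSE)

# Count each column against one shared set of levels, so absent addresses get 0
lvs <- unique(unlist(df[2:3]))
counts <- sapply(df[2:3], function(x) table(factor(x, levels = lvs)))

# Row names of the matrix are the addresses; columns are "sender" and "recipient"
df_summary <- data.frame(address = rownames(counts),
                         messages.sent = counts[, "sender"],
                         messages.received = counts[, "recipient"],
                         row.names = NULL,
                         stringsAsFactors = FALSE)
df_summary
```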
answered 2016-01-01T04:41:08.730

Using dplyr and tidyr, you can do the following:

library(dplyr)
library(tidyr)
df <- data.frame(message.id = c(1,1,2,3,3,3),
                 sender = c("A","A","A","B","B","B"),
                 recipient = c("B","C","B","C","D","Q"), stringsAsFactors = FALSE)
df %>%
  # gather() names the key column 'sender' and the value column 'recipient' here
  gather(sender, recipient, -message.id) %>%
  group_by(recipient) %>%
  summarise(messages.sent = sum(sender == 'sender'),
            messages.received = sum(sender == 'recipient'))

Source: local data frame [5 x 3]

  recipient messages.sent messages.received
      (chr)         (int)             (int)
1         A             3                 0
2         B             3                 2
3         C             0                 2
4         D             0                 1
5         Q             0                 1

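You can change the first column to the desired name by appending rename(address = recipient) to the chain, or, if you stored the result in a variable (res is just an example name), with:

names(res)[1] <- 'address'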
answered 2016-01-01T06:51:46.173

An alternative approach using aggregate and merge from base R. Finally, we replace the NAs with 0 and rename the columns to the desired names.

summary <- merge(aggregate(message.id ~ sender, data = df, length), 
                  aggregate(message.id ~ recipient, data = df, length), 
                  by.x = "sender", 
                  by.y = "recipient", 
                  all = TRUE)
summary[is.na(summary)] <- 0
colnames(summary) <- c("address", "sent", "received")
summary

Output:

  address sent received
1       A    3        0
2       B    3        2
3       C    0        2
4       D    0        1
5       Q    0        1
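For comparison, another base R sketch that avoids the merge and NA-replacement steps: `stack()` puts both columns into one long data frame, and `table()` then counts every address/role combination, including the ones that never occur (as 0).

```r
# Example data from the question (character columns, not factors)
df <- data.frame(message.id = c(1, 1, 2, 3, 3, 3),
                 sender = c("A", "A", "A", "B", "B", "B"),
                 recipient = c("B", "C", "B", "C", "D", "Q"),
                 stringsAsFactors = FALSE)

# values = the address, ind = the column it came from ("sender" or "recipient")
long <- stack(df[c("sender", "recipient")])

# cross-tabulate address by role; rows are sorted addresses
tab <- table(address = long$values, role = long$ind)

df_summary <- data.frame(address = rownames(tab),
                         sent = as.vector(tab[, "sender"]),
                         received = as.vector(tab[, "recipient"]),
                         stringsAsFactors = FALSE)
df_summary
```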
answered 2016-01-02T15:27:59.773