r - 使用 R 进行数据重组

Question

我有一个如下所示的数据集（dat）：

 Person       IPaddress
36598035    222.999.22.99
36598035    222.999.22.99
36598035    222.999.22.99
36598035    222.999.22.99
36598035    222.999.22.99
36598035    444.666.44.66
37811171    111.88.111.88
37811171    111.88.111.88
37811171    111.88.111.88
37811171    111.88.111.88
37811171    111.88.111.88

它反映了个人在一段时间内登录网站的实例。我需要数据看起来像这样：

Person        IPaddress      Number of Logins
36598035    222.999.22.99           6
37811171    111.88.111.88           5

因此，不是同一个人有多个条目，而是每个人只有一行，计算他们登录的次数。

此外，您会注意到在我的示例中，人 36598035 使用多个 IP 地址登录。发生这种情况时，我希望最终数据集中的 IP 地址反映模式 IP 地址——换句话说，个人最常登录的 IP 地址。

score 5 · Accepted Answer

这是一种方法。

library(dplyr)

mydf %>%
    group_by(Person, IPaddress) %>% # For each combination of person and IPaddress
    summarize(freq = n()) %>% # Get total number of log-in
    arrange(Person, desc(freq)) %>% # The most frequent IP address is in the 1st row for each user
    group_by(Person) %>% # For each user
    mutate(total = sum(freq)) %>% # Get total number of log-in
    select(-freq) %>% # Remove count
    do(head(.,1)) # Take the first row for each user

#    Person     IPaddress total
#1 36598035 222.999.22.99     6
#2 37811171 111.88.111.88     5

更新

dplyr0.3现在出来了。因此，您也可以执行以下操作。使用count. 我也slice像@aosmith 推荐的那样使用。

mydf %>%
    count(Person, IPaddress) %>%
    arrange(Person, desc(n)) %>%
    group_by(Person) %>%
    mutate(total = sum(n)) %>%
    select(-n) %>%
    slice(1)

score 4 · Accepted Answer

您可以使用data.table简洁的解决方案：

library(data.table)
setDT(dat)
dat[, list(IPaddress=names(which.max(table(IPaddress))),
           Logins=.N), 
    by=Person]

score 1 · Accepted Answer

尝试：

ddf
     Person     IPaddress
1  36598035 222.999.22.99
2  36598035 222.999.22.99
3  36598035 222.999.22.99
4  36598035 222.999.22.99
5  36598035 222.999.22.99
6  36598035 444.666.44.66
7  37811171 111.88.111.88
8  37811171 111.88.111.88
9  37811171 111.88.111.88
10 37811171 111.88.111.88
11 37811171 111.88.111.88

dd1 = data.table(with(ddf, table(Person, IPaddress)))[rev(order(N))][!duplicated(Person)]
dd1
     Person     IPaddress N
1: 36598035 222.999.22.99 5
2: 37811171 111.88.111.88 5

dd1$all_login_count = data.table(with(ddf, table(Person)))$V1
dd1
     Person     IPaddress N all_login_count
1: 36598035 222.999.22.99 5               6
2: 37811171 111.88.111.88 5               5

r - 使用 R 进行数据重组

3 回答 3

Related

Reference