library(pdfsearch)
Characters <- c("Ben", "John")
keyword_search('location of file', 
               keyword = Characters,
               path = TRUE)


  keyword page_num
1     Ben        1
2     Ben        1
3    John        1
4    John        2

How can I get R to count the keyword occurrences on each page_num and build a data frame like this:

      name   page  count
1      Ben    1      2
2     John    1      1
3     John    2      1

I know about the nrow function, but is there a faster way?

nrow(dataframe[dataframe$keyword == "Ben" & dataframe$page_num == 1, ])

1 Answer


Base R supports many ways of performing grouped operations (perhaps too many, since that makes it harder to pick the right one):

my_data <- data.frame(name = c("Ben", "Ben", "John", "John"), page_num = c(1,1,1,2))

> my_data
  name page_num
1  Ben        1
2  Ben        1
3 John        1
4 John        2


# table()

> table(my_data)
      page_num
name   1 2
  Ben  2 0
  John 1 1

> as.data.frame(table(my_data))
  name page_num Freq
1  Ben        1    2
2 John        1    1
3  Ben        2    0
4 John        2    1

# xtabs

> xtabs(~ name + page_num, data = my_data)

      page_num
name   1 2
  Ben  2 0
  John 1 1

> as.data.frame(xtabs(~ name + page_num, data = my_data))
  name page_num Freq
1  Ben        1    2
2 John        1    1
3  Ben        2    0
4 John        2    1
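Both conversions keep combinations that never occur (Ben on page 2, with a Freq of 0), so the result has one more row than the table asked for in the question. A minimal sketch of trimming it down, assuming the same my_data as above; the name counts is only illustrative:

```r
my_data <- data.frame(name = c("Ben", "Ben", "John", "John"),
                      page_num = c(1, 1, 1, 2))

# Long-format counts; keep the labels as character rather than factor
counts <- as.data.frame(table(my_data), stringsAsFactors = FALSE)

counts <- counts[counts$Freq > 0, ]          # drop combinations that never occur
names(counts) <- c("name", "page", "count")  # headers used in the question
counts$page <- as.numeric(counts$page)       # table() labels come back as text
rownames(counts) <- NULL                     # renumber the remaining rows
counts
```

This leaves three rows, one per keyword and page combination that actually appears in the data.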

Other functions for performing grouped operations include by(), tapply(), and ave().
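As a rough illustration of two of these, the sketch below counts rows per name and page with tapply() and with ave(). It assumes the same my_data as above; tab and result are names made up for this example.

```r
my_data <- data.frame(name = c("Ben", "Ben", "John", "John"),
                      page_num = c(1, 1, 1, 2))

# tapply() returns a name-by-page matrix; cells with no rows are NA
tab <- tapply(my_data$name, my_data[c("name", "page_num")], length)
tab

# ave() returns one value per row: the size of the group that row belongs to
my_data$count <- ave(seq_along(my_data$name),
                     my_data$name, my_data$page_num,
                     FUN = length)

# Collapse the repeated rows to get one row per name and page
result <- unique(my_data)
result
```

tapply() is convenient when a cross-tabulated matrix is wanted, while ave() keeps the original row layout, which is why unique() is needed afterwards.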

The popular dplyr package also has syntax for performing grouped operations on data.frame objects without converting them first:

library(dplyr)

# `group_by()`, `summarise()`, `%>%`, and `n()` are exports from `dplyr`
my_data %>%
  group_by(name, page_num) %>%
  # n() returns the number of rows in the current group;
  # summarise() collapses each group to one row (mutate() would keep every row)
  summarise(count = n(), .groups = "drop")
answered 2020-11-16T14:51:29.700