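I want to know how many times nouns and adjectives are used within the same doc id.
I found that the cooccurrence() function from the udpipe package is a perfect fit for this. Here is my data frame: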
x <- structure(list(doc_id = c("doc1", "doc1", "doc1", "doc1", "doc1",
"doc2", "doc2", "doc2", "doc2", "doc2", "doc2", "doc2", "doc2",
"doc2", "doc3", "doc3", "doc3", "doc4", "doc4", "doc4"), paragraph_id = c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L), sentence_id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), token_id = c("1",
"3", "4", "9", "11", "1", "4", "5", "6", "9", "13", "16", "21",
"22", "1", "2", "5", "1", "2", "6"), lemma = c("rent", "incubation",
"space", "use", "pandemic", "unable", "suitable", "financial",
"support", "business", "business", "revenue", "month", "time",
"partnership", "proposal", "party", "many", "mistake", "operation"
), upos = c("NOUN", "NOUN", "NOUN", "NOUN", "NOUN", "ADJ", "ADJ",
"ADJ", "NOUN", "NOUN", "NOUN", "NOUN", "NOUN", "NOUN", "NOUN",
"NOUN", "NOUN", "ADJ", "NOUN", "NOUN")), row.names = c(NA, -20L
), class = c("data.table", "data.frame"))
Here is the function call:
library(udpipe)

cooc <- cooccurrence(x,
                     term = "lemma",
                     group = "doc_id")
cooc
However, every time I call the function, I get this error:
Error in `[.data.table`(data, is_list) :
i is not found in calling scope and it is not a column name either. When the first argument inside DT[...] is a single symbol (e.g. DT[var]), data.table looks for var in calling scope.
I don't understand what is going on. Can you help me? I have no experience at all with the udpipe package.
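Since the error comes from `[.data.table`, one check that might narrow it down is to drop the data.table class from the input before calling cooccurrence(). The sketch below is only a guess at the cause, not a confirmed fix. `x_small` and `x_df` are hypothetical names, and `x_small` is a cut-down stand-in for my data with the same class attribute.

```r
# Hypothetical stand-in for the data above: same class attribute, fewer rows.
x_small <- structure(
  list(doc_id = c("doc1", "doc1", "doc2", "doc2"),
       lemma  = c("rent", "space", "business", "revenue")),
  row.names = c(NA, -4L),
  class = c("data.table", "data.frame"))

# Assumption: the data.table class is what sends the call into `[.data.table`.
# as.data.frame() strips it, leaving a plain data.frame.
x_df <- as.data.frame(x_small)
class(x_df)

# Only attempt the call if udpipe is installed; the result is not shown here
# because I have not verified what it returns for this input.
if (requireNamespace("udpipe", quietly = TRUE)) {
  cooc <- udpipe::cooccurrence(x_df, term = "lemma", group = "doc_id")
  print(cooc)
}
```

If the call goes through on the plain data.frame, that would suggest the class of x is the trigger rather than the contents of the columns.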
Session info:
R version 4.1.2 (2021-11-01)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 10.16
Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] modeltools_0.2-23 tidyselect_1.1.1 xfun_0.29 slam_0.1-49 NLP_0.2-1
[6] purrr_0.3.4 haven_2.4.3 lattice_0.20-45 colorspace_2.0-2 vctrs_0.3.8
[11] generics_0.1.1 htmltools_0.5.2 stats4_4.1.2 viridisLite_0.4.0 utf8_1.2.2
[16] rlang_0.4.12 pillar_1.6.4 glue_1.6.0 DBI_1.1.2 dbplyr_2.1.1
[21] lifecycle_1.0.1 munsell_0.5.0 gtable_0.3.0 htmlwidgets_1.5.4 knitr_1.37
[26] forcats_0.5.1 fastmap_1.1.0 tm_0.7-8 parallel_4.1.2 fansi_0.5.0
[31] Rcpp_1.0.7 scales_1.1.1 RcppParallel_5.1.4 OpenMx_2.19.8 gridExtra_2.3
[36] ggplot2_3.3.5 hms_1.1.1 digest_0.6.29 dplyr_1.0.7 grid_4.1.2
[41] cli_3.1.0 tools_4.1.2 magrittr_2.0.1 tibble_3.1.6 crayon_1.4.2
[46] pkgconfig_2.0.3 MASS_7.3-54 ellipsis_0.3.2 Matrix_1.4-0 xml2_1.3.3
[51] assertthat_0.2.1 rstudioapi_0.13 viridis_0.6.2 R6_2.5.1 compiler_4.1.2