I want to know how many times nouns and adjectives are used together within the same doc id.

I found that the cooccurrence() function from the udpipe package should serve this purpose perfectly. Here is my data frame:

x <- structure(list(doc_id = c("doc1", "doc1", "doc1", "doc1", "doc1", 
"doc2", "doc2", "doc2", "doc2", "doc2", "doc2", "doc2", "doc2", 
"doc2", "doc3", "doc3", "doc3", "doc4", "doc4", "doc4"), paragraph_id = c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L), sentence_id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), token_id = c("1", 
"3", "4", "9", "11", "1", "4", "5", "6", "9", "13", "16", "21", 
"22", "1", "2", "5", "1", "2", "6"), lemma = c("rent", "incubation", 
"space", "use", "pandemic", "unable", "suitable", "financial", 
"support", "business", "business", "revenue", "month", "time", 
"partnership", "proposal", "party", "many", "mistake", "operation"
), upos = c("NOUN", "NOUN", "NOUN", "NOUN", "NOUN", "ADJ", "ADJ", 
"ADJ", "NOUN", "NOUN", "NOUN", "NOUN", "NOUN", "NOUN", "NOUN", 
"NOUN", "NOUN", "ADJ", "NOUN", "NOUN")), row.names = c(NA, -20L
), class = c("data.table", "data.frame"))

Here is the function call:

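library(udpipe)  # provides cooccurrence()

# Count how often lemmas occur together within the same doc_id
cooc <- cooccurrence(x, 
                     term = "lemma", 
                     group = "doc_id")
cooc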

However, every time I call the function, I get this error:

Error in `[.data.table`(data, is_list) : 
i is not found in calling scope and it is not a column name either. When the first argument inside DT[...] is a single symbol (e.g. DT[var]), data.table looks for var in calling scope.

I don't understand what is going on. Can you help me? I have no experience at all with the udpipe package.
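To make the goal concrete, here is a minimal base R sketch of the kind of count I am after. It assumes that "co-occurrence" means two lemmas each appear at least once in the same doc_id, which is my own reading of the task and not necessarily how udpipe computes it. The `toy` data frame and the names `incidence` and `shared_docs` are hypothetical and exist only for illustration.

```r
# Hypothetical toy data, much smaller than x above
toy <- data.frame(
  doc_id = c("doc1", "doc1", "doc2", "doc2", "doc2"),
  lemma  = c("rent", "space", "rent", "space", "support"),
  stringsAsFactors = FALSE
)

# Document-by-lemma incidence matrix: 1 if the lemma occurs in the document, else 0
incidence <- 1 * (unclass(table(toy$doc_id, toy$lemma)) > 0)

# Lemma-by-lemma matrix: number of documents in which both lemmas occur
shared_docs <- crossprod(incidence)

# Number of documents that contain both "rent" and "space"
shared_docs["rent", "space"]
```

The diagonal of `shared_docs` holds the number of documents each lemma appears in, so only the off-diagonal entries are pair counts.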

Session info:

R version 4.1.2 (2021-11-01)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 10.16

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] modeltools_0.2-23  tidyselect_1.1.1   xfun_0.29          slam_0.1-49        NLP_0.2-1         
 [6] purrr_0.3.4        haven_2.4.3        lattice_0.20-45    colorspace_2.0-2   vctrs_0.3.8       
[11] generics_0.1.1     htmltools_0.5.2    stats4_4.1.2       viridisLite_0.4.0  utf8_1.2.2        
[16] rlang_0.4.12       pillar_1.6.4       glue_1.6.0         DBI_1.1.2          dbplyr_2.1.1      
[21] lifecycle_1.0.1    munsell_0.5.0      gtable_0.3.0       htmlwidgets_1.5.4  knitr_1.37        
[26] forcats_0.5.1      fastmap_1.1.0      tm_0.7-8           parallel_4.1.2     fansi_0.5.0       
[31] Rcpp_1.0.7         scales_1.1.1       RcppParallel_5.1.4 OpenMx_2.19.8      gridExtra_2.3     
[36] ggplot2_3.3.5      hms_1.1.1          digest_0.6.29      dplyr_1.0.7        grid_4.1.2        
[41] cli_3.1.0          tools_4.1.2        magrittr_2.0.1     tibble_3.1.6       crayon_1.4.2      
[46] pkgconfig_2.0.3    MASS_7.3-54        ellipsis_0.3.2     Matrix_1.4-0       xml2_1.3.3        
[51] assertthat_0.2.1   rstudioapi_0.13    viridis_0.6.2      R6_2.5.1           compiler_4.1.2    