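- Hi everyone, I'm new to NLP algorithms in R. I want to extract (verb, noun) pairs from a PDF, and I'm stuck on a word-frequency problem. Take a text like: "Represent clients in criminal and civil litigation and other legal proceedings, draw up legal documents, or manage or advise clients on legal transactions. May specialize in a single area or may practice broadly in many areas of law."
- I want to extract the verb-noun pairs from text like this. How would I do that?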
library(udpipe)

# Text to analyse; naming the vector makes "doc1" the doc_id in the output
docs <- "Represent clients in criminal and civil litigation and other legal proceedings, draw up legal documents, or manage or advise clients on legal transactions. May specialize in a single area or may practice broadly in many areas of law."
docs <- setNames(docs, "doc1")

# Download the English model (if needed), then tokenise, POS-tag and dependency-parse
anno <- udpipe(docs, object = "english", udpipe_model_repo = "bnosac/udpipe.models.ud")

# Add information about each token's parent (head) word, e.g. token_parent and upos_parent
anno <- cbind_dependencies(anno, type = "parent")

# Keep pairs where both the token and its parent are nouns or verbs
subset(anno, upos_parent %in% c("NOUN", "VERB") & upos %in% c("NOUN", "VERB"),
       select = c("doc_id", "paragraph_id", "sentence_id", "token", "token_parent", "dep_rel", "upos", "upos_parent"))
   doc_id paragraph_id sentence_id        token token_parent dep_rel upos upos_parent
2    doc1            1           1      clients    Represent     obj NOUN        VERB
7    doc1            1           1   litigation    Represent     obl NOUN        VERB
11   doc1            1           1  proceedings   litigation    conj NOUN        NOUN
13   doc1            1           1         draw    Represent    conj VERB        VERB
16   doc1            1           1    documents         draw     obj NOUN        VERB
19   doc1            1           1       manage    documents    conj NOUN        NOUN
21   doc1            1           1       advise      clients    conj NOUN        NOUN
22   doc1            1           1      clients    Represent     obj NOUN        VERB
25   doc1            1           1 transactions      clients    nmod NOUN        NOUN
32   doc1            1           2         area   specialize     obl NOUN        VERB
35   doc1            1           2     practice   specialize    conj VERB        VERB
39   doc1            1           2        areas     practice     obl NOUN        VERB
41   doc1            1           2          law        areas    nmod NOUN        NOUN
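The question also mentions word frequency. One possible next step is to count how often each verb-noun pair occurs. The following is a minimal base-R sketch. It assumes the columns that `cbind_dependencies()` adds above (token, token_parent, dep_rel, upos, upos_parent). So that it runs without downloading a model, it builds a small hypothetical stand-in for `anno` from a few rows of the output above. With real data you would skip that step and use `anno` directly. Keeping only direct objects (`dep_rel == "obj"`) is one choice among several, not the only way to define a verb-noun pair.

```r
# Hypothetical stand-in for `anno`: a few rows copied from the output above,
# keeping only the columns the counting step needs
anno <- data.frame(
  token        = c("clients", "litigation", "documents", "clients", "area", "areas"),
  token_parent = c("Represent", "Represent", "draw", "Represent", "specialize", "practice"),
  dep_rel      = c("obj", "obl", "obj", "obj", "obl", "obl"),
  upos         = c("NOUN", "NOUN", "NOUN", "NOUN", "NOUN", "NOUN"),
  upos_parent  = c("VERB", "VERB", "VERB", "VERB", "VERB", "VERB"),
  stringsAsFactors = FALSE
)

# Keep nouns that are the direct object (dep_rel "obj") of a verb
pairs <- subset(anno, upos == "NOUN" & upos_parent == "VERB" & dep_rel == "obj")

# Build lower-cased "verb noun" strings and count how often each pair occurs
pair_text <- paste(tolower(pairs$token_parent), tolower(pairs$token))
freq <- as.data.frame(table(pair = pair_text), stringsAsFactors = FALSE)

# Most frequent pairs first, ties broken alphabetically
freq <- freq[order(-freq$Freq, freq$pair), ]
print(freq, row.names = FALSE)
```

With the stand-in rows, this counts "represent clients" twice and "draw documents" once. If you drop the `dep_rel == "obj"` condition, the result also includes oblique nouns such as "specialize area". It then also includes looser pairs such as "represent litigation".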
answered 2021-11-19T08:56:49.843