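  • Hello everyone, I'm new to NLP algorithms in R. I want to extract (verb, noun) pairs from a PDF and I'm stuck on a word-frequency task. For example, given this text: "Represent clients in criminal and civil litigation and other legal proceedings, draw up legal documents, or manage or advise clients on legal transactions. May specialize in a single area or may practice broadly in many areas of law."
  • I want to extract the verbs and nouns from this text. How would I do that?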

1 Answer

> library(udpipe)
> docs <- "Represent clients in criminal and civil litigation and other legal proceedings, draw up legal documents, or manage or advise clients on legal transactions. May specialize in a single area or may practice broadly in many areas of law."
> docs <- setNames(docs, "doc1")
> anno <- udpipe(docs, object = "english", udpipe_model_repo = "bnosac/udpipe.models.ud")
> anno <- cbind_dependencies(anno, type = "parent")
> subset(anno, upos_parent %in% c("NOUN", "VERB") & upos %in% c("NOUN", "VERB"), 
+        select = c("doc_id", "paragraph_id", "sentence_id", "token", "token_parent", "dep_rel", "upos", "upos_parent"))
   doc_id paragraph_id sentence_id        token token_parent dep_rel upos upos_parent
2    doc1            1           1      clients    Represent     obj NOUN        VERB
7    doc1            1           1   litigation    Represent     obl NOUN        VERB
11   doc1            1           1  proceedings   litigation    conj NOUN        NOUN
13   doc1            1           1         draw    Represent    conj VERB        VERB
16   doc1            1           1    documents         draw     obj NOUN        VERB
19   doc1            1           1       manage    documents    conj NOUN        NOUN
21   doc1            1           1       advise      clients    conj NOUN        NOUN
22   doc1            1           1      clients    Represent     obj NOUN        VERB
25   doc1            1           1 transactions      clients    nmod NOUN        NOUN
32   doc1            1           2         area   specialize     obl NOUN        VERB
35   doc1            1           2     practice   specialize    conj VERB        VERB
39   doc1            1           2        areas     practice     obl NOUN        VERB
41   doc1            1           2          law        areas    nmod NOUN        NOUN
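
Since the question asks specifically for verb-noun pairs and their frequency, the table above could be narrowed to nouns that are the direct object (`dep_rel == "obj"`) of a verb and then counted. The following is only a minimal sketch: `anno_demo` is a hypothetical stand-in data frame with a few rows copied by hand from the output above, so it runs without downloading a model; with the real data you would presumably apply the same filter to `anno`. Restricting to `obj` is an assumption about what counts as a pair, and the result depends on how the parser attached each token.

```r
# Hypothetical stand-in for `anno`: a few rows copied by hand from the output above
anno_demo <- data.frame(
  token        = c("clients", "litigation", "draw", "documents", "clients", "area"),
  token_parent = c("Represent", "Represent", "Represent", "draw", "Represent", "specialize"),
  dep_rel      = c("obj", "obl", "conj", "obj", "obj", "obl"),
  upos         = c("NOUN", "NOUN", "VERB", "NOUN", "NOUN", "NOUN"),
  upos_parent  = c("VERB", "VERB", "VERB", "VERB", "VERB", "VERB"),
  stringsAsFactors = FALSE
)

# Keep only nouns that the parser marked as the direct object of a verb
verb_noun <- subset(anno_demo,
                    upos == "NOUN" & upos_parent == "VERB" & dep_rel == "obj")

# Build lower-cased "verb noun" strings and count how often each occurs
verb_noun$pair <- tolower(paste(verb_noun$token_parent, verb_noun$token))
pair_freq <- sort(table(verb_noun$pair), decreasing = TRUE)
print(pair_freq)
```

With these rows, `represent clients` is counted twice and `draw documents` once. Widening the filter to `dep_rel %in% c("obj", "obl")` would also pick up pairs such as `represent litigation`, which may or may not be what you want.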
Answered 2021-11-19T08:56:49.843