
I have the following code in R for extracting people and locations from text:

library(rvest)
library(NLP)
library(openNLP)

page = pdf_text("C:/Users/u214738/Documents/NER_Data.pdf")

text = as.String(page)

sent_annot = Maxent_Sent_Token_Annotator()
word_annot = Maxent_Word_Token_Annotator()

install.packages("openNLPmodels", repos = "http://datacube.wu.ac.at/src/contrib/", type = "source")
install.packages("openNLPmodels.en", repos = "http://datacube.wu.ac.at/", type = "source")
install.packages("openNLPmodels.en", repos = "http://datacube.wu.ac.at/", type = "source",kind="person")
install.packages("openNLPmodels.en",repos ="http://datacube.wu.ac.at/", type = "source",kind="location")
install.packages("openNLPmodels.de", repos = "http://datacube.wu.ac.at/", type = "source")

library(openNLPmodels.de)
library(openNLPmodels.en)

loc_annot = Maxent_Entity_Annotator(kind = "location") #annotate location
people_annot = Maxent_Entity_Annotator(kind = "person") #annotate person

annot.l1 = NLP::annotate(text, list(sent_annot,word_annot))

k <- sapply(annot.l1$features,`[[`,"kind")
Locations = text[annot.l1[k=="location"]]
People = text[annot.l1[k == "person"]]

unique(Locations)
print(Locations)

unique(People)
print(People)

But the results I get are as follows:

unique(Locations)

character(0)

print(Locations)

character(0)

unique(People)

character(0)

print(People)

character(0)

NER_Data contains arbitrary text with person names and locations, for example information about Bill Gates and Warren Buffett.

I need some quick guidance on this module.
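For reference, a possible cause that stands out when reading the code above: `loc_annot` and `people_annot` are created but never passed to `NLP::annotate()`, so the result holds only sentence and word annotations, and none of them has a `kind` feature. The sketch below is a guess at a corrected pipeline, not a confirmed fix. It assumes that `pdf_text()` comes from the pdftools package (rvest does not export it) and that openNLPmodels.en is already installed with the plain `install.packages()` call shown above. The `kind =` variants of that call look unnecessary, since `kind` is an argument of `Maxent_Entity_Annotator()`. The `entities` variable is introduced here for illustration.

```r
library(pdftools)          # assumption: pdf_text() is taken from pdftools
library(NLP)
library(openNLP)
library(openNLPmodels.en)  # English models; must be installed beforehand

# Path as given in the question; replace with your own file.
page <- pdf_text("C:/Users/u214738/Documents/NER_Data.pdf")
text <- as.String(page)

sent_annot   <- Maxent_Sent_Token_Annotator()
word_annot   <- Maxent_Word_Token_Annotator()
loc_annot    <- Maxent_Entity_Annotator(kind = "location")
people_annot <- Maxent_Entity_Annotator(kind = "person")

# Entity annotators rely on sentence and word annotations,
# so they are listed after the two tokenizers.
annot.l1 <- NLP::annotate(text, list(sent_annot, word_annot, loc_annot, people_annot))

# Keep only the entity annotations, then read their "kind" feature.
entities <- annot.l1[annot.l1$type == "entity"]
k <- sapply(entities$features, `[[`, "kind")

Locations <- unique(text[entities[k == "location"]])
People    <- unique(text[entities[k == "person"]])

print(Locations)
print(People)
```

Filtering on `type == "entity"` first means `k` only ever holds values from annotations that actually carry a `kind` feature. Whether names are then found still depends on the text extracted from the PDF and on the models themselves.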
