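This repository contains the download paths for all AllenNLP modules, so you can download whatever you need. Click here!
Download the AllenNLP NER pretrained model from the following path. Click here!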
Install AllenNLP and allennlp-models
pip install allennlp
pip install allennlp-models
Import the required AllenNLP modules
import allennlp
from allennlp.predictors.predictor import Predictor
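# Note: this archive name (bert-base-srl) refers to an SRL (semantic role labeling) model; the BILOU-tagged output below corresponds to an NER model, so the NER archive path is likely intended here.
predictor = Predictor.from_path("https://storage.googleapis.com/allennlp-public-models/bert-base-srl-2020.09.03.tar.gz")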
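The predict function calls AllenNLP's Predictor.predict function, which takes a piece of unstructured text, finds the named entities in it, and classifies them into predefined categories such as person names, locations, and landmarks. The result contains the fields words, tags, mask, and logits. It is used as a library (Python code).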
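To make the shape of the prediction result concrete, here is a minimal sketch that pairs each word with its tag and keeps only the tokens that belong to an entity. The `sample_result` dictionary is hand-written for illustration (it is not real model output, and it omits the mask and logits fields), and the helper name `tagged_tokens` is hypothetical.

```python
from typing import Dict, List, Tuple


def tagged_tokens(result: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Return (word, tag) pairs for every token whose tag is not "O"."""
    return [
        (word, tag)
        for word, tag in zip(result["words"], result["tags"])
        if tag != "O"
    ]


# Hand-written stand-in for a prediction result (assumption: only the
# "words" and "tags" fields are needed for this step).
sample_result = {
    "words": ["The", "U.S.", "is", "a", "country"],
    "tags": ["O", "U-LOC", "O", "O", "O"],
}

entity_tokens = tagged_tokens(sample_result)
print(entity_tokens)
```

With a loaded predictor, the same helper could be applied to the dictionary returned by `predictor.predict(sentence=...)`.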
BILOU Method/Schema (AllenNLP is expected to use the BILOU schema)
| Tag       | Description                             |
|-----------|-----------------------------------------|
| BEGIN (B) | The first token of a multi-token entity |
| IN (I)    | An inner token of a multi-token entity  |
| LAST (L)  | The final token of a multi-token entity |
| UNIT (U)  | A single-token entity                   |
| OUT (O)   | A non-entity token                      |
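The BILOU schema is easiest to see by going from entity spans to tags. The following sketch assigns one tag per token from a list of spans; the function name `bilou_tags` and the `(start, end, label)` span format with an inclusive end index are assumptions made for this illustration, not part of AllenNLP.

```python
from typing import List, Tuple


def bilou_tags(num_tokens: int, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Return one BILOU tag per token.

    Each span is (start, end, label) in token indices, with end inclusive.
    """
    tags = ["O"] * num_tokens  # every token starts as a non-entity token
    for start, end, label in spans:
        if start == end:
            tags[start] = f"U-{label}"  # single-token entity
        else:
            tags[start] = f"B-{label}"  # first token of the entity
            for i in range(start + 1, end):
                tags[i] = f"I-{label}"  # inner tokens
            tags[end] = f"L-{label}"  # final token of the entity
    return tags


tokens = ["Major", "Atlantic", "Coast", "cities", "are", "New", "York"]
spans = [(0, 2, "LOC"), (5, 6, "LOC")]
example_tags = bilou_tags(len(tokens), spans)
print(list(zip(tokens, example_tags)))
```

A three-token entity gets B, I, L; a two-token entity gets B, L with no inner tag; everything else stays O.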
Input
# Import the required packages
import allennlp
from allennlp.predictors.predictor import Predictor
predictor = Predictor.from_path("https://storage.googleapis.com/allennlp-public-models/bert-base-srl-2020.09.03.tar.gz")
document = """The U.S. is a country of 50 states covering a vast swath of North America, with Alaska in the northwest and Hawaii extending the nation’s presence into the Pacific Ocean. Major Atlantic Coast cities are New York, a global finance and culture center, and capital Washington, DC. Midwestern metropolis Chicago is known for influential architecture and on the west coast, Los Angeles' Hollywood is famed for filmmaking"""
####### Convert Entities ##########
def convert_results(allen_results):
    # Collect (entity text, entity type) pairs from BILOU-tagged tokens.
    ents = set()
    for word, tag in zip(allen_results["words"], allen_results["tags"]):
        if tag != "O":
            ent_position, ent_type = tag.split("-")
            if ent_position == "U":
                # Single-token entity.
                ents.add((word, ent_type))
            else:
                if ent_position == "B":
                    w = word
                elif ent_position == "I":
                    w += " " + word
                elif ent_position == "L":
                    w += " " + word
                # The span built so far is added after every B/I/L token,
                # so prefixes such as "New" appear alongside "New York".
                ents.add((w, ent_type))
    return ents

def allennlp_ner(document):
    # Run the predictor and merge multi-word entities.
    return convert_results(predictor.predict(sentence=document))

results = predictor.predict(sentence=document)
[tuple(i) for i in zip(results["words"], results["tags"])]
##Output##
[('The', 'O'),
('U.S.', 'U-LOC'),
('is', 'O'),
('a', 'O'),
('country', 'O'),
('of', 'O'),
('50', 'O'),
('states', 'O'),
('covering', 'O'),
('a', 'O'),
('vast', 'O'),
('swath', 'O'),
('of', 'O'),
('North', 'B-LOC'),
('America', 'L-LOC'),
(',', 'O'),
('with', 'O'),
('Alaska', 'U-LOC'),
('in', 'O'),
('the', 'O'),
('northwest', 'O'),
('and', 'O'),
('Hawaii', 'U-LOC'),
('extending', 'O'),
('the', 'O'),
('nation', 'O'),
('’s', 'O'),
('presence', 'O'),
('into', 'O'),
('the', 'O'),
('Pacific', 'B-LOC'),
('Ocean', 'L-LOC'),
('.', 'O'),
('Major', 'B-LOC'),
('Atlantic', 'I-LOC'),
('Coast', 'L-LOC'),
('cities', 'O'),
('are', 'O'),
('New', 'B-LOC'),
('York', 'L-LOC'),
(',', 'O'),
('a', 'O'),
('global', 'O'),
('finance', 'O'),
('and', 'O'),
('culture', 'O'),
('center', 'O'),
(',', 'O'),
('and', 'O'),
('capital', 'O'),
('Washington', 'U-LOC'),
(',', 'O'),
('DC', 'U-LOC'),
('.', 'O'),
('Midwestern', 'U-MISC'),
('metropolis', 'O'),
('Chicago', 'U-LOC'),
('is', 'O'),
('known', 'O'),
('for', 'O'),
('influential', 'O'),
('architecture', 'O'),
('and', 'O'),
('on', 'O'),
('the', 'O'),
('west', 'O'),
('coast', 'O'),
(',', 'O'),
('Los', 'B-LOC'),
('Angeles', 'L-LOC'),
("'", 'O'),
('Hollywood', 'U-LOC'),
('is', 'O'),
('famed', 'O'),
('for', 'O'),
('filmmaking', 'O')]
# Merging Multiword NER Tags using convert_results
allennlp_ner(document)
# The output looks like this
{('Alaska', 'LOC'),
('Chicago', 'LOC'),
('DC', 'LOC'),
('Hawaii', 'LOC'),
('Hollywood', 'LOC'),
('Los', 'LOC'),
('Los Angeles', 'LOC'),
('Major', 'LOC'),
('Major Atlantic', 'LOC'),
('Major Atlantic Coast', 'LOC'),
('Midwestern', 'MISC'),
('New', 'LOC'),
('New York', 'LOC'),
('North', 'LOC'),
('North America', 'LOC'),
('Pacific', 'LOC'),
('Pacific Ocean', 'LOC'),
('U.S.', 'LOC'),
('Washington', 'LOC')}
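The set above also contains partial spans such as 'New' and 'Major Atlantic' next to the full entities. If only complete entities are wanted, one possible variant is to emit a multi-token span only when its L tag is reached. This is a sketch, not the original function: the name `convert_complete_entities` is hypothetical, and the sample input is hand-written from a few of the token/tag pairs shown earlier.

```python
from typing import Dict, List, Set, Tuple


def convert_complete_entities(
    allen_results: Dict[str, List[str]]
) -> Set[Tuple[str, str]]:
    """Merge BILOU-tagged tokens into (entity text, entity type) pairs,
    keeping only completed spans."""
    ents: Set[Tuple[str, str]] = set()
    current: List[str] = []  # words of the entity currently being built
    for word, tag in zip(allen_results["words"], allen_results["tags"]):
        if tag == "O":
            current = []  # a non-entity token ends any open span
            continue
        position, ent_type = tag.split("-", 1)
        if position == "U":
            ents.add((word, ent_type))
            current = []
        elif position == "B":
            current = [word]
        elif position == "I":
            current.append(word)
        elif position == "L":
            current.append(word)
            ents.add((" ".join(current), ent_type))  # span is complete
            current = []
    return ents


# Hand-written sample using token/tag pairs from the output above.
sample = {
    "words": ["cities", "are", "New", "York", ",", "and", "capital", "Washington"],
    "tags": ["O", "O", "B-LOC", "L-LOC", "O", "O", "O", "U-LOC"],
}
complete = convert_complete_entities(sample)
print(sorted(complete))
```

Collecting the words in a list and joining them at the L tag avoids adding intermediate prefixes to the result set.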