Edit: Thanks for the comments. I changed `doc = nlp(text)` to `doc = nlp.make_doc(text)`.
I found some code that I'm trying to replicate. It was apparently written for spaCy 2:
```python
import random
import warnings

from spacy.util import minibatch, compounding

# assumes `nlp` (a loaded spaCy 2 pipeline) and TRAIN_DATA are defined earlier

# add NER to the pipeline and the new label
ner = nlp.get_pipe("ner")
ner.add_label("FOOD")

# get the names of the components we want to disable during training
pipe_exceptions = ["ner", "trf_wordpiecer", "trf_tok2vec"]
other_pipes = [pipe for pipe in nlp.pipe_names if pipe not in pipe_exceptions]

# start the training loop, only training NER
epochs = 30
optimizer = nlp.resume_training()
with nlp.disable_pipes(*other_pipes), warnings.catch_warnings():
    warnings.filterwarnings("once", category=UserWarning, module='spacy')
    sizes = compounding(1.0, 4.0, 1.001)
    # batch up the examples using spaCy's minibatch
    for epoch in range(epochs):
        examples = TRAIN_DATA
        random.shuffle(examples)
        batches = minibatch(examples, size=sizes)
        losses = {}
        for batch in batches:
            texts, annotations = zip(*batch)
            nlp.update(texts, annotations, sgd=optimizer, drop=0.35, losses=losses)
        print("Losses ({}/{})".format(epoch + 1, epochs), losses)
```
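The spaCy 2 loop batches the data in a specific way. It draws a batch size from `compounding(1.0, 4.0, 1.001)` and cuts the shuffled data with `minibatch`. Then `zip(*batch)` splits each batch into the `texts` and `annotations` tuples that spaCy 2's `nlp.update(texts, annotations, ...)` accepted. The stdlib-only sketch below mimics that behaviour. The helpers `compounding_sizes` and `simple_minibatch` are hypothetical stand-ins written for illustration, not spaCy's or Thinc's code. The sketch also assumes `TRAIN_DATA` uses spaCy's `(text, {"entities": [(start, end, label)]})` format.

```python
import itertools
from typing import Any, Dict, Iterable, Iterator, List, Tuple

# One training item in spaCy's (text, {"entities": [(start, end, label)]}) format.
TrainItem = Tuple[str, Dict[str, Any]]


def compounding_sizes(start: float, stop: float, compound: float) -> Iterator[float]:
    """Yield start, start*compound, start*compound**2, ... capped at stop.

    Hypothetical stand-in for compounding(); assumes start <= stop and compound >= 1.
    """
    curr = float(start)
    while True:
        yield min(curr, stop)
        curr *= compound


def simple_minibatch(items: Iterable[TrainItem], sizes: Iterator[float]) -> Iterator[List[TrainItem]]:
    """Cut items into consecutive batches of int(next(sizes)) elements each."""
    it = iter(items)
    while True:
        batch = list(itertools.islice(it, int(next(sizes))))
        if not batch:
            return
        yield batch


# Hypothetical toy data, not the original TRAIN_DATA.
TOY_DATA: List[TrainItem] = [
    ("Berlin ist groß", {"entities": [(0, 6, "LOC")]}),
    ("Ich wohne in Bern", {"entities": [(13, 17, "LOC")]}),
    ("Das ist ein Test", {"entities": []}),
]

batches = list(simple_minibatch(TOY_DATA, compounding_sizes(1.0, 4.0, 1.001)))
# zip(*batch) turns a list of (text, annots) pairs into a tuple of texts
# and a tuple of annotation dicts.
texts, annotations = zip(*batches[0])

# Index of the first size that int() turns into 2 or more.
first_size_two = next(
    i for i, size in enumerate(compounding_sizes(1.0, 4.0, 1.001)) if size >= 2
)
print([len(b) for b in batches], texts, first_size_two)
```

This sketch assumes spaCy's own `minibatch` truncates sizes with `int()` the same way. Under that assumption, `compounding(1.0, 4.0, 1.001)` keeps the batch size at 1 for the first 694 batches. That means the spaCy 2 loop trains on one example at a time for quite a while.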
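spaCy 3's `nlp.update` no longer accepts separate texts and annotations. So, after many desperate attempts, I tried to convert the loop as follows: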
```python
import random
import warnings

import spacy
import de_core_news_lg
from spacy.training import Example
from spacy.tokens import Doc
from thinc.api import compounding

nlp = spacy.load('de_core_news_lg')
ner = nlp.get_pipe("ner")
ner.add_label("LOCALITY")

# get the names of the components we want to disable during training
pipe_exceptions = ["ner", "trf_wordpiecer", "trf_tok2vec"]
other_pipes = [pipe for pipe in nlp.pipe_names if pipe not in pipe_exceptions]

# start the training loop, only training NER
epochs = 30
optimizer = nlp.resume_training()
#optimizer = nlp.initialize()
with nlp.disable_pipes(*other_pipes), warnings.catch_warnings():
    warnings.filterwarnings("once", category=UserWarning, module='spacy')
    sizes = compounding(1.0, 4.0, 1.001)
    # note: `sizes` is not used below; all examples go into one nlp.update() call per epoch
    for epoch in range(epochs):
        random.shuffle(TRAIN_DATA)
        #text = []
        #annots=[]
        examples = []
        for text, annots in TRAIN_DATA:
            #text.append(t)
            #annots.append(a)
            doc = nlp.make_doc(text)
            example = Example.from_dict(doc, annots)
            examples.append(example)
        losses = {}

        nlp.update(examples, sgd=optimizer, drop=0.35, losses=losses)

        print("Losses ({}/{})".format(epoch + 1, epochs), losses)
```
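One thing that may be worth ruling out is a label mismatch. This is an assumption, not a confirmed cause of the empty ValueError. The loop above registers only `LOCALITY` via `ner.add_label`, but the W030 warning below shows an entity labelled `GRAN`. The stdlib-only helpers below list labels that occur in the training data but were never registered. Both helpers, `entity_labels` and `unregistered_labels`, are hypothetical, and the sample data is made up. With the real pipeline, the registered set would come from `ner.labels`.

```python
from typing import Any, Dict, Iterable, List, Set, Tuple

TrainItem = Tuple[str, Dict[str, Any]]


def entity_labels(train_data: Iterable[TrainItem]) -> Set[str]:
    """Collect every label that appears in the "entities" annotations."""
    labels: Set[str] = set()
    for _text, annots in train_data:
        for _start, _end, label in annots.get("entities", []):
            labels.add(label)
    return labels


def unregistered_labels(train_data: Iterable[TrainItem], registered: Iterable[str]) -> Set[str]:
    """Labels used in the data but missing from `registered`."""
    return entity_labels(train_data) - set(registered)


# Hypothetical sample; the real check would run over TRAIN_DATA.
SAMPLE: List[TrainItem] = [
    ("Wie viele Einwohner hat Zürich?", {"entities": [(24, 30, "LOCALITY")]}),
    ("Anteil der Bevölkerung", {"entities": [(11, 22, "GRAN")]}),
]

# With the real pipeline this would be: registered = set(ner.labels)
missing = unregistered_labels(SAMPLE, registered={"LOCALITY"})
print(sorted(missing))
# Any missing label could then be added before training, e.g.
# for label in sorted(missing): ner.add_label(label)
```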
Now the error is:
```
--------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-23-8337234c4d53> in <module>
     36         losses = {}
     37 
---> 38         nlp.update(examples, sgd=optimizer, drop=0.35, losses=losses)
     39 
     40         print("Losses ({}/{})".format(epoch + 1, epochs), losses)

~/nlp_learn/statbot/.statbot/lib/python3.8/site-packages/spacy/language.py in update(self, examples, _, drop, sgd, losses, component_cfg, exclude)
   1104             if name in exclude or not hasattr(proc, "update"):
   1105                 continue
-> 1106             proc.update(examples, sgd=None, losses=losses, **component_cfg[name])
   1107         if sgd not in (None, False):
   1108             for name, proc in self.pipeline:

~/nlp_learn/statbot/.statbot/lib/python3.8/site-packages/spacy/pipeline/transition_parser.pyx in spacy.pipeline.transition_parser.Parser.update()

~/nlp_learn/statbot/.statbot/lib/python3.8/site-packages/spacy/pipeline/transition_parser.pyx in spacy.pipeline.transition_parser.Parser.get_batch_loss()

~/nlp_learn/statbot/.statbot/lib/python3.8/site-packages/spacy/pipeline/_parser_internals/ner.pyx in spacy.pipeline._parser_internals.ner.BiluoPushDown.set_costs()

ValueError:
```
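The ValueError message is empty. I've read that this can be related to having too few samples, but `examples` now contains more than 5,900 items. I'm using the German model (`de_core_news_lg`).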
While the `Example` objects are being created, it also shows this warning inside the loop:
```
/home/z01/nlp_learn/statbot/.statbot/lib/python3.8/site-packages/spacy/training/iob_utils.py:139: UserWarning: [W030] Some entities could not be aligned in the text "Was ist der Anteil an Bevölkerung: Anteil 80 u.m.-..." with entities "[(62, 70, 'GRAN')]". Use `spacy.training.offsets_to_biluo_tags(nlp.make_doc(text), entities)` to check the alignment. Misaligned entities ('-') will be ignored during training.
```
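The warning names `spacy.training.offsets_to_biluo_tags(nlp.make_doc(text), entities)` as the authoritative alignment check. Before running that over all 5,900 items, a cheap stdlib-only pre-filter can flag the most obvious problems. These are offsets outside the text, spans with leading or trailing whitespace, and span boundaries that fall inside a word. The helper `suspicious_entities` and the sample text below are hypothetical. The helper is a heuristic and does not reproduce spaCy's German tokenizer. Spans cut at punctuation such as `-` can still misalign, so they should be confirmed with `offsets_to_biluo_tags`.

```python
from typing import List, Tuple

Entity = Tuple[int, int, str]


def suspicious_entities(text: str, entities: List[Entity]) -> List[Tuple[Entity, str, str]]:
    """Return (entity, covered substring, reason) for spans that likely won't align."""
    problems = []
    for ent in entities:
        start, end, _label = ent
        span = text[start:end]
        if not (0 <= start < end <= len(text)):
            problems.append((ent, span, "offsets out of range"))
        elif span != span.strip():
            problems.append((ent, span, "leading/trailing whitespace"))
        elif (start > 0 and text[start - 1].isalnum() and text[start].isalnum()) or (
            end < len(text) and text[end - 1].isalnum() and text[end].isalnum()
        ):
            # a letter/digit on both sides of a boundary means it cuts through a word
            problems.append((ent, span, "boundary inside a word"))
    return problems


# Hypothetical example: one good span, one with a leading space, one cut mid-word.
TEXT = "Anteil der Bevölkerung in Bern"
ENTITIES = [(11, 22, "GRAN"), (25, 30, "LOCALITY"), (14, 22, "GRAN")]

problems = suspicious_entities(TEXT, ENTITIES)
for ent, span, reason in problems:
    print(ent, repr(span), reason)
```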