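I want to use stanza to tokenize, POS-tag, and parse some texts I have, but it keeps giving me this error. I have tried changing the way I call it, but nothing changes. Any ideas?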

My code (here I iterate over a list of texts and apply stanza to each one):

t = time()

data_stanza = []
for text in data:
    stz = apply_stanza(text[0])
    data_stanza.append(stz)

print('Time to run: {} mins'.format(round((time() - t) / 60, 2)))

This is the apply_stanza function I use on each text:

nlp = stanza.Pipeline('pt')

def apply_stanza(text):
    doc = nlp(text)
    All = []
    for sent in doc.sentences:
        for word in sent.words:
            All.append((word.id, word.text, word.lemma, word.upos, word.feats, word.head, word.deprel))
    return All

The error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-17-7ac303eec8e8> in <module>
      3 data_staza = []
      4 for text in data:
----> 5     stz = apply_stanza(text[0])
      6     data_stanza.append(stz)
      7 

<ipython-input-16-364c3ac30f32> in apply_stanza(text)
      2 
      3 def apply_stanza(text):
----> 4     doc = nlp(text)
      5     All = []
      6     for sent in doc.sentences:

~\anaconda3\lib\site-packages\stanza\pipeline\core.py in __call__(self, doc)
    174         assert any([isinstance(doc, str), isinstance(doc, list),
    175                     isinstance(doc, Document)]), 'input should be either str, list or Document'
--> 176         doc = self.process(doc)
    177         return doc
    178 

~\anaconda3\lib\site-packages\stanza\pipeline\core.py in process(self, doc)
    168         for processor_name in PIPELINE_NAMES:
    169             if self.processors.get(processor_name):
--> 170                 doc = self.processors[processor_name].process(doc)
    171         return doc
    172 

~\anaconda3\lib\site-packages\stanza\pipeline\mwt_processor.py in process(self, document)
     31                 preds = []
     32                 for i, b in enumerate(batch):
---> 33                     preds += self.trainer.predict(b)
     34 
     35                 if self.config.get('ensemble_dict', False):

~\anaconda3\lib\site-packages\stanza\models\mwt\trainer.py in predict(self, batch, unsort)
     77         self.model.eval()
     78         batch_size = src.size(0)
---> 79         preds, _ = self.model.predict(src, src_mask, self.args['beam_size'])
     80         pred_seqs = [self.vocab.unmap(ids) for ids in preds] # unmap to tokens
     81         pred_seqs = utils.prune_decoded_seqs(pred_seqs)

~\anaconda3\lib\site-packages\stanza\models\common\seq2seq_model.py in predict(self, src, src_mask, pos, beam_size)
    259             done = []
    260             for b in range(batch_size):
--> 261                 is_done = beam[b].advance(log_probs.data[b])
    262                 if is_done:
    263                     done += [b]

~\anaconda3\lib\site-packages\stanza\models\common\beam.py in advance(self, wordLk, copy_indices)
     82         # bestScoresId is flattened beam x word array, so calculate which
     83         # word and beam each score came from
---> 84         prevK = bestScoresId / numWords
     85         self.prevKs.append(prevK)
     86         self.nextYs.append(bestScoresId - prevK * numWords)

RuntimeError: Integer division of tensors using div or / is no longer supported, and in a future release div will perform 
true division as in Python 3. Use true_divide or floor_divide (// in Python) instead.
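
For context, the failing line in beam.py splits a position in a flattened beam x word array back into a beam index and a word index, which only makes sense with integer (floor) division. Below is a minimal plain-Python sketch of that arithmetic. The sizes and the function name split_flat_index are made up for illustration and are not part of stanza.

```python
# Hypothetical size, chosen only for illustration (numWords in beam.py).
num_words = 5

def split_flat_index(flat_id, num_words):
    """Return (beam index, word index) for a position in a flattened beam x word array."""
    prev_k = flat_id // num_words            # floor division: which beam the score came from
    word_id = flat_id - prev_k * num_words   # what is left over: which word
    return prev_k, word_id

# With 5 words per beam, flat position 13 sits in beam 2 at word 3.
pair = split_flat_index(13, num_words)
print(pair)  # (2, 3)
```

With true division, prev_k would be 2.6 instead of 2, and the word index computed from it would be wrong.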

UPDATE: It turned out that the error comes from the mwt module of the stanza pipeline, so I simply specified not to use it.
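
A rough sketch of what leaving out mwt can look like. The processor list is an assumption based on the fields used in apply_stanza (lemma, upos, feats, head, deprel); the helper names processors_without and build_pipeline are hypothetical. Whether a given stanza version accepts a Portuguese pipeline without mwt, or has already fixed the division bug, is not verified here.

```python
# Assumed processor list: the standard stanza processor names needed for
# tokens, lemmas, POS tags and dependency parses.
WANTED = ["tokenize", "mwt", "pos", "lemma", "depparse"]

def processors_without(skip, wanted=WANTED):
    """Build the comma-separated processors string, leaving out the names in skip."""
    return ",".join(name for name in wanted if name not in skip)

processors = processors_without({"mwt"})
print(processors)  # tokenize,pos,lemma,depparse

def build_pipeline(lang="pt"):
    """Create the pipeline without mwt (hypothetical helper, not called here)."""
    import stanza  # imported lazily so the helper above runs without stanza installed
    return stanza.Pipeline(lang, processors=processors)
```

Calling nlp = build_pipeline() then replaces the stanza.Pipeline('pt') line. Note that without mwt, multi-word tokens such as Portuguese contractions are not expanded into separate words.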

2 Answers

Use // for division instead of /.

Try editing your code as follows:

print('Time to run: {} mins'.format(round((time() - t) // 60, 2)))
answered 2020-09-14T06:32:17.510

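Floor division (//) rounds the result down to the largest integer that does not exceed it.

Use torch.true_divide(Dividend, Divisor) or numpy.true_divide(Dividend, Divisor) instead.

For example: 3/4 corresponds to torch.true_divide(torch.tensor(3), torch.tensor(4)), since torch.true_divide expects a tensor as its first argument.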
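
To make the difference between the two kinds of division concrete, here is a small sketch using only plain Python operators. It illustrates the Python semantics; how a particular PyTorch version rounds negative quotients is not shown here.

```python
import math

true_q = 3 / 4                 # true division keeps the fraction: 0.75
floor_q = 3 // 4               # floor division rounds down: 0
neg_floor_q = -3 // 4          # rounds toward negative infinity: -1
trunc_q = math.trunc(-3 / 4)   # truncation rounds toward zero instead: 0

print(true_q, floor_q, neg_floor_q, trunc_q)
```

Which one is right depends on the use: averages and ratios need true division, while index arithmetic such as the line in beam.py needs an integer result.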

https://pytorch.org/docs/stable/generated/torch.true_divide.html
https://numpy.org/doc/stable/reference/generated/numpy.true_divide.html

answered 2021-05-13T01:39:29.743