python - AllenNLP Reading Comprehension results are different in UI Demo and Python Library

Question

I am trying AllenNLP reading comprehension with the Transformer QA model to get the answer to the question "Who is CEO of ABB?" from the passage "ABB opened its first dedicated global healthcare research center for robotics in October 2019.".

As expected, the UI demo returns no answer, and the API response in the browser's network tab confirms it: in the JSON response, best_span_str is empty, while best_span_scores is 9.9.

When I run similar code through the Python library, I get a different result.

from allennlp.predictors.predictor import Predictor
import allennlp_models.rc  # noqa: F401  registers the reading-comprehension models and predictors

# Archive loaded by this script
MODEL_URL = "https://storage.googleapis.com/allennlp-public-models/transformer-qa-2020-05-26.tar.gz"


def allen_nlp_demo_1():
    predictor = Predictor.from_path(MODEL_URL)
    data = predictor.predict(
        passage="ABB opened its first dedicated global healthcare research center for robotics in October 2019.",
        question="Who is CEO of ABB?",
    )
    print(data)


if __name__ == '__main__':
    allen_nlp_demo_1()

It produces the following JSON output:

{
  "span_start_logits": [...],
  "best_span": [
    7,
    15
  ],
  "best_span_scores": -10.418445587158203,
  "loss": 0,
  "best_span_str": "healthcare research center for robotics in October 2019",
  "context_tokens": [...],
  "id": "1",
  "answers": []
}

Here best_span_str is populated, and best_span_scores is -10.418445587158203.
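To compare the two responses field by field, a small helper can reduce each one to "answer or no answer". This is a minimal sketch: `extract_answer` is a hypothetical name, the dicts contain only the fields quoted above, and treating `best_span == [-1, -1]` as "unanswerable" follows the convention described in the accepted answer for newer archives, which may not hold for every model version.

```python
from typing import Optional


def extract_answer(output: dict) -> Optional[str]:
    """Return the predicted answer string, or None when the model abstains.

    Assumption: a best_span of [-1, -1] or an empty best_span_str means
    "no answer" (the convention described for the newer archive).
    """
    span = output.get("best_span")
    text = output.get("best_span_str") or ""
    if span is not None and list(span) == [-1, -1]:
        return None
    if not text.strip():
        return None
    return text


# Fields copied from the library output shown above
library_output = {
    "best_span": [7, 15],
    "best_span_scores": -10.418445587158203,
    "best_span_str": "healthcare research center for robotics in October 2019",
}
# Only the fields mentioned for the demo response
demo_like_output = {"best_span_scores": 9.9, "best_span_str": ""}

print(extract_answer(library_output))
print(extract_answer(demo_like_output))
```

The first call returns the span text and the second returns `None`, which is exactly the disagreement described in the question.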

Why does the output differ between the UI demo and the library? Also, what is the range of best_span_scores, and how can I choose a threshold to discard false results?

score 2 · Accepted Answer

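As for the difference between the demo output and what you are running: the live demo uses a different archive file. The usage code shown in the demo has now been updated to reflect the new file path (transformer-qa-2020-10-03.tar.gz).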
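To reproduce the demo locally, point `Predictor.from_path` at the newer archive. This is a sketch under assumptions: the bucket URL is taken from the question and the file name from this answer, so the joined URL is inferred rather than confirmed (check the demo's usage code for the exact path). `archive_url` and `answer_with_demo_model` are hypothetical helper names, and running the second one requires `allennlp`, `allennlp-models`, and network access.

```python
# Base URL from the question; new file name from the answer.
# Joining them is an assumption, not a confirmed path.
ARCHIVE_BASE = "https://storage.googleapis.com/allennlp-public-models"
OLD_ARCHIVE = "transformer-qa-2020-05-26.tar.gz"
NEW_ARCHIVE = "transformer-qa-2020-10-03.tar.gz"


def archive_url(file_name: str) -> str:
    """Join the public-models bucket URL and an archive file name."""
    return f"{ARCHIVE_BASE}/{file_name}"


def answer_with_demo_model(passage: str, question: str) -> dict:
    """Run the question against the archive the demo now uses."""
    # Imported lazily so this module loads even without allennlp installed.
    from allennlp.predictors.predictor import Predictor
    import allennlp_models.rc  # noqa: F401  registers the transformer_qa predictor

    predictor = Predictor.from_path(archive_url(NEW_ARCHIVE))
    return predictor.predict(passage=passage, question=question)
```

Calling `answer_with_demo_model` with the ABB passage and question should then behave like the demo rather than like the 2020-05-26 archive.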
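To find the best_span, the model treats a prediction on the cls token as meaning the question is unanswerable. This is represented by best_span being [-1, -1] when the question is unanswerable. When the question is actually answerable, the span scores are relative to each other; we pick the span with the highest score. Hence, there is no fixed threshold that works for all cases.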
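Why the scores are only relative can be seen in a toy version of best-span selection. This is a simplified, pure-Python illustration, not AllenNLP's actual implementation (the real models work on batched tensors and wordpiece offsets): `best_span` and the logits are hypothetical, position 0 stands in for the cls token, and restricting the cls token to the span (0, 0) is an assumption made for the sketch.

```python
from typing import List, Tuple

Span = Tuple[int, int]


def best_span(start_logits: List[float], end_logits: List[float]) -> Tuple[Span, float]:
    """Pick the (start, end) pair maximizing start_logits[i] + end_logits[j], i <= j.

    Position 0 stands in for the cls token: the only span allowed to touch it
    is (0, 0), and choosing it is reported as (-1, -1), i.e. "unanswerable".
    """
    best: Span = (0, 0)
    best_score = start_logits[0] + end_logits[0]  # score of the null (cls) span
    for i in range(1, len(start_logits)):
        for j in range(i, len(end_logits)):
            score = start_logits[i] + end_logits[j]
            if score > best_score:
                best, best_score = (i, j), score
    if best == (0, 0):
        return (-1, -1), best_score
    return best, best_score


start = [2.0, 0.5, 3.0, 0.1]
end = [1.0, 0.2, 0.4, 2.5]
# Subtracting a constant from every logit keeps the chosen span but moves its
# raw score, so a score is only meaningful against the other candidates of
# the same prediction.
shifted_start = [x - 10.0 for x in start]
shifted_end = [x - 10.0 for x in end]

print(best_span(start, end))                        # ((2, 3), 5.5)
print(best_span(shifted_start, shifted_end))        # ((2, 3), -14.5)
print(best_span([4.0, 0.5, 1.0], [4.0, 0.3, 0.2]))  # ((-1, -1), 8.0)
```

The same span comes back with a score of 5.5 or -14.5 depending only on an offset in the logits, which is why a cutoff on the raw best_span_scores value (such as the 9.9 and -10.4 seen above from two different archives) does not transfer between models or inputs.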
