tensorflow - KeyError: 'answers' when using the BioASQ dataset with Huggingface Transformers

Question

I am fine-tuning on the BioASQ question-answering dataset using Huggingface Transformers' run_squad.py (https://github.com/huggingface/transformers/blob/master/examples/run_squad.py).

I have converted the TensorFlow weights provided by the authors of BioBERT (https://github.com/dmis-lab/bioasq-biobert) to PyTorch, as discussed here: https://github.com/huggingface/transformers/issues/312.

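In addition, I am using the preprocessed BioASQ data from https://github.com/dmis-lab/bioasq-biobert, which has already been converted to SQuAD format. However, when I run the run_squad.py script with the following arguments: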

  --model_type bert \
  --model_name_or_path /scratch/oe7/uk1594/BioBERT/BioBERT-PyTorch/BioBERTv1.1-SQuADv1.1-Factoid-PyTorch/ \
  --do_train \
  --do_eval \
  --save_steps 1000 \
  --train_file $data/BioASQ-train-factoid-6b.json \
  --predict_file $data/BioASQ-test-factoid-6b-1.json \
  --per_gpu_train_batch_size 12 \
  --learning_rate 3e-5 \
  --num_train_epochs 2.0 \
  --max_seq_length 384 \
  --doc_stride 128 \
  --output_dir /scratch/oe7/uk1594/BioBERT/BioBERT-PyTorch/QA_output_squad/BioASQ-factoid-6b/BioASQ-factoid-6b-1-issue-23mar/


I get the following error:

03/23/2020 12:53:12 - INFO - transformers.modeling_utils -   loading weights file /scratch/oe7/uk1594/BioBERT/BioBERT-PyTorch/QA_output_squad/BioASQ-factoid-6b/BioASQ-factoid-6b-1-issue-23mar/pytorch_model.bin
03/23/2020 12:53:15 - INFO - __main__ -   Creating features from dataset file at .

  0%|          | 0/1 [00:00<?, ?it/s]
  0%|          | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "run_squad.py", line 856, in <module>
    main()
  File "run_squad.py", line 845, in main
    result = evaluate(args, model, tokenizer, prefix=global_step)
  File "run_squad.py", line 299, in evaluate
    dataset, examples, features = load_and_cache_examples(args, tokenizer, evaluate=True, output_examples=True)
  File "run_squad.py", line 475, in load_and_cache_examples
    examples = processor.get_dev_examples(args.data_dir, filename=args.predict_file)
  File "/scratch/oe7/uk1594/lib/python3.7/site-packages/transformers/data/processors/squad.py", line 522, in get_dev_examples
    return self._create_examples(input_data, "dev")
  File "/scratch/oe7/uk1594/lib/python3.7/site-packages/transformers/data/processors/squad.py", line 549, in _create_examples
    answers = qa["answers"]
KeyError: 'answers'

Many thanks for your help and guidance.

The evaluation dataset looks like this:

{
  "version": "BioASQ6b",
  "data": [
    {
      "title": "BioASQ6b",
      "paragraphs": [
        {
          "context": "emMAW: computing minimal absent words in external memory. Motivation: The biological significance of minimal absent words has been investigated in genomes of organisms from all domains of life. For instance, three minimal absent words of the human genome were found in Ebola virus genomes",
          "qas": [
            {
              "question": "Which algorithm is available for computing minimal absent words using external memory?",
              "id": "5a6a3335b750ff4455000025_000"
            }
          ]
        }
      ]
    }
  ]
}
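To confirm that the evaluation file is what triggers the error, a small check like the one below can be run against it. This is a sketch, not part of transformers: the function name `find_questions_without_answers` and the script name `check_answers.py` are hypothetical. It assumes the SQuAD v1.1 layout shown above (`data` → `paragraphs` → `qas`) and reports every question whose `"answers"` key is missing (the case that makes `answers = qa["answers"]` in squad.py raise `KeyError`) or empty.

```python
import json
import sys
from typing import Dict, List

# Same layout as the BioASQ test file shown above (context shortened).
SAMPLE = {
    "version": "BioASQ6b",
    "data": [{
        "title": "BioASQ6b",
        "paragraphs": [{
            "context": "emMAW: computing minimal absent words in external memory.",
            "qas": [{
                "question": "Which algorithm is available for computing "
                            "minimal absent words using external memory?",
                "id": "5a6a3335b750ff4455000025_000",
            }],
        }],
    }],
}


def find_questions_without_answers(squad_data: Dict) -> List[str]:
    """Return ids of questions whose "answers" key is missing or empty."""
    missing = []
    for article in squad_data["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                # qa.get() avoids the KeyError that a plain qa["answers"] raises
                if not qa.get("answers"):
                    missing.append(qa["id"])
    return missing


if __name__ == "__main__":
    if len(sys.argv) == 2:
        # e.g. python check_answers.py BioASQ-test-factoid-6b-1.json
        with open(sys.argv[1]) as f:
            data = json.load(f)
    else:
        data = SAMPLE
    print(find_questions_without_answers(data))
```

If every id in the file is reported, the file carries no gold answers and cannot be used with `--do_eval`.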

Accepted Answer

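The BioASQ evaluation files are test files that do not contain answers; they are meant only for prediction. For evaluation during training, you can use part of the training file instead.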
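One way to carve a held-out dev set out of the training file is sketched below. It is a hypothetical helper (`split_squad`, `train_part.json` and `dev_part.json` are illustrative names, not part of transformers or the BioBERT repository) that shuffles paragraphs with a fixed seed and writes two SQuAD-format files, which could then be passed as `--train_file` and `--predict_file`.

```python
import json
import random
import sys
from typing import Dict, List, Tuple


def _wrap(version: str, items: List[Tuple[str, Dict]]) -> Dict:
    """Regroup (title, paragraph) pairs into SQuAD's data/paragraphs layout."""
    by_title: Dict[str, List[Dict]] = {}
    for title, paragraph in items:
        by_title.setdefault(title, []).append(paragraph)
    return {
        "version": version,
        "data": [{"title": t, "paragraphs": ps} for t, ps in by_title.items()],
    }


def split_squad(squad_data: Dict, dev_fraction: float = 0.1,
                seed: int = 42) -> Tuple[Dict, Dict]:
    """Randomly split a SQuAD-format dataset into (train, dev) by paragraph."""
    items = [
        (article["title"], paragraph)
        for article in squad_data["data"]
        for paragraph in article["paragraphs"]
    ]
    random.Random(seed).shuffle(items)  # fixed seed -> reproducible split
    n_dev = max(1, int(len(items) * dev_fraction))
    version = squad_data.get("version", "")
    return _wrap(version, items[n_dev:]), _wrap(version, items[:n_dev])


if __name__ == "__main__" and len(sys.argv) == 2:
    # e.g. python split_squad.py BioASQ-train-factoid-6b.json
    with open(sys.argv[1]) as f:
        train, dev = split_squad(json.load(f))
    # Hypothetical output names; point --train_file / --predict_file at them
    for name, part in (("train_part.json", train), ("dev_part.json", dev)):
        with open(name, "w") as f:
            json.dump(part, f)
```

The split is done per paragraph. If the preprocessed BioASQ file repeats the same question across several snippets (the `_000`-style id suffix hints at this, but that is an assumption), grouping paragraphs by question before shuffling would keep one question from landing in both parts.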
