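I am fine-tuning on the BioASQ question answering dataset using run_squad.py https://github.com/huggingface/transformers/blob/master/examples/run_squad.py from Huggingface Transformers.
I have converted the TensorFlow weights provided by the authors of BioBERT https://github.com/dmis-lab/bioasq-biobert to PyTorch (as discussed in https://github.com/huggingface/transformers/issues/312).
In addition, I am using the preprocessed BioASQ data https://github.com/dmis-lab/bioasq-biobert, which has already been converted to SQuAD format. However, when I run the run_squad.py script with the following parameters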
--model_type bert \
--model_name_or_path /scratch/oe7/uk1594/BioBERT/BioBERT-PyTorch/BioBERTv1.1-SQuADv1.1-Factoid-PyTorch/ \
--do_train \
--do_eval \
--save_steps 1000 \
--train_file $data/BioASQ-train-factoid-6b.json \
--predict_file $data/BioASQ-test-factoid-6b-1.json \
--per_gpu_train_batch_size 12 \
--learning_rate 3e-5 \
--num_train_epochs 2.0 \
--max_seq_length 384 \
--doc_stride 128 \
--output_dir /scratch/oe7/uk1594/BioBERT/BioBERT-PyTorch/QA_output_squad/BioASQ-factoid-6b/BioASQ-factoid-6b-1-issue-23mar/
I get the following error:
03/23/2020 12:53:12 - INFO - transformers.modeling_utils - loading weights file /scratch/oe7/uk1594/BioBERT/BioBERT-PyTorch/QA_output_squad/BioASQ-factoid-6b/BioASQ-factoid-6b-1-issue-23mar/pytorch_model.bin
03/23/2020 12:53:15 - INFO - __main__ - Creating features from dataset file at .
0%| | 0/1 [00:00<?, ?it/s]
0%| | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "run_squad.py", line 856, in <module>
    main()
  File "run_squad.py", line 845, in main
    result = evaluate(args, model, tokenizer, prefix=global_step)
  File "run_squad.py", line 299, in evaluate
    dataset, examples, features = load_and_cache_examples(args, tokenizer, evaluate=True, output_examples=True)
  File "run_squad.py", line 475, in load_and_cache_examples
    examples = processor.get_dev_examples(args.data_dir, filename=args.predict_file)
  File "/scratch/oe7/uk1594/lib/python3.7/site-packages/transformers/data/processors/squad.py", line 522, in get_dev_examples
    return self._create_examples(input_data, "dev")
  File "/scratch/oe7/uk1594/lib/python3.7/site-packages/transformers/data/processors/squad.py", line 549, in _create_examples
    answers = qa["answers"]
KeyError: 'answers'
Any help or guidance would be greatly appreciated.
The evaluation dataset looks like this:
{
  "version": "BioASQ6b",
  "data": [
    {
      "title": "BioASQ6b",
      "paragraphs": [
        {
          "context": "emMAW: computing minimal absent words in external memory. Motivation: The biological significance of minimal absent words has been investigated in genomes of organisms from all domains of life. For instance, three minimal absent words of the human genome were found in Ebola virus genomes",
          "qas": [
            {
              "question": "Which algorithm is available for computing minimal absent words using external memory?",
              "id": "5a6a3335b750ff4455000025_000"
            }
          ]
        }
      ]
    }
  ]
}
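
Note that the qas entry in this file has only "question" and "id", with no "answers" key, which is exactly the key the traceback fails on. A minimal sketch of one possible workaround follows, assuming the missing key is the only cause of the KeyError: it adds an empty "answers" list to every question that lacks one. The helper name add_empty_answers is hypothetical, and this has not been tested against the installed transformers version. Any evaluation scores computed against empty gold answers would not be meaningful, so only the written predictions would be of use.

```python
import json


def add_empty_answers(squad_data):
    """Add an empty "answers" list to every question that lacks one.

    Modifies squad_data in place and returns the number of questions patched.
    (Hypothetical helper, not part of transformers.)
    """
    patched = 0
    for article in squad_data["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                if "answers" not in qa:
                    qa["answers"] = []
                    patched += 1
    return patched


# In-memory example shaped like the evaluation file above (context shortened).
sample = {
    "version": "BioASQ6b",
    "data": [
        {
            "title": "BioASQ6b",
            "paragraphs": [
                {
                    "context": "emMAW: computing minimal absent words in external memory.",
                    "qas": [
                        {
                            "question": "Which algorithm is available for computing "
                                        "minimal absent words using external memory?",
                            "id": "5a6a3335b750ff4455000025_000",
                        }
                    ],
                }
            ],
        }
    ],
}

patched_count = add_empty_answers(sample)
print(patched_count)
print(json.dumps(sample["data"][0]["paragraphs"][0]["qas"][0], indent=2))
```

To apply this to the real file, one would load it with json.load, call the helper, write the result to a new file with json.dump, and point --predict_file at that patched copy.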