model - Huggingface models only work once, then spit out a Tokenizer error

Question

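I am following this example on the huggingface website to analyze Twitter sentiment. I am running Python 3.9 in PyCharm. The code runs fine the first time, but if I run it again without making any changes, I get this error: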

OSError: Can't load tokenizer for 'cardiffnlp/twitter-roberta-base-emotion'. Make sure that:

- 'cardiffnlp/twitter-roberta-base-emotion' is a correct model identifier listed on 'https://huggingface.co/models'
  (make sure 'cardiffnlp/twitter-roberta-base-emotion' is not a path to a local directory with something else, in that case)

- or 'cardiffnlp/twitter-roberta-base-emotion' is the correct path to a directory containing relevant tokenizer file,

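One thing I did notice is that PyCharm creates a folder named "cardiffnlp" containing subfolders that correspond to the different tasks, for example "twitter-roberta-base-sentiment". It sits in my PyCharm project folder, right above my "venv" folder. However, if I delete the "twitter-roberta-base-sentiment" folder that was created the first time the code ran successfully, the code works again, and the "twitter-roberta-base-sentiment" folder reappears.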
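The error text itself warns that the identifier must not be "a path to a local directory with something else", so one way to test that idea is to check whether a folder with the same relative path as the model identifier exists in the working directory, and to list what it contains. The following is only a minimal sketch using the standard library; the helper name `find_shadowing_dir` is made up for illustration and is not part of any Hugging Face API.

```python
import os


def find_shadowing_dir(model_id, base_dir="."):
    """Return (path, sorted file names) if a local folder matches model_id, else None.

    Hypothetical helper for illustration only, not a transformers function.
    """
    # "org/name" is looked up as the relative path base_dir/org/name
    candidate = os.path.join(base_dir, *model_id.split("/"))
    if not os.path.isdir(candidate):
        return None
    return candidate, sorted(os.listdir(candidate))
```

Calling `find_shadowing_dir("cardiffnlp/twitter-roberta-base-emotion")` from the project folder returns `None` when no such folder exists. If it returns a path and a file list, comparing that list against the files the tokenizer expects may show whether the folder is incomplete.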

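My guess is that this part of the code downloads the model and saves it into the PyCharm project. I just don't understand why it only works the first time. Do I need to change the model location, since once the files are stored locally nothing should need to go to the URL to fetch them?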

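import csv
import urllib.request

# Assumption: the task name is not shown in this snippet; "emotion" is taken from the error message above
task = "emotion"

# download label mapping
labels=[]
mapping_link = f"https://raw.githubusercontent.com/cardiffnlp/tweeteval/main/datasets/{task}/mapping.txt"
with urllib.request.urlopen(mapping_link) as f:
    html = f.read().decode('utf-8').split("\n")
    csvreader = csv.reader(html, delimiter='\t')
# keep the label name (second column) of every non-empty row
labels = [row[1] for row in csvreader if len(row) > 1]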
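On the question of changing the model location: a possible approach, sketched here under assumptions, is to save the local copy under a directory name that cannot be mistaken for the Hub identifier, and to save the tokenizer and the model together so the folder is complete. The helper names `local_save_dir` and `load_tokenizer_and_model`, the `saved_models` root, and the choice of `AutoModelForSequenceClassification` are assumptions for illustration; the snippet shown in this question does not include the model-loading code, so this is not necessarily what the linked example does.

```python
from pathlib import Path


def local_save_dir(model_id, root="saved_models"):
    """Map a Hub id such as 'org/name' to a folder that differs from the id itself.

    Hypothetical helper; the "saved_models" root is an arbitrary choice.
    """
    return Path(root) / model_id.replace("/", "--")


def load_tokenizer_and_model(model_id, root="saved_models"):
    # Imported lazily so local_save_dir works without transformers installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    save_dir = local_save_dir(model_id, root)
    # Use the local copy if it exists, otherwise fall back to the Hub id.
    source = str(save_dir) if save_dir.is_dir() else model_id
    tokenizer = AutoTokenizer.from_pretrained(source)
    model = AutoModelForSequenceClassification.from_pretrained(source)
    if source == model_id:
        # Save both pieces so the local folder holds tokenizer and model files.
        tokenizer.save_pretrained(save_dir)
        model.save_pretrained(save_dir)
    return tokenizer, model
```

With this layout, `"cardiffnlp/twitter-roberta-base-emotion"` maps to `saved_models/cardiffnlp--twitter-roberta-base-emotion`, so a string equal to the Hub identifier never resolves to a local folder in the project.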

Thanks for the help, guys.
