
I am trying to build a vocabulary with .build_vocab() from torchtext in Colab. It returns the error message: AttributeError: 'Example' object has no attribute 'Insult'

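My problem is similar to the question asked by @szymix12. His answer was to make sure the fields are passed in the same order as the csv header. I have confirmed that the order I assigned is correct. The csv data has two columns: "Insult" (the label) and "Comment" (the text). "Insult" is a binary label indicator (0 or 1).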

The code is below, and I have also made a Colab notebook. Feel free to run it.

import os
import torch
from torchtext import data, datasets
from torchtext.utils import download_from_url

CSV_FILENAME = 'data.csv'
CSV_GDRIVE_URL = 'https://drive.google.com/open?id=1ctPKO-_sJbmc8RodpBZ5-3EyYBWlUy5Pbtjio5pyq00'
download_from_url(CSV_GDRIVE_URL, CSV_FILENAME)

TEXT = data.Field(tokenize = 'spacy', batch_first = True, lower=True)  #from torchtext import data
LABEL = data.LabelField(sequential=False, dtype = torch.float)

train = data.TabularDataset(path=os.path.join('/content', CSV_FILENAME),
                            format='csv',
                            fields = [('Insult', LABEL), ('Comment', TEXT)],
                            skip_header=True)


print(vars(train[0]),vars(train[1]),vars(train[2]))

TEXT.build_vocab(train)


2 Answers


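Your script is fetching an HTML file rather than the actual dataset. This is because the URL you used, 'https://drive.google.com/open?id=1ctPKO-_sJbmc8RodpBZ5-3EyYBWlUy5Pbtjio5pyq00', is not a direct URL to the csv file. The response is HTML because the URL points to a Google Sheets document. To fix this, you can download the dataset to your computer and upload it to Colab.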


Here is the content of the data.csv you fetched:

[Image: contents of data.csv]
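
A quick way to catch this before handing the file to torchtext is to look at the first bytes of the download. The sketch below is only an illustration: looks_like_html and its probe_bytes parameter are hypothetical names, not part of torchtext, and the check is a rough heuristic rather than a full content-type test.

```python
from pathlib import Path


def looks_like_html(path, probe_bytes=512):
    """Return True if the start of the file looks like an HTML page.

    Hypothetical helper: it only inspects the first `probe_bytes` bytes,
    so it is a heuristic, not a validator.
    """
    head = Path(path).read_bytes()[:probe_bytes].lstrip().lower()
    return head.startswith((b"<!doctype html", b"<html"))


if __name__ == "__main__":
    # Assumes data.csv has already been downloaded next to this script.
    if Path("data.csv").exists() and looks_like_html("data.csv"):
        print("data.csv is an HTML page, not a CSV file")
```

If the check reports HTML, the download URL is the thing to fix, not the field definitions.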

answered 2020-04-07T21:02:43.287

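You don't need to download and upload the file manually as @knoop suggested. As I explained in my answer to your previous question, you should not use the link with open. Use uc?export=download instead. The correct CSV_GDRIVE_URL would then be: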

CSV_GDRIVE_URL = 'https://drive.google.com/uc?export=download&id=1eWMjusU3H34m0uml5SdJvYX6gQuB8zta'
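
If you would rather derive that form from a share link than type it by hand, a small helper along these lines can do the rewrite. This is a sketch: to_direct_download_url is a hypothetical name, it only changes the shape of the URL, and it assumes the id belongs to an actual CSV file stored on Drive (note that the id above is not the same as the Google Sheets id in the question).

```python
from urllib.parse import parse_qs, urlencode, urlparse


def to_direct_download_url(share_url):
    """Rewrite a Drive '.../open?id=<ID>' link as '.../uc?export=download&id=<ID>'.

    Hypothetical helper: it does not check that the ID points to a
    downloadable file, it only rebuilds the query string.
    """
    query = parse_qs(urlparse(share_url).query)
    if "id" not in query:
        raise ValueError("no 'id' query parameter in URL: %r" % share_url)
    file_id = query["id"][0]
    return "https://drive.google.com/uc?" + urlencode(
        {"export": "download", "id": file_id}
    )


if __name__ == "__main__":
    share = "https://drive.google.com/open?id=1eWMjusU3H34m0uml5SdJvYX6gQuB8zta"
    print(to_direct_download_url(share))
```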

Then, if you cat the file, you get:

,Insult,Date,Comment
11,0,20120530044519Z,"""Be careful,Jimbo.OG has a fork with your name on it."""
1478,0,,"""@dale0987654321 @\\'Micah Carter So you\\'re saying those claims are all false...?  I don\\'t know... just from my understanding these are in his book.... but I can be wrong... I\\'m just saying if they are true....should I just assume he\\'s racist?\\n\\nFrom \\'Dreams of My Father\\',""I CEASED TO ADVERISE MY MOTHER\\'S RACE AT THE AGE OF12 OR 13, when I began to suspect that by doing so I was ingratiating myself to whites""From Dreams of My Father, "" I FOUND A SOLACE IN NURSING A PERVASIVE SENSE OF GRIEVANCE AND ANIMOSITY AGAINST MY MOTHER\\'S RACE"".From \\'Dreams of my Father\\', ""The emotion between the races could never be pure..... the THE OTHER RACE (WHITE) WOULD ALWAYS REMAIN JUST THAT: MENACING, ALIEN AND APART""From Dreams Of My Father, ""never emulate white men and brown men whose fates didn\\'t speak to my own. IT WAS INTO MY FATHER\\'S IMAGE , THE BLACK MAN, THE SON OF AFRICA, THAT I\\'D PACKED ALL THE ATTRIBUTES I SOUGHT IN MYSELF.From Dreams Of My Father:""THAT HATE HADN\\'T GONE AWAY,"" he wrote, BLAMING WHITE PEOPLE,- SOME CRUEL, SOME IGNORANT, sometimes a single face, sometimes just a faceless image of a system claiming power over our lives."""""
1983,0,20120610152024Z,"""Only fucking with number 2... that's prolly the only pair i can pull off \xa0out these."""
3201,1,,"""S T F U and pay your dues"""
3549,0,20120530024115Z,"""Worry about your current opponent...."""
3402,1,20120618222736Z,"""And while you're at it, please prove how your ""point"" related to past draconian laws that banned inter-racial marriage?\xa0 That intent was to ""ban"" mixed races.\xa0 Just stop it please! You are insulting intelligence."""
2233,0,20120620034735Z,"""I'd rather have an old, straight white guy than the stupid fcking zebra we have now....everything he touches turns toshit............"""
3219,1,20120619020631Z,"""Only a black racists idiot would claim that the NAACP is not racist in todays world, especially since you're a member.\n\nAnd I don't give a 'bo' whether you take that personally or not."""
3148,0,,"""Fight for money !!! go ahead ,kill yourselfs, all for greed,common, take decision that is not in accordance with what the supreme court says, find some connection by the INEC ,and say to yourself ,HO IS MY TURN TO LOOT NOT YOURS,OR I KILL U.bounch of shameless idiots."""
386,0,,"""The cleverest comment that I have seen for a long time, I think that it was in the Mail\\n\\n \\n\\n\\'Arshavin is the worst Russian Sub since the Kursk"""""

and TEXT.build_vocab(train) will run successfully.
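
One thing worth checking in the dump above: the header has four columns (an unnamed index, Insult, Date, Comment), while the fields list in the question names only two. As far as I understand the legacy torchtext TabularDataset documentation, list-style fields are matched to columns by position, and a (name, None) tuple marks a column to ignore. The sketch below builds such a list from the header using only the standard library. build_field_spec is a hypothetical helper, and the LABEL and TEXT strings are stand-ins for the Field objects defined in the question.

```python
import csv
import io


def build_field_spec(header, wanted):
    """Return (name, field) tuples in header order.

    Columns not listed in `wanted` get None, which is meant to mark
    them as ignored. Hypothetical helper, not a torchtext function.
    """
    missing = set(wanted) - set(header)
    if missing:
        raise KeyError("columns not found in header: %s" % sorted(missing))
    return [(name, wanted.get(name)) for name in header]


# Header line copied from the cat output above.
header = next(csv.reader(io.StringIO(",Insult,Date,Comment")))

# Stand-ins for the LABEL and TEXT Field objects from the question.
LABEL, TEXT = "LABEL", "TEXT"

fields = build_field_spec(header, {"Insult": LABEL, "Comment": TEXT})
print(fields)
```

With the real Field objects in place of the stand-ins, the resulting list could be passed as fields= to TabularDataset so that Insult and Comment line up with the intended columns.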

answered 2020-04-08T12:29:39.267