I just exported a dataset:

import csv
df[['src', 'trg']].to_csv('dataset.csv', index=False, quoting=csv.QUOTE_ALL)

I checked that none of the table cells contain a comma. However, when I try to load the file with

import os
from torchtext.data import TabularDataset
dataset = TabularDataset(os.path.abspath('dataset.csv'), format='csv', fields=['src', 'trg'])

I get:

ValueError: too many values to unpack (expected 2)

The data looks like this:

$ head dataset.csv
"src","trg"
"S( CC) /C(=N\ [H] ) N","[H] /N=C(/ N) S CC"
"[CH2:0] 1 [CH2:0] [N:0] ( [CH2:0] [CH:0] 2 [CH2:0] [N:0] ( [C:0] ( [O:0] [CH3:0] ) = [O:0] ) [CH2:0] [CH2:0] [N:0] 2 [C:0] ( [CH2:0] [c:0] 2 [cH:0] [c:0] ( [Cl:0] ) [c:0] ( [Cl:0] ) [cH:0] [cH:0] 2) = [O:0] ) [CH2:0] [CH2:0] 1","[CH3:0] [O:0] [C:0] ( = [O:0] ) [N:0] 1 [CH2:0] [CH2:0] [N:0] ( [C:0] ( = [O:0] ) [CH2:0] [c:0] 2 [cH:0] [cH:0] [c:0] ( [Cl:0] ) [c:0] ( [Cl:0] ) [cH:0] 2) [CH:0] ( [CH2:0] [N:0] 2 [CH2:0] [CH2:0] [CH2:0] [CH2:0] 2) [CH2:0] 1"

It is worth noting that the cells contain backslashes (\).

1 Answer

from torchtext.data import TabularDataset, Field

dataset = TabularDataset('dataset.csv', format='csv', fields=[('src', Field()), ('trg', Field())])

This works: fields must be given as a list of (name, Field) tuples rather than bare column-name strings.
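To see why bare strings trigger that error, here is a minimal plain-Python sketch. It assumes the library unpacks each entry of fields as a (name, field) pair while pairing it with a row value; the helper pair_fields_with_values is hypothetical and only imitates that pattern, it is not torchtext's actual code. A three-character string such as 'src' cannot be unpacked into two names, which would produce the message from the question.

```python
# Plain-Python illustration; torchtext is not required to run this.
# pair_fields_with_values is a hypothetical stand-in for the kind of loop
# a dataset loader might run over each CSV row (an assumption, not library code).

def pair_fields_with_values(fields, row):
    """Pair every (name, field) entry with the matching column value."""
    paired = {}
    for (name, field), value in zip(fields, row):
        paired[name] = (field, value)
    return paired


row = ["S( CC) /C(=N\\ [H] ) N", "[H] /N=C(/ N) S CC"]

# Bare strings: 'src' is iterated character by character ('s', 'r', 'c'),
# so unpacking it into (name, field) fails.
message = None
try:
    pair_fields_with_values(["src", "trg"], row)
except ValueError as err:
    message = str(err)
print("with strings:", message)

# (name, field) tuples: each entry unpacks cleanly.
# object() stands in for a real Field instance here.
paired = pair_fields_with_values([("src", object()), ("trg", object())], row)
print("with tuples:", sorted(paired))
```

Separately, because dataset.csv starts with a "src","trg" header row, it may be worth passing skip_header=True to TabularDataset so that the header is not read as a data example; check this against the torchtext version in use.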

Answered 2021-02-20T02:07:10.130