pandas - Export Pandas Dataframe to csv - Azure AutoML reads the file wrong

Question

I saved a Pandas Dataframe to a csv file. If I import it in Azure AutoML, it looks like this:

It looks fine if I open it with Excel:

I export the dataframe with this line:

df.to_csv(r'*static_path*/output/measurements.csv')

Attempted Workarounds:

Open in Excel and resave as csv
Open in Excel and resave as tsv
Switch around Encoding options in AzureML
Created the csv and uploaded it to blob storage, following the Microsoft Docs guide "Create a dataset from pandas dataframe"

score 1 · Accepted Answer

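Note that, according to your screenshot, you are importing the csv in AzureML with semicolon as the delimiter, whereas df.to_csv writes your data with comma as the delimiter.

Either change the delimiter to comma in the AzureML import settings, or change the delimiter in your Python code so that it matches, as shown below.

Looking at your file, also note that your first column appears to be the dataframe index, which pandas includes by default when exporting to csv.

Please try: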

df.to_csv(r'*static_path*/output/measurements.csv', sep=';', index=False)
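
To see what those two arguments change, here is a minimal sketch on a small made-up dataframe (the column names and values are illustrative assumptions, not the data from the question). Calling to_csv without a path returns the csv text as a string, which makes the difference easy to inspect:

```python
import pandas as pd

# Hypothetical sample data; the real dataframe in the question has other columns.
df = pd.DataFrame({"label": ["blues", "rock"], "tempo": [123.0, 95.7]})

# Default export: comma delimiter, index written as an unnamed first column.
default_csv = df.to_csv()

# Semicolon delimiter and no index column.
fixed_csv = df.to_csv(sep=";", index=False)

print(default_csv.splitlines()[0])  # header begins with an empty index column
print(fixed_csv.splitlines()[0])    # header contains only the real columns
```

The empty first header cell in the default output is the index column that shows up as an extra unnamed column after import.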

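In any case, your data appears to contain carriage returns inside text fields. For example, consider the chroma_stft field. It contains a carriage return at the exact position where the screenshot shows the value [0.33353573:

As you can see in the image, the pattern shown in the AzureML screenshot exactly matches the carriage returns in your text fields.

This is most likely the cause of your problem. AzureML is probably interpreting these carriage returns as actual line endings and splitting your data accordingly, regardless of the fact that the text field values are enclosed in quotes.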
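
The effect can be reproduced outside AzureML. The sketch below uses a made-up two-column csv string and compares a naive line-based split with Python's quote-aware csv module. Whether AzureML's importer actually behaves like the naive reader is an assumption of this answer, not something confirmed:

```python
import csv
import io

# Hypothetical csv: one header row and one data row whose quoted
# second field contains a line break.
raw = 'id;chroma_stft\r\n0;"[0.1 0.2\n 0.3 0.4]"\r\n'

# Naive reader: treats every line break as a record boundary.
naive_rows = [line.split(";") for line in raw.splitlines()]

# Quote-aware reader: keeps the embedded line break inside the field.
proper_rows = list(csv.reader(io.StringIO(raw, newline=""), delimiter=";"))

print(len(naive_rows))   # the data row is split in two
print(len(proper_rows))  # header plus one intact data row
```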

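You need to get rid of these embedded carriage returns, probably by replacing them before exporting the data to csv, applying something like this to each of the affected fields: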

df.chroma_stft = df.chroma_stft.str.replace('\r', '')
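
If the fields turn out to contain line feeds (\n) as well as carriage returns, a regular expression can strip both in one pass. This is only a sketch on made-up values. It assumes the column holds Python strings; if it holds numpy arrays, they would need to be converted to strings first:

```python
import pandas as pd

# Hypothetical column of strings with embedded line breaks.
df = pd.DataFrame(
    {"chroma_stft": ["[0.1 0.2\r\n 0.3 0.4]", "[0.5 0.6\n 0.7 0.8]"]}
)

# Remove \r\n, \r and \n; the rest of each string is left untouched.
df["chroma_stft"] = df["chroma_stft"].str.replace(r"[\r\n]+", "", regex=True)

print(df["chroma_stft"].tolist())
```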

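Please also look at the ... characters that your text fields contain: as @Ferris pointed out in his/her comment, this may be because the field contains a numpy array and the array is being truncated when it is converted to text. In addition to the solution he/she suggests, consider using different numpy print options, especially threshold and linewidth. I think tweaking them could help.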
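
As a rough illustration of those two options, the sketch below converts a made-up long array to text with numpy's default settings and then again inside an np.printoptions context. The array and the chosen linewidth value are assumptions for the example:

```python
import sys

import numpy as np

# Hypothetical long array; 2000 elements is above numpy's default
# summarisation threshold of 1000.
arr = np.arange(2000) / 7.0

# Default settings: the middle of the array is replaced by "...".
default_text = str(arr)

# threshold: only summarise arrays with more elements than this.
# linewidth: only wrap lines longer than this many characters.
with np.printoptions(threshold=sys.maxsize, linewidth=1_000_000):
    full_text = str(arr)

print("..." in default_text)  # array was summarised
print("..." in full_text)     # all elements are present
print("\n" in full_text)      # no line breaks were inserted
```

With both options raised, the text form of the array is complete and stays on a single line, so it should no longer introduce line breaks into the csv field.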
