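At first I thought this was a simple case of a missing trailing newline, with the new rows being appended onto the end of the last line, but I can see what looks like a row of empty columns between them.
This whole appending business looks tricky. If you don't have to use Python and can use a command-line tool instead, I recommend GoCSV.
Here's a sample file I mocked up based on your screenshot:
base.csv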
Date,Asset,Fear,Anger,Anticipation,Trust,Surprise,Sadness,Disgust,Joy,Positivity,Negativity
Nov 1,5088,0.84,0.58,0.73,1.0,0.26,0.89,0.22,0.5,0.69,0.59
Nov 2,4580,0.0,0.88,0.7,0.71,0.57,0.78,0.2,0.22,0.21,0.17
Nov 3,2469,0.72,0.4,0.66,0.53,0.65,0.64,0.67,0.78,0.54,0.32,,,,,,,
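I call it base because it's the file that will grow, and you can see in the last line that it has a problem: all those extra commas (I don't know how they got there).
The first step is to clean it and trim those pesky extra commas: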
% gocsv clean base.csv > tmp
% mv tmp base.csv
Now base.csv looks like:
Date,Asset,Fear,Anger,Anticipation,Trust,Surprise,Sadness,Disgust,Joy,Positivity,Negativity
Nov 1,5088,0.84,0.58,0.73,1.0,0.26,0.89,0.22,0.5,0.69,0.59
Nov 2,4580,0.0,0.88,0.7,0.71,0.57,0.78,0.2,0.22,0.21,0.17
Nov 3,2469,0.72,0.4,0.66,0.53,0.65,0.64,0.67,0.78,0.54,0.32
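If you do need to stay in Python, the trailing-comma part of that cleanup can be approximated with the standard csv module. This is only a rough sketch, not how GoCSV implements clean: trim_trailing_empty is a name I made up, the sample is cut down to three columns for brevity, and it assumes the header row already has the correct width.

```python
import csv
import io

def trim_trailing_empty(rows, width):
    """Drop empty fields that extend a row past the header width.

    Hypothetical helper: it only removes empty fields at the end of a row,
    and only while the row is longer than `width`.
    """
    cleaned = []
    for row in rows:
        while len(row) > width and row[-1] == "":
            row = row[:-1]
        cleaned.append(row)
    return cleaned

# Shortened stand-in for base.csv; the last row carries stray commas.
raw = (
    "Date,Asset,Fear\n"
    "Nov 1,5088,0.84\n"
    "Nov 3,2469,0.72,,,\n"
)

rows = list(csv.reader(io.StringIO(raw)))
header, body = rows[0], rows[1:]
cleaned = [header] + trim_trailing_empty(body, len(header))

out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(cleaned)
print(out.getvalue(), end="")
```

Empty fields inside a row are left alone, so a genuinely missing value in the middle of a row is not touched.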
Here's another set of data to append, sample2.csv:
Date,Asset,Fear,Anger,Anticipation,Trust,Surprise,Sadness,Disgust,Joy,Positivity,Negativity
Nov 4,6040,0.69,0.89,0.72,0.44,0.21,0.15,0.03,0.63,0.78,0.42
Nov 5,7726,0.72,0.12,0.95,0.6,0.88,0.1,0.43,1.0,1.0,0.68
Nov 6,9028,0.87,0.34,0.46,0.57,0.15,0.3,0.8,0.32,0.17,0.42
Nov 7,3544,0.16,0.9,0.37,0.8,0.67,0.0,0.11,0.72,0.93,0.35
GoCSV's stack command will do the job:
% gocsv stack base.csv sample2.csv > tmp
% mv tmp base.csv
Now base.csv looks like:
Date,Asset,Fear,Anger,Anticipation,Trust,Surprise,Sadness,Disgust,Joy,Positivity,Negativity
Nov 1,5088,0.84,0.58,0.73,1.0,0.26,0.89,0.22,0.5,0.69,0.59
Nov 2,4580,0.0,0.88,0.7,0.71,0.57,0.78,0.2,0.22,0.21,0.17
Nov 3,2469,0.72,0.4,0.66,0.53,0.65,0.64,0.67,0.78,0.54,0.32
Nov 4,6040,0.69,0.89,0.72,0.44,0.21,0.15,0.03,0.63,0.78,0.42
Nov 5,7726,0.72,0.12,0.95,0.6,0.88,0.1,0.43,1.0,1.0,0.68
Nov 6,9028,0.87,0.34,0.46,0.57,0.15,0.3,0.8,0.32,0.17,0.42
Nov 7,3544,0.16,0.9,0.37,0.8,0.67,0.0,0.11,0.72,0.93,0.35
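For comparison, here's roughly what stacking amounts to in plain Python: parse each input, keep the header once, and append the data rows. Treat it as a sketch under assumptions rather than GoCSV's actual behavior: stack is my own hypothetical helper, it insists on identical headers, and the data is shortened to three columns. Because the rows are parsed rather than concatenated as raw text, a missing trailing newline in the first input does no harm.

```python
import csv
import io

def stack(*sources):
    """Concatenate CSV texts that share a header, keeping the header once."""
    header = None
    stacked = []
    for text in sources:
        rows = list(csv.reader(io.StringIO(text)))
        if not rows:
            continue  # skip empty inputs
        if header is None:
            header = rows[0]
            stacked.append(header)
        elif rows[0] != header:
            raise ValueError(f"header mismatch: {rows[0]} != {header}")
        stacked.extend(rows[1:])
    return stacked

# Note: base_text deliberately has no trailing newline.
base_text = "Date,Asset,Fear\nNov 3,2469,0.72"
sample_text = "Date,Asset,Fear\nNov 4,6040,0.69\nNov 5,7726,0.72\n"

stacked = stack(base_text, sample_text)
for row in stacked:
    print(",".join(row))
```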
The GoCSV steps can be scripted and simplified like so:
% gocsv clean base.csv > base
% gocsv clean sample2.csv > sample
% gocsv stack base sample > base.csv
% rm base sample
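If the job has to live in a Python script after all, the same clean-then-stack flow might look something like the sketch below. The names read_clean and append_csv are hypothetical, the files are tiny two-column stand-ins created in a temporary directory, and the cleaning step only handles trailing empty fields, which is an assumption about what needs fixing rather than a copy of what gocsv clean does. Like the tmp and mv steps earlier, it writes to a temporary file and then swaps it into place instead of writing over the file it is still reading.

```python
import csv
import os
import tempfile

def read_clean(path):
    """Read a CSV, skip blank lines, and drop empty fields past the header width."""
    with open(path, newline="") as f:
        rows = [row for row in csv.reader(f) if row]
    width = len(rows[0])  # assumes the header row has the right width
    for row in rows:
        while len(row) > width and row[-1] == "":
            row.pop()
    return rows

def append_csv(base_path, new_path):
    """Rewrite base_path as its cleaned rows followed by new_path's data rows."""
    base, new = read_clean(base_path), read_clean(new_path)
    if base[0] != new[0]:
        raise ValueError("headers differ")
    directory = os.path.dirname(os.path.abspath(base_path))
    # Write next to the target, then replace it in one step.
    with tempfile.NamedTemporaryFile(
        "w", newline="", dir=directory, delete=False
    ) as tmp:
        csv.writer(tmp, lineterminator="\n").writerows(base + new[1:])
    os.replace(tmp.name, base_path)

# Demo with throwaway files: trailing commas and no final newline in the base.
with tempfile.TemporaryDirectory() as d:
    base_path = os.path.join(d, "base.csv")
    new_path = os.path.join(d, "sample2.csv")
    with open(base_path, "w", newline="") as f:
        f.write("Date,Asset\nNov 3,2469,,,")
    with open(new_path, "w", newline="") as f:
        f.write("Date,Asset\nNov 4,6040\n")
    append_csv(base_path, new_path)
    with open(base_path, newline="") as f:
        result = f.read()

print(result, end="")
```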