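The reason you have a bunch of NaN values is that your columns are not homogeneously typed. So when you try, for example, to average across the columns, the result is not meaningful: pandas.read_csv only converts a column to a numeric type when that makes sense, that is, when the column holds no string dates or other text alongside the numbers.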
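To make the type inference concrete, here is a minimal sketch. The two-column CSV text and the names `csv_text` and `df_demo` are made up for illustration and are not taken from the file in the question.

```python
import io

import pandas as pd

# Hypothetical data: "clean" holds only numbers, "mixed" starts with a date-like string.
csv_text = "clean,mixed\n1.5,2003-12\n2.5,57.3\n3.5,23.4\n"
df_demo = pd.read_csv(io.StringIO(csv_text))

# read_csv infers a numeric dtype only for the column with no stray text in it.
clean_is_numeric = pd.api.types.is_numeric_dtype(df_demo["clean"])
mixed_is_numeric = pd.api.types.is_numeric_dtype(df_demo["mixed"])

print(df_demo.dtypes)
print(clean_is_numeric, mixed_is_numeric)
```

A single non-numeric cell is enough to keep the whole column as text, which is why numeric reductions over such columns give unexpected results.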
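I'd also suggest that you check your data with df.head() before doing even simple analysis. It will save you a lot of time when you're wondering why your output looks "weird".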
That said, you can do the following to convert things to numeric values, but it is not guaranteed to be meaningful:
In [35]: df = read_csv('GOOG Key Ratios.csv', skiprows=2, index_col=0, names=['Y%d' % i for i in range(11)])
In [36]: df.head() # not homogeneously typed columns
Out[36]:
                               Y0       Y1       Y2       Y3       Y4  \
NaN                       2003-12  2004-12  2005-12  2006-12  2007-12
Revenue USD Mil             1,466    3,189    6,139   10,605   16,594
Gross Margin %               57.3     54.3     58.1     60.2     59.9
Operating Income USD Mil      342      640    2,017    3,550    5,084
Operating Margin %           23.4     20.1     32.9     33.5     30.6

                               Y5       Y6       Y7       Y8       Y9      Y10
NaN                       2008-12  2009-12  2010-12  2011-12  2012-12      TTM
Revenue USD Mil            21,796   23,651   29,321   37,905   50,175   55,797
Gross Margin %               60.4     62.6     64.5     65.2     58.9     56.7
Operating Income USD Mil    6,632    8,312   10,381   11,742   12,760   12,734
Operating Margin %           30.4     35.1     35.4     31.0     25.4     22.8
In [37]: df.convert_objects(convert_numeric=True).head()
Out[37]:
                             Y0     Y1     Y2     Y3     Y4     Y5     Y6     Y7     Y8     Y9    Y10
NaN                         NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN
Revenue USD Mil             NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN
Gross Margin %             57.3   54.3   58.1   60.2   59.9   60.4   62.6   64.5   65.2   58.9   56.7
Operating Income USD Mil  342.0  640.0    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN
Operating Margin %         23.4   20.1   32.9   33.5   30.6   30.4   35.1   35.4   31.0   25.4   22.8
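Note that the values with thousands separators (such as 1,466) end up as NaN in the output above, because they cannot be parsed as numbers as they stand. Also, convert_objects was removed in pandas 1.0, so on a current version a rough equivalent is pd.to_numeric with errors="coerce". The sketch below assumes a small hand-built frame that mimics a few of the rows above; the names `raw`, `to_number` and `numeric` are illustrative only.

```python
import pandas as pd

# Hypothetical frame mimicking the string-typed cells shown above (two columns only).
raw = pd.DataFrame(
    {
        "Y0": ["2003-12", "1,466", "57.3", "342"],
        "Y1": ["2004-12", "3,189", "54.3", "640"],
    },
    index=["Period", "Revenue USD Mil", "Gross Margin %", "Operating Income USD Mil"],
)


def to_number(column):
    # Drop thousands separators, then turn anything still unparseable into NaN.
    cleaned = column.str.replace(",", "", regex=False)
    return pd.to_numeric(cleaned, errors="coerce")


# DataFrame.apply calls to_number once per column.
numeric = raw.apply(to_number)
print(numeric)
```

With the separators stripped, the revenue row survives the conversion, while the date-like row still becomes NaN. Passing thousands="," to read_csv may be another option, although columns that also contain dates would still be read as text.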