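I have the following code, which attempts to plot a time series. Note that I drop the second column because it is not relevant, and I drop the first and last rows.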

import pandas as pd

activity = pd.read_csv('activity.csv', index_col=2)
activity = activity.ix[1:-1] #drop first and last rows due to invalid data
series = activity['activity']
series.plot()

I get the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-10-36df40c11065> in <module>()
----> 1 series.plot()

.../pandas/tools/plotting.pyc in plot_series(series, label, kind, use_index,
                                             rot, xticks, yticks, xlim, ylim,
                                             ax, style, grid, logy,
                                             secondary_y, **kwds)
   1326                      secondary_y=secondary_y, **kwds)
   1327 
-> 1328     plot_obj.generate()
   1329     plot_obj.draw()
   1330 

.../pandas/tools/plotting.pyc in generate(self)
    573         self._compute_plot_data()
    574         self._setup_subplots()
--> 575         self._make_plot()
    576         self._post_plot_logic()
    577         self._adorn_subplots()

.../pandas/tools/plotting.pyc in _make_plot(self)
    916                     args = (ax, x, y, style)
    917 
--> 918                 newline = plotf(*args, **kwds)[0]
    919                 lines.append(newline)
    920                 leg_label = label

.../matplotlib/axes.pyc in plot(self, *args, **kwargs)
   3991         lines = []
   3992 
-> 3993         for line in self._get_lines(*args, **kwargs):
   3994             self.add_line(line)
   3995             lines.append(line)

.../matplotlib/axes.pyc in _grab_next_args(self, *args, **kwargs)
    328                 return
    329             if len(remaining) <= 3:
--> 330                 for seg in self._plot_args(remaining, kwargs):
    331                     yield seg
    332                 return

.../matplotlib/axes.pyc in _plot_args(self, tup, kwargs)
    287         ret = []
    288         if len(tup) > 1 and is_string_like(tup[-1]):
--> 289             linestyle, marker, color = _process_plot_format(tup[-1])
    290             tup = tup[:-1]
    291         elif len(tup) == 3:

.../matplotlib/axes.pyc in _process_plot_format(fmt)
     94     # handle the multi char special cases and strip them from the
     95     # string
---> 96     if fmt.find('--')>=0:
     97         linestyle = '--'
     98         fmt = fmt.replace('--', '')

AttributeError: 'numpy.ndarray' object has no attribute 'find'

If I try a small dataset such as:

target, weekday, timestamp
0, Sat, 08 Dec 2012 16:26:26:625000
0, Sat, 08 Dec 2012 16:26:27:625000
0, Sat, 08 Dec 2012 16:26:28:625000
0, Sat, 08 Dec 2012 16:26:29:625000
1, Sat, 08 Dec 2012 16:26:30:625000
2, Sat, 08 Dec 2012 16:26:31:625000
0, Sat, 08 Dec 2012 16:26:32:625000
0, Sat, 08 Dec 2012 16:26:33:625000
1, Sat, 08 Dec 2012 16:26:34:625000
2, Sat, 08 Dec 2012 16:26:35:625000

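it works, but it does not work on my full dataset: https://dl.dropbox.com/u/60861504/activity.csv. I also tried the first 10 items from my dataset and got the same error, but if I manually assign a value, series[10] = 5, the plot appears. I'm stumped.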

3 Answers

The answer is in the error message:

AttributeError: 'numpy.ndarray' object has no attribute 'find'

The inferred dtype of your series is string (try type(series[0])).

If you convert the dtype first:

series = series.astype(int)
series.plot()

it should work.
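One way to check this diagnosis before plotting is to look at the dtype directly. The sketch below uses a small made-up series (the values and the "oops" token are illustrative and not taken from activity.csv). It also shows `pd.to_numeric` with `errors="coerce"` as a possible alternative when some entries cannot be converted, since `astype(int)` raises on such entries.

```python
import pandas as pd

# Hypothetical stand-in for the 'activity' column: numbers stored as strings.
series = pd.Series(["0", "0", "1", "2", "0"])

# False here means pandas is not treating the values as numbers.
looks_numeric = pd.api.types.is_numeric_dtype(series)
print(looks_numeric, type(series[0]))

# Works when every entry is a clean integer string.
as_int = series.astype(int)

# Alternative: entries that cannot be parsed become NaN instead of raising.
dirty = pd.Series(["0", "oops", "1"])  # "oops" is a made-up bad value
as_num = pd.to_numeric(dirty, errors="coerce")
print(as_num.isna().sum())
```

After either conversion the series holds numbers, so `series.plot()` no longer hands string values to matplotlib.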

Answered 2013-03-22T22:09:21.433

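In my experience, this happens because of non-numeric columns in the DataFrame.

pd.read_csv tries to infer the dtype of each column. I suspect the invalid data in your column confuses this process, so you end up with a column of non-numeric type in your DataFrame.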
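The inference behaviour described here can be reproduced with a few lines of in-memory CSV. This is only a sketch: the CSV text is made up, and `HEND0` stands in for whatever invalid value the real file contains.

```python
import io

import pandas as pd

# Made-up CSV text: the second data row holds a non-numeric token.
csv_text = "activity,weekday\n0,Sat\nHEND0,Sat\n1,Sat\n2,Sat\n"

raw = pd.read_csv(io.StringIO(csv_text))
# One bad token is enough to stop the column being inferred as numeric.
raw_is_numeric = pd.api.types.is_numeric_dtype(raw["activity"])

# Declaring the token as a missing-value marker lets inference succeed.
clean = pd.read_csv(io.StringIO(csv_text), na_values=["HEND0"])
clean_is_numeric = pd.api.types.is_numeric_dtype(clean["activity"])

print(raw_is_numeric, clean_is_numeric)  # False True
```

Checking `activity.dtypes` right after `read_csv` is a cheap way to spot this kind of problem before calling `plot()`.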

Answered 2013-03-22T22:01:41.103

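There are two problems:

  1. Pandas cannot parse the datetime strings because of the last colon, as in 08 Dec 2012 16:26:26:625000.

  2. The second line of the file is not an integer, which causes the column's dtype to become object (strings).

The following code works for your data: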

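import re
from io import StringIO  # Python 3; on Python 2 use `from StringIO import StringIO`

import pandas as pd

with open('activity.csv') as f:
    # Turn the last colon on each line into a dot so the fractional seconds parse
    str_data = re.sub(r":(\d+)$", r".\1", f.read(), flags=re.MULTILINE)
    data = StringIO(str_data)

# Treat "HEND0" as a missing value so the column can be read as numeric
activity = pd.read_csv(data, index_col=2, parse_dates=True, dayfirst=True, na_values=["HEND0"])
activity = activity.iloc[1:-1]  # drop first and last rows (.ix was removed in pandas 1.0)
series = activity['activity']
series.plot()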
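
To see what the `re.sub` call does on its own, here is a minimal sketch on two timestamps copied from the question's sample data. The explicit `format` string passed to `pd.to_datetime` is my assumption about the layout, not something the code above relies on.

```python
import re

import pandas as pd

# Two sample timestamps in the question's format (colon before the microseconds).
raw = "08 Dec 2012 16:26:26:625000\n08 Dec 2012 16:26:27:625000"

# Replace the final ':' on each line with '.', as in the re.sub call above.
fixed = re.sub(r":(\d+)$", r".\1", raw, flags=re.MULTILINE)

# An explicit format avoids relying on pandas' date inference.
stamps = pd.to_datetime(fixed.splitlines(), format="%d %b %Y %H:%M:%S.%f")
print(stamps[0])  # 2012-12-08 16:26:26.625000
```

Only the last colon on each line is rewritten, because the pattern requires the digits after it to run to the end of the line.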
Answered 2013-03-22T23:04:05.340