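I am trying to use the Python module blaze. It works when I use it on a small dataset, but when I move to a larger, more complex dataset I get an error. I have included an example below. Judging from the error, Blaze seems unable to convert the date column to a date. How can I specify the dtype of a particular column as string, so that Blaze does not try to parse it? Thanks.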
In [2]:
from pandas import *
from pylab import *
import pandas as pd
import pylab as plt
import numpy as np
import csv
import statsmodels.api as sm
import matplotlib
%matplotlib inline
import timeit
import blaze as bz
from blaze import *
bz.__version__
Out[2]:
'0.6.5'
In [3]:
t = Table('C:/Users/CRSP 1991 Current.csv')
In [4]:
t.columns
Out[4]:
[u'PERMNO',
u'date',
u'SICCD',
u'PERMCO',
u'PRC',
u'RET',
u'SHROUT',
u'vwretd',
u'ewretd']
In [5]:
t
C:\Users\Anaconda\lib\site-packages\IPython\core\formatters.py:239: FormatterWarning: Exception in text/html formatter: Unable to parse "12/31/1991" as a date
FormatterWarning,
Out[5]:
<repr(<blaze.api.table.Table at 0x186bd3c8>) failed: ValueError: Unable to parse "12/31/1991" as a date>
In [6]:
t_smaller = t.PERMNO
t_smaller
Out[6]:
PERMNO
0 10001
1 10001
2 10001
3 10001
4 10001
5 10001
6 10001
7 10001
8 10001
9 10001
10 10001
In [7]:
t_smaller_10001 = t_smaller[t_smaller == 10001]
t_smaller_10001
Out[7]:
<repr(<blaze.expr.table.Column at 0x18819048>) failed: ValueError: Unable to parse "12/31/1991" as a date>
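
The value in the error message, "12/31/1991", looks like a month/day/year date rather than an ISO-style one. To illustrate what "keep the column as a string" would mean, here is a minimal sketch that does not use Blaze at all, only the Python standard library. The sample rows and the helper names (read_rows, parse_us_date) are made up for illustration, and the month/day/year format is an assumption based on the single value shown in the error:

```python
import csv
import io
from datetime import datetime

# Hypothetical sample mimicking the first two columns of the CSV in the question.
SAMPLE_CSV = "PERMNO,date\n10001,12/31/1991\n10001,01/31/1992\n"


def read_rows(text):
    """Read CSV text, casting PERMNO to int and leaving 'date' as a raw string."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        rows.append({
            "PERMNO": int(record["PERMNO"]),
            "date": record["date"],  # kept as a string, no date parsing attempted
        })
    return rows


def parse_us_date(value):
    """Parse a month/day/year string such as '12/31/1991' (assumed format)."""
    return datetime.strptime(value, "%m/%d/%Y").date()


rows = read_rows(SAMPLE_CSV)
print(rows[0]["date"])                 # the untouched string from the file
print(parse_us_date(rows[0]["date"]))  # an explicit conversion, done on request
```

Reading the column as text first and converting it explicitly afterwards keeps a failed parse from breaking every expression built on the table, which is the behaviour I am hoping to get from Blaze.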