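I'm trying to figure out a problem, but so far I couldn't find any solution, so I hope you can help. I have a DataFrame in which I want to convert a str column to datetime, but there are some invalid rows that I want to filter out. Here are two examples: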
Out[6]:
# name date
0 aa 2012-11-30T14:00:00+01:00
1 bb 2012-12-01T08:16:00+01:00
2 cc 2012-12-01T10:14:00+01:00
3 ee 2012-12-01T11:05:00+01:00
4 gg 2012-12-01T11:05:00+01:00
In [7]: df2
Out[7]:
# name date
0 aa 2012-11-30T14:00:00+01:00
1 bb 2012-12-01T08:16:00+01:00
2 cc 2012-12-01T10:14:00+01:00
3 ee 2012-12-01T11:05:00+01:00
4 ff fsadfi2 2ih3ro
5 gg 2012-12-01T11:05:00+01:00
In [11]: df.dtypes
Out[11]:
name <class 'str'>
date <class 'str'>
dtype: object
In [12]: df2.dtypes
Out[12]:
name <class 'str'>
date <class 'str'>
dtype: object
df is fine: it has only valid dates in the date column. But df2 has some invalid rows. Let's look at df first, which I can convert to datetime with the following line:
df['pdate']=df.date.values.astype('datetime64[ns]')
This works fine:
In [16]: df
Out[16]:
# name date pdate
0 aa 2012-11-30T14:00:00+01:00 2012-11-30 13:00:00.000000000
1 bb 2012-12-01T08:16:00+01:00 2012-12-01 07:16:00.000000000
2 cc 2012-12-01T10:14:00+01:00 2012-12-01 09:14:00.000000000
3 ee 2012-12-01T11:05:00+01:00 2012-12-01 10:05:00.000000000
4 gg 2012-12-01T11:05:00+01:00 2012-12-01 10:05:00.000000000
In [17]: df.dtypes
Out[17]:
name <class 'str'>
date <class 'str'>
pdate datetime64[ns]
dtype: object
Now I try to filter df2 with a very simple str.contains:
In [18]: df2_filtered=df2[df2['date'].str.contains(':00')]
In [19]: df2_filtered
Out[19]:
# name date
0 aa 2012-11-30T14:00:00+01:00
1 bb 2012-12-01T08:16:00+01:00
2 cc 2012-12-01T10:14:00+01:00
3 ee 2012-12-01T11:05:00+01:00
4 gg 2012-12-01T11:05:00+01:00
In [20]: df2_filtered.dtypes
Out[20]:
name <class 'str'>
date <class 'str'>
dtype: object
It has only 5 rows. Now I try to convert it and get a nice error message:
In [21]: df2_filtered['pdate']=df2_filtered.date.values.astype('datetime64[ns]')
...:
/usr/local/bin/ipython:1: DeprecationWarning: parsing timezone aware datetimes is deprecated; this will raise an error in the future
#!/opt/local/Library/Frameworks/Python.framework/Versions/3.7/bin/python3.7
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-21-563087d6f949> in <module>
----> 1 df2_filtered['pdate']=df2_filtered.date.values.astype('datetime64[ns]')
/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/vaex/dataframe.py in __setitem__(self, name, value)
4370 if isinstance(name, six.string_types):
4371 if isinstance(value, (np.ndarray, Column)):
-> 4372 self.add_column(name, value)
4373 else:
4374 self.add_virtual_column(name, value)
/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/vaex/dataframe.py in add_column(self, name, data, dtype)
5743 # self._length_original = len(data)
5744 # self._index_end = self._length_unfiltered
-> 5745 super(DataFrameArrays, self).add_column(name, data, dtype=dtype)
5746 self._length_unfiltered = int(round(self._length_original * self._active_fraction))
5747 # self.set_active_fraction(self._active_fraction)
/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/vaex/dataframe.py in add_column(self, name, f_or_array, dtype)
2872 # give a better warning to avoid confusion
2873 if len(self) == len(ar):
-> 2874 raise ValueError("Array is of length %s, while the length of the DataFrame is %s due to the filtering, the (unfiltered) length is %s." % (len(ar), len(self), self.length_unfiltered()))
2875 raise ValueError("array is of length %s, while the length of the DataFrame is %s" % (len(ar), self.length_original()))
2876 # assert self.length_unfiltered() == len(data), "columns should be of equal length, length should be %d, while it is %d" % ( self.length_unfiltered(), len(data))
ValueError: Array is of length 5, while the length of the DataFrame is 5 due to the filtering, the (unfiltered) length is 6.
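So the error says that the array has length 5 and the filtered DataFrame has length 5, but the unfiltered length is 6.
But as I understand it, df2_filtered has only 5 rows. I don't know why the unfiltered length of df2 matters here.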
Basically, my question is: how do I filter out the unwanted rows and convert the column to datetime?
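To make the length mismatch concrete, here is a minimal sketch in plain Python (standard library only, not vaex API). It assumes, as the error message suggests, that the filtered DataFrame still keeps all 6 original rows plus a mask, so a newly added array has to line up with the unfiltered rows. The helper is_valid_iso and the variable names are hypothetical, chosen only for illustration.

```python
from datetime import datetime

# The date column of df2, including the one invalid row.
dates = [
    "2012-11-30T14:00:00+01:00",
    "2012-12-01T08:16:00+01:00",
    "2012-12-01T10:14:00+01:00",
    "2012-12-01T11:05:00+01:00",
    "fsadfi2 2ih3ro",
    "2012-12-01T11:05:00+01:00",
]

def is_valid_iso(text):
    """Return True if text parses as an ISO 8601 timestamp."""
    try:
        datetime.fromisoformat(text)
        return True
    except ValueError:
        return False

# One boolean per original row; a stricter check than looking for ':00'.
mask = [is_valid_iso(d) for d in dates]

# What the filtered view shows: only the rows that pass the mask.
filtered = [d for d, keep in zip(dates, mask) if keep]

# A new column aligned with the unfiltered rows: invalid rows get None.
pdate_full = [
    datetime.fromisoformat(d) if keep else None
    for d, keep in zip(dates, mask)
]

print(len(filtered), len(pdate_full))
```

The filtered view has 5 entries, while a column that lines up with the underlying rows needs 6, which mirrors the two lengths reported in the ValueError.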
Update
Based on Maarten Breddels' suggestion, I tried to use:
df2_filtered['pdate']=df2_filtered.date.astype('datetime64[ns]')
This seems to work, but when I try to display df2_filtered I get the following:
In [57]: df2_filtered
Out[57]: ERROR:MainThread:vaex:error evaluating: pdate at rows 0-5
Traceback (most recent call last):
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/vaex/scopes.py", line 94, in evaluate
result = self[expression]
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/vaex/scopes.py", line 141, in __getitem__
raise KeyError("Unknown variables or column: %r" % (variable,))
KeyError: 'Unknown variables or column: "astype(date, \'datetime64[ns]\')"'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/vaex/dataframe.py", line 3467, in table_part
values[name] = df.evaluate(name)
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/vaex/dataframe.py", line 5038, in evaluate
dtype = dtypes[expression] = self.dtype(expression, internal=False)
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/vaex/dataframe.py", line 2005, in dtype
data = self.evaluate(expression, 0, 1, filtered=False, internal=True, parallel=False)
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/vaex/dataframe.py", line 5143, in evaluate
value = scope.evaluate(expression)
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/vaex/scopes.py", line 94, in evaluate
result = self[expression]
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/vaex/scopes.py", line 136, in __getitem__
self.values[variable] = self.evaluate(expression) # , out=self.buffers[variable])
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/vaex/scopes.py", line 100, in evaluate
result = eval(expression, expression_namespace, self)
File "<string>", line 1, in <module>
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/vaex/functions.py", line 2106, in _astype
return x.astype(dtype)
AttributeError: 'ColumnStringArrow' object has no attribute 'astype'
ERROR:MainThread:vaex:error evaluating: pdate at rows 0-5
Traceback (most recent call last):
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/vaex/scopes.py", line 94, in evaluate
result = self[expression]
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/vaex/scopes.py", line 141, in __getitem__
raise KeyError("Unknown variables or column: %r" % (variable,))
KeyError: 'Unknown variables or column: "astype(date, \'datetime64[ns]\')"'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/vaex/dataframe.py", line 3467, in table_part
values[name] = df.evaluate(name)
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/vaex/dataframe.py", line 5038, in evaluate
dtype = dtypes[expression] = self.dtype(expression, internal=False)
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/vaex/dataframe.py", line 2005, in dtype
data = self.evaluate(expression, 0, 1, filtered=False, internal=True, parallel=False)
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/vaex/dataframe.py", line 5143, in evaluate
value = scope.evaluate(expression)
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/vaex/scopes.py", line 94, in evaluate
result = self[expression]
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/vaex/scopes.py", line 136, in __getitem__
self.values[variable] = self.evaluate(expression) # , out=self.buffers[variable])
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/vaex/scopes.py", line 100, in evaluate
result = eval(expression, expression_namespace, self)
File "<string>", line 1, in <module>
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/vaex/functions.py", line 2106, in _astype
return x.astype(dtype)
AttributeError: 'ColumnStringArrow' object has no attribute 'astype'
# name date pdate
0 aa 2012-11-30T14:00:00+01:00 error
1 bb 2012-12-01T08:16:00+01:00 error
2 cc 2012-12-01T10:14:00+01:00 error
3 ee 2012-12-01T11:05:00+01:00 error
4 gg 2012-12-01T11:05:00+01:00 error
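The last line of the traceback suggests that the string column itself has no astype, so the strings presumably have to be parsed rather than cast. As a rough sketch of the conversion I am after, again in plain Python with the standard library only (not vaex API; to_utc_naive is a hypothetical helper), this turns one of the timestamps into the UTC value that the pdate column of df shows:

```python
from datetime import datetime, timezone

def to_utc_naive(text):
    """Parse an ISO 8601 string with a UTC offset and return naive UTC."""
    aware = datetime.fromisoformat(text)
    # Shift to UTC, then drop the tzinfo so the value resembles what a
    # timezone-less datetime64 column would hold. A string without an
    # offset would be interpreted as local time by astimezone.
    return aware.astimezone(timezone.utc).replace(tzinfo=None)

print(to_utc_naive("2012-11-30T14:00:00+01:00"))  # 2012-11-30 13:00:00
```

This matches the first row of the working df example, where 14:00:00+01:00 becomes 13:00:00.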