I am trying to create a DataFrame by reading a csv file whose columns are separated by '#####' (five hashes).
The code is:
import dask.dataframe as dd
df = dd.read_csv(r'D:\temp.csv', sep='#####', engine='python')
res = df.compute()
The error is:
dask.async.ValueError:
Dask dataframe inspected the first 1,000 rows of your csv file to guess the
data types of your columns. These first 1,000 rows led us to an incorrect
guess.
For example a column may have had integers in the first 1000
rows followed by a float or missing value in the 1,001-st row.
You will need to specify some dtype information explicitly using the
``dtype=`` keyword argument for the right column names and dtypes.
df = dd.read_csv(..., dtype={'my-column': float})
Pandas has given us the following error when trying to parse the file:
"The 'dtype' option is not supported with the 'python' engine"
Traceback
---------
File "/home/ec2-user/anaconda3/lib/python3.4/site-packages/dask/async.py", line 263, in execute_task
result = _execute_task(task, data)
File "/home/ec2-user/anaconda3/lib/python3.4/site-packages/dask/async.py", line 245, in _execute_task
return func(*args2)
File "/home/ec2-user/anaconda3/lib/python3.4/site-packages/dask/dataframe/io.py", line 69, in _read_csv
raise ValueError(msg)
So how do I get rid of this error?
If I follow the error message, I would have to specify a dtype for every column, but that is not practical when there are 100+ columns.
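Spelling that out would look something like the rough sketch below (the column names here are made up), and even then the quoted pandas error says the 'python' engine, which I have to use for this separator, does not accept dtype at all:

# Hypothetical column names; in reality there are 100+ of them.
column_names = ['col1', 'col2', 'col3']

# One dtype entry per column, e.g. forcing everything to object,
# which is what dd.read_csv(..., dtype=dtypes) would need.
dtypes = {name: 'object' for name in column_names}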
If I read the file without that separator, everything works fine, but '#####' then appears everywhere in the data. Is there a way to get rid of it after computing to a pandas DataFrame?
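Something along these lines is what I have in mind, a rough sketch that assumes the file contains no commas (so each line lands in a single column) and that the real column names would still have to be assigned afterwards:

import dask.dataframe as dd

# Read each line as one text column, ignoring the '#####' separator.
df = dd.read_csv(r'D:\temp.csv', header=None, names=['raw'], dtype={'raw': 'object'})
res = df.compute()  # now a pandas DataFrame

# Split every line on the '#####' delimiter into separate columns.
split = res['raw'].str.split('#####', expand=True)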
Please help me with this.