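I'm trying to load a Dask DataFrame from a SQL connection. According to the read_sql_table documentation, index_col must be passed in. What should I do if there may be no good column to use as the index?
Would this be a suitable alternative?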
import math

import dask
import dask.dataframe as dd
import pandas as pd

# Break the SQL query into chunks
chunks = []
num_chunks = math.ceil(num_records / chunk_size)

# Run the query for each chunk on the Dask workers
for i in range(num_chunks):
    query = 'SELECT * FROM ' + table + ' LIMIT ' + str(i * chunk_size) + ',' + str(chunk_size)
    chunk = dask.delayed(pd.read_sql)(query, sql_uri)
    chunks.append(chunk)

# Aggregate the chunks into a single Dask DataFrame
df = dd.from_delayed(chunks)
dfs[table] = df
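
To make the chunking arithmetic easier to check on its own, here is a minimal sketch that only builds the paginated query strings and does not touch a database. The helper name build_chunk_queries, the table name "events", and the column "created_at" are hypothetical, chosen for illustration. The optional ORDER BY is an assumption on my part: without one, most databases do not promise a stable row order, so separate LIMIT queries could overlap or skip rows.

```python
import math


def build_chunk_queries(table, num_records, chunk_size, order_by=None):
    """Return one paginated SELECT per chunk, using MySQL-style LIMIT offset, count."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    num_chunks = math.ceil(num_records / chunk_size)
    # A fixed ordering keeps the pages consistent from one query to the next.
    order_clause = f" ORDER BY {order_by}" if order_by else ""
    return [
        f"SELECT * FROM {table}{order_clause} LIMIT {i * chunk_size}, {chunk_size}"
        for i in range(num_chunks)
    ]


# Hypothetical table and column names, for illustration only.
queries = build_chunk_queries("events", num_records=25, chunk_size=10, order_by="created_at")
for q in queries:
    print(q)
```

Each resulting string could then be handed to dask.delayed(pd.read_sql)(query, sql_uri) in the same way as in the loop above.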