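I have a dask dataframe with 220 partitions and 7 columns. I imported it from a bcp file and did some wrangling in dask. I now want to write the whole thing to mssql using turbodbc. I connect to the database as follows: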
mydb = 'TEST'
from turbodbc import connect, make_options
connection = connect(driver="ODBC Driver 17 for SQL Server",
                     server="TEST SERVER",
                     port="1433",
                     database=mydb,
                     uid="sa",
                     pwd="5pITfir3")
I then use a function I found in a Medium article to write to a test table in the database:
import time
def turbo_write(mydb, df, table):
    """Use turbodbc to insert data into sql."""
    start = time.time()
    # preparing columns
    columns = '('
    columns += ', '.join(df.columns)
    columns += ')'
    # preparing value place holders
    val_place_holder = ['?' for col in df.columns]
    sql_val = '('
    sql_val += ', '.join(val_place_holder)
    sql_val += ')'
    # writing sql query for turbodbc
    sql = f"""
    INSERT INTO {mydb}.dbo.{table} {columns}
    VALUES {sql_val}
    """
    print(sql)
    print(sql_val)
    # writing array of values for turbodbc
    values_df = [df[col].values for col in df.columns]
    print(values_df)
    # cleans the previous head insert
    with connection.cursor() as cursor:
        cursor.execute(f"delete from {mydb}.dbo.{table}")
        connection.commit()
    # inserts data, for real
    with connection.cursor() as cursor:
        #try:
        cursor.executemanycolumns(sql, values_df)
        connection.commit()
        # except Exception:
        # connection.rollback()
        # print('something went wrong')
    stop = time.time() - start
    return print(f'finished in {stop} seconds')
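To make the generated statement easier to inspect on its own, the string-building part of the function can be pulled out into a small helper. This is only a sketch: the name `build_insert_sql` and the table name `my_table` are hypothetical and do not come from the article, and it assumes the same `{db}.dbo.{table}` naming used above.

```python
def build_insert_sql(db, table, columns):
    """Return a parameterized INSERT statement with one '?' per column."""
    # Column names are joined as-is; this assumes they need no quoting.
    column_list = ", ".join(columns)
    # One positional placeholder per column.
    placeholders = ", ".join("?" for _ in columns)
    return f"INSERT INTO {db}.dbo.{table} ({column_list}) VALUES ({placeholders})"


# 'my_table' and the column names are made-up values for illustration.
print(build_insert_sql("TEST", "my_table", ["a", "b", "c"]))
```

Called with `list(df.columns)`, this should produce the same statement as the function above, minus the surrounding whitespace.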
This works when I upload a small number of rows, like so:
turbo_write(mydb, df_train.head(1000), table)
When I try it with more rows, it fails:
turbo_write(mydb, df_train.head(10000), table)
I get the error:
RuntimeError: Unable to cast Python instance to C++ type (compile in debug mode for details)
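Since the first 1,000 rows go through and the first 10,000 do not, one thing that may be worth checking is whether some column contains values past row 1,000 that the earlier rows do not, for example a missing value inside a column of strings. That is a guess, not something the error message confirms. The sketch below is plain Python with a hypothetical helper name, `mixed_type_columns`; it takes columns as ordinary lists and reports those holding more than one kind of value.

```python
import math


def mixed_type_columns(columns):
    """Map column name -> sorted type names, for columns with more than one kind of value.

    `columns` is a dict of column name -> list of values.
    None and NaN are reported under their own labels so missing values stand out.
    """
    report = {}
    for name, values in columns.items():
        kinds = set()
        for value in values:
            if value is None:
                kinds.add("None")
            elif isinstance(value, float) and math.isnan(value):
                kinds.add("NaN")
            else:
                kinds.add(type(value).__name__)
        # Only columns that mix several kinds of value are interesting here.
        if len(kinds) > 1:
            report[name] = sorted(kinds)
    return report


# Made-up sample data: a string column with one missing value, and a clean integer column.
sample = {"city": ["Oslo", float("nan"), "Rome"], "n": [1, 2, 3]}
print(mixed_type_columns(sample))
```

For the dataframe in question, the input could be built with something like `{col: df_train.head(10000)[col].tolist() for col in df_train.columns}`. Note that a float column containing NaN is also reported, so the output needs to be read with the column's intended type in mind.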
How can I write the whole dask dataframe to mssql without getting any errors?