pandas - PyODBC + Pandas + read_sql: Error: The cursor's connection has been closed

Question

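I am reading tables with SELECT * FROM TABLE (sql) from an ODBC data source via PyODBC, and fetching/loading all rows with Pandas read_sql(). However, there are more than 200 tables, and some of them have 100,000 rows, so I have been using chunksize to read and load into DataFrames for better read performance.

Below is sample code: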

import json
import sys

import pandas as pd
import pyodbc


def get_odbc_tables(dsn, uid, pwd):
    try:
        cnxn = pyodbc.connect('DSN={};UID={};PWD={}'.format(dsn, uid, pwd), autocommit=True)
        # Get data into pandas dataframe
        # (sql holds the SELECT * FROM TABLE statement)
        dfl = []
        df = pd.DataFrame()
        for chunk in pd.read_sql(sql, cnxn, chunksize=10000):
            dfl.append(chunk)
            df = pd.concat(dfl, ignore_index=True)
            records = json.loads(df.T.to_json()).values()
            print("Load to Target")
            ......
            cnxn.close()
    except Exception as e:
        print("Error: {}".format(str(e)))
        sys.exit(1)

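However, after pandas reads/processes the chunk size specified in read_sql (10,000) and loads it to the target, I always get this error:

Error: The cursor's connection has been closed.

If chunksize is increased to 50,000, it fails again with the same error message after processing/loading only 50,000 records, even though the source has more records than that. This also causes the program to fail.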

C:\Program Files (x86)\Python\lib\site-packages\pandas\io\sql.py in _query_iterator(cursor, chunksize, columns, index_col, coerce_float, parse_dates)
   1419         while True:
-> 1420             data = cursor.fetchmany(chunksize)
   1421             if type(data) == tuple:

ProgrammingError: The cursor's connection has been closed.

During handling of the above exception, another exception occurred:

SystemExit                                Traceback (most recent call last)
<ipython-input-127-b106daee9737> in <module>()
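The traceback shows pandas calling cursor.fetchmany(chunksize) lazily, one call per chunk, so the connection has to stay open until the last chunk has been fetched. In the sample code, cnxn.close() sits inside the for loop, which would explain why exactly one chunk is processed before the failure. The following is a minimal sketch of that failure mode, not the pandas implementation: sqlite3 stands in for the ODBC source, and iter_chunks and read_all are hypothetical names.

```python
import sqlite3


def iter_chunks(cursor, chunksize):
    """Yield lists of rows lazily, one fetchmany() call per chunk."""
    while True:
        rows = cursor.fetchmany(chunksize)
        if not rows:
            break
        yield rows


def read_all(close_inside_loop):
    # Assumption: sqlite3 is only a stand-in for the ODBC data source.
    cnxn = sqlite3.connect(":memory:")
    cnxn.execute("CREATE TABLE t (id INTEGER)")
    cnxn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(25)])
    cursor = cnxn.execute("SELECT * FROM t")
    total = 0
    try:
        for rows in iter_chunks(cursor, 10):
            total += len(rows)
            if close_inside_loop:
                # Mirrors the sample code: the next fetchmany() call
                # finds its connection already closed.
                cnxn.close()
    finally:
        # Closing an already closed sqlite3 connection is a no-op.
        cnxn.close()
    return total
```

Calling read_all(close_inside_loop=False) reads all 25 rows, while read_all(close_inside_loop=True) raises sqlite3.ProgrammingError when the second chunk is requested.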

Please suggest whether there is any way to handle this. The source is just an ODBC data source connection, so I don't think I can create an SQLAlchemy engine for it.
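One way this could be handled without SQLAlchemy is to keep the plain DBAPI connection and close it only once, after the chunk loop has finished. The sketch below is an assumption about how the loop might be restructured, not a confirmed fix for this data source: load_in_chunks and load_chunk are hypothetical names, an in-memory SQLite table stands in for the ODBC DSN, and each chunk is handed over on its own rather than re-concatenated with the earlier ones.

```python
import sqlite3

import pandas as pd


def load_in_chunks(cnxn, sql, chunksize, load_chunk):
    """Read sql in chunks, pass each chunk to load_chunk, then close cnxn."""
    total = 0
    try:
        for chunk in pd.read_sql(sql, cnxn, chunksize=chunksize):
            # Hand over only the current chunk as a list of row dicts.
            load_chunk(chunk.to_dict(orient="records"))
            total += len(chunk)
    finally:
        # Runs once, after the last chunk (or on error), not per chunk.
        cnxn.close()
    return total


# Stand-in source (assumption): an in-memory SQLite table, not the ODBC DSN.
cnxn = sqlite3.connect(":memory:")
cnxn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
cnxn.executemany(
    "INSERT INTO t VALUES (?, ?)", [(i, "row%d" % i) for i in range(25)]
)

loaded = []  # stands in for the real target
rows_read = load_in_chunks(cnxn, "SELECT * FROM t", 10, loaded.extend)
print(rows_read, len(loaded))
```

With the real source, the connection returned by pyodbc.connect(...) in the sample code would be passed as cnxn, and load_chunk would be whatever writes the records to the target.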
