I'm trying to fetch data from Exasol in parallel using PyExasol. I'm following the example here: https://github.com/badoo/pyexasol/blob/master/examples/14_parallel_export.py

My code looks like this:

import multiprocessing
import pyexasol
import pyexasol.callback as cb

class ExportProc(multiprocessing.Process):
    def __init__(self, node):
        self.node = node
        self.read_pipe, self.write_pipe = multiprocessing.Pipe(False)

        super().__init__()

    def start(self):
        super().start()
        self.write_pipe.close()

    def get_proxy(self):
        return self.read_pipe.recv()

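    def run(self):
        # Child side: it only writes to the pipe, so close its copy of the read end
        self.read_pipe.close()

        # Open an HTTP transport to this node and send its proxy address to the parent
        http = pyexasol.http_transport(self.node['host'], self.node['port'], pyexasol.HTTP_EXPORT)
        self.write_pipe.send(http.get_proxy())
        self.write_pipe.close()

        # Receive this node's chunk of the result set as a pandas DataFrame
        pd1 = http.export_to_callback(cb.export_to_pandas, None)
        print(f"{self.node['idx']}:{len(pd1)}")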

EXASOL_HOST = "<IP-ADDRESS>:8563"
EXASOL_USERID = "username"
EXASOL_PASSWORD = "password"

c = pyexasol.connect(dsn=EXASOL_HOST, user=EXASOL_USERID, password=EXASOL_PASSWORD, compression=True)

nodes = c.get_nodes(10)

pool = list()
proxy_list = list()

for n in nodes:
    proc = ExportProc(n)
    proc.start()
    proxy_list.append(proc.get_proxy())
    pool.append(proc)

c.export_parallel(proxy_list, "SELECT * FROM SOME_SCHEMA.SOME_TABLE", export_params={'with_column_names': True})

stmt = c.last_statement()

r = stmt.fetchall()

On the last statement I get the following error and can't fetch any results.

---------------------------------------------------------------------------
ExaRuntimeError                           Traceback (most recent call last)
<command-911615> in <module>
----> 1 r = stmt.fetchall()

/local_disk0/pythonVirtualEnvDirs/virtualEnv-01515a25-967f-4b98-aa10-6ac03c978ce2/lib/python3.7/site-packages/pyexasol/statement.py in fetchall(self)
     85 
     86     def fetchall(self):
---> 87         return [row for row in self]
     88 
     89     def fetchcol(self):

/local_disk0/pythonVirtualEnvDirs/virtualEnv-01515a25-967f-4b98-aa10-6ac03c978ce2/lib/python3.7/site-packages/pyexasol/statement.py in <listcomp>(.0)
     85 
     86     def fetchall(self):
---> 87         return [row for row in self]
     88 
     89     def fetchcol(self):

/local_disk0/pythonVirtualEnvDirs/virtualEnv-01515a25-967f-4b98-aa10-6ac03c978ce2/lib/python3.7/site-packages/pyexasol/statement.py in __next__(self)
     53         if self.pos_total >= self.num_rows_total:
     54             if self.result_type != 'resultSet':
---> 55                 raise ExaRuntimeError(self.connection, 'Attempt to fetch from statement without result set')
     56 
     57             raise StopIteration

ExaRuntimeError: 
(
    message  =>  Attempt to fetch from statement without result set
    dsn      =>  <IP-ADDRESS>:8563
    user     =>  username
    schema   =>  
)

It looks like the returned statement's result type is not 'resultSet' but 'rowCount'. Any help on what I'm doing wrong, or why the statement type is 'rowCount'?

1 Answer

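PyEXASOL creator here. Please note that with parallel HTTP transport you have to process the data chunks inside the child processes. Your data set is available in the pd1 DataFrames, one chunk per child process.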
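A minimal sketch of that pattern, with the Exasol parts swapped out so it runs without a cluster: the hypothetical in-memory chunks stand in for the DataFrames each ExportProc receives from export_to_callback(), and a multiprocessing Queue carries only a small per-chunk summary (here, the row count) back to the parent. The names process_chunk and main are illustrative, not part of PyEXASOL, and the sketch assumes a POSIX host where the "fork" start method is available.

```python
import multiprocessing

# Assumption: POSIX host with the "fork" start method (the traceback in the
# question comes from a Linux cluster); "fork" is not available on Windows.
CTX = multiprocessing.get_context("fork")


def process_chunk(idx, rows, results):
    # Stand-in for ExportProc.run(): in the real code `rows` would be the
    # DataFrame returned by http.export_to_callback(cb.export_to_pandas, None).
    # Do the heavy work here, in the child, and send back only a small summary.
    results.put((idx, len(rows)))


def main():
    # Hypothetical chunks standing in for the per-node exports.
    chunks = [
        [(1, "a"), (2, "b")],
        [(3, "c")],
        [(4, "d"), (5, "e"), (6, "f")],
    ]
    results = CTX.Queue()
    procs = [
        CTX.Process(target=process_chunk, args=(i, chunk, results))
        for i, chunk in enumerate(chunks)
    ]
    for p in procs:
        p.start()

    # Read one summary per child before join() so no child blocks on the
    # queue; sort by chunk index because children finish in arbitrary order.
    summary = dict(sorted(results.get() for _ in procs))

    for p in procs:
        p.join()
    return summary


if __name__ == "__main__":
    print(main())  # {0: 2, 1: 1, 2: 3}
```

Sending whole DataFrames back through the queue is possible but defeats the purpose of the parallel export, since every row is then pickled and funnelled through the parent again.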

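With parallel processing you should not call .fetchall() in the main process. The main process only runs the EXPORT statement, which returns a row count rather than a result set; that is why the result type is 'rowCount'.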
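If the same code path sometimes runs a SELECT and sometimes an EXPORT, one way to avoid the error is to check result_type first, as the __next__() frame in the traceback does, and fall back to rowcount(). This is a sketch only: statement_result() is a hypothetical helper, and FakeStatement is a hypothetical stand-in for a pyexasol statement so the example runs without a cluster. With a real connection you would pass c.last_statement() instead.

```python
def statement_result(stmt):
    """Return fetched rows for a result set, or the row count otherwise.

    `stmt` is expected to behave like a pyexasol statement: it exposes
    `result_type` ('resultSet' or 'rowCount'), `rowcount()` and `fetchall()`.
    """
    if stmt.result_type == "resultSet":
        return stmt.fetchall()
    # An EXPORT (what export_parallel() runs) only reports how many rows it sent.
    return stmt.rowcount()


class FakeStatement:
    # Hypothetical stand-in so the sketch runs without an Exasol cluster.
    def __init__(self, result_type, rows):
        self.result_type = result_type
        self._rows = rows

    def rowcount(self):
        return len(self._rows)

    def fetchall(self):
        if self.result_type != "resultSet":
            raise RuntimeError("Attempt to fetch from statement without result set")
        return list(self._rows)


print(statement_result(FakeStatement("rowCount", [(1,), (2,), (3,)])))  # 3
print(statement_result(FakeStatement("resultSet", [(1,), (2,)])))  # [(1,), (2,)]
```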

I suggest checking the complete examples, especially example 14 (parallel export).

Hope it helps!

answered 2019-12-17T14:11:31.360