0

设置:我在连接到具有大 csv (200Gb) 的 HDFS 的 Centos 7.6 VM 上运行 Dremio。现在我通过 pyodbc 查询 Dremio 但是如果我使用该方法:

cnxn = pyodbc.connect("Driver={};ConnectionType=Direct;HOST={};PORT={};AuthenticationType=Plain;UID={};PWD={}".format(driver, host,port,uid,pwd),autocommit=True)
sql = "SELECT * FROM HDFS.data.\"factTable.txt\" WHERE C > 0.0 LIMIT 100"
cursor = cnxn.cursor()
cursor.execute(sql)

它未能给出错误“用户'mpowers'取消查询。使用

cnxn = pyodbc.connect("Driver={};ConnectionType=Direct;HOST={};PORT={};AuthenticationType=Plain;UID={};PWD={}".format(driver, host,port,uid,pwd),autocommit=True)
sql = "SELECT * FROM HDFS.data.\"factTable.txt\" WHERE C > 0.0 LIMIT 100"
dataframe = pandas.read_sql(sql,cnxn)

但是查询成功(但我不想阅读我的查询,我只想计时)。查看我的 server.out 文件比较两个查询:

2020-12-07 19:34:26,749 [UserServer-1] INFO  com.dremio.ConnectionLog - [b61585f8-5b73-4806-97e8-28e77caa4a74] Connection opened.
    Endpoint: 127.0.0.1:42862
    Protocol Version: 5
    Record Type: n/a
    Record Formats: 
    Support Complex Types: false
    Name: Dremio C++ Client
    Version: 1.17.0 (1.17.0)
    Application: Unknown
    User Properties: 
        schema=
        support_complex_types=false
        userName=mpowers

2020-12-07 19:34:26,750 [UserServer-1] WARN  c.dremio.sabot.rpc.user.UserSession - Ignoring unknown property: SUPPORT_COMPLEX_TYPES
2020-12-07 19:34:26,750 [UserServer-1] WARN  c.dremio.sabot.rpc.user.UserSession - Ignoring unknown property: USERNAME
2020-12-07 19:34:26,754 [out-of-band-observer] INFO  query.logger - {"queryId":"20317c3c-d70c-b598-20cd-f038f7647900","queryText":"NA","start":1607369666752,"finish":1607369666753,"outcome":"COMPLETED","username":"mpowers"}
2020-12-07 19:34:26,761 [out-of-band-observer] INFO  query.logger - {"queryId":"20317c3d-4ef2-bd14-6949-24781167ea00","queryText":"NA","start":1607369666756,"finish":1607369666759,"outcome":"COMPLETED","username":"mpowers"}
2020-12-07 19:34:26,841 [FABRIC-rpc-event-queue] INFO  c.d.sabot.exec.FragmentExecutors - Received remote fragment start instruction for 20317c3d-29fa-dec0-730e-654b426bf300:1:0
2020-12-07 19:34:26,841 [FABRIC-rpc-event-queue] INFO  c.d.sabot.exec.FragmentExecutors - Received remote fragment start instruction for 20317c3d-29fa-dec0-730e-654b426bf300:1:1
2020-12-07 19:34:26,841 [FABRIC-rpc-event-queue] INFO  c.d.sabot.exec.FragmentExecutors - Received remote fragment start instruction for 20317c3d-29fa-dec0-730e-654b426bf300:1:2
2020-12-07 19:34:26,842 [FABRIC-rpc-event-queue] INFO  c.d.sabot.exec.FragmentExecutors - Received remote fragment start instruction for 20317c3d-29fa-dec0-730e-654b426bf300:1:3
2020-12-07 19:34:26,842 [FABRIC-rpc-event-queue] INFO  c.d.sabot.exec.FragmentExecutors - Received remote fragment start instruction for 20317c3d-29fa-dec0-730e-654b426bf300:1:4
2020-12-07 19:34:26,842 [FABRIC-rpc-event-queue] INFO  c.d.sabot.exec.FragmentExecutors - Received remote fragment start instruction for 20317c3d-29fa-dec0-730e-654b426bf300:1:5
2020-12-07 19:34:26,842 [FABRIC-rpc-event-queue] INFO  c.d.sabot.exec.FragmentExecutors - Received remote fragment start instruction for 20317c3d-29fa-dec0-730e-654b426bf300:0:0
2020-12-07 19:34:26,904 [e5 - 20317c3d-29fa-dec0-730e-654b426bf300:frag:1:4] INFO  c.d.s.e.f.FragmentStatusReporter - 20317c3d-29fa-dec0-730e-654b426bf300:1:4: State to report: FINISHED
2020-12-07 19:34:26,911 [e0 - 20317c3d-29fa-dec0-730e-654b426bf300:frag:0:0] INFO  c.d.s.e.f.FragmentStatusReporter - 20317c3d-29fa-dec0-730e-654b426bf300:0:0: State to report: FINISHED
2020-12-07 19:34:26,915 [e1 - 20317c3d-29fa-dec0-730e-654b426bf300:frag:1:2] INFO  c.d.s.e.f.FragmentStatusReporter - 20317c3d-29fa-dec0-730e-654b426bf300:1:2: State to report: FINISHED
2020-12-07 19:34:26,916 [e6 - 20317c3d-29fa-dec0-730e-654b426bf300:frag:1:1] INFO  c.d.s.e.f.FragmentStatusReporter - 20317c3d-29fa-dec0-730e-654b426bf300:1:1: State to report: FINISHED
2020-12-07 19:34:26,916 [e2 - 20317c3d-29fa-dec0-730e-654b426bf300:frag:1:5] INFO  c.d.s.e.f.FragmentStatusReporter - 20317c3d-29fa-dec0-730e-654b426bf300:1:5: State to report: FINISHED
2020-12-07 19:34:26,919 [e4 - 20317c3d-29fa-dec0-730e-654b426bf300:frag:1:3] INFO  c.d.s.e.f.FragmentStatusReporter - 20317c3d-29fa-dec0-730e-654b426bf300:1:3: State to report: CANCELLED
2020-12-07 19:34:26,923 [e3 - 20317c3d-29fa-dec0-730e-654b426bf300:frag:1:0] INFO  c.d.s.e.f.FragmentStatusReporter - 20317c3d-29fa-dec0-730e-654b426bf300:1:0: State to report: CANCELLED
2020-12-07 19:34:26,927 [out-of-band-observer] INFO  query.logger - {"queryId":"20317c3d-29fa-dec0-730e-654b426bf300","queryText":"SELECT * FROM  HDFS.data.\"factTable.txt\" WHERE C > 0.0 LIMIT 100","start":1607369666763,"finish":1607369666925,"outcome":"COMPLETED","username":"mpowers"}
2020-12-07 19:34:26,943 [UserServer-1] INFO  c.d.sabot.rpc.user.UserRPCServer - [USER]: Channel closed /127.0.0.1:31010 <--> /127.0.0.1:42862 (user client)
2020-12-07 19:34:26,943 [UserServer-1] INFO  com.dremio.ConnectionLog - [b61585f8-5b73-4806-97e8-28e77caa4a74] Connection Closed

vs. 失败的方法

2020-12-07 19:35:37,698 [UserServer-1] INFO  com.dremio.ConnectionLog - [30c3f4d8-241f-4f3d-a9e6-de7aea93b752] Connection opened.
    Endpoint: 127.0.0.1:42904
    Protocol Version: 5
    Record Type: n/a
    Record Formats: 
    Support Complex Types: false
    Name: Dremio C++ Client
    Version: 1.17.0 (1.17.0)
    Application: Unknown
    User Properties: 
        schema=
        support_complex_types=false
        userName=mpowers

2020-12-07 19:35:37,698 [UserServer-1] WARN  c.dremio.sabot.rpc.user.UserSession - Ignoring unknown property: SUPPORT_COMPLEX_TYPES
2020-12-07 19:35:37,699 [UserServer-1] WARN  c.dremio.sabot.rpc.user.UserSession - Ignoring unknown property: USERNAME
2020-12-07 19:35:37,705 [out-of-band-observer] INFO  query.logger - {"queryId":"20317bf5-c9fe-5e85-6e81-9ce420af1c00","queryText":"NA","start":1607369737702,"finish":1607369737703,"outcome":"COMPLETED","username":"mpowers"}
2020-12-07 19:35:37,712 [out-of-band-observer] INFO  query.logger - {"queryId":"20317bf5-b19e-866c-5da6-2f580816e500","queryText":"NA","start":1607369737708,"finish":1607369737711,"outcome":"COMPLETED","username":"mpowers"}
2020-12-07 19:35:37,763 [FABRIC-rpc-event-queue] INFO  c.d.sabot.exec.FragmentExecutors - Received remote fragment start instruction for 20317bf6-1026-5a5e-5c04-03d012fa4300:1:0
2020-12-07 19:35:37,763 [FABRIC-rpc-event-queue] INFO  c.d.sabot.exec.FragmentExecutors - Received remote fragment start instruction for 20317bf6-1026-5a5e-5c04-03d012fa4300:1:1
2020-12-07 19:35:37,763 [FABRIC-rpc-event-queue] INFO  c.d.sabot.exec.FragmentExecutors - Received remote fragment start instruction for 20317bf6-1026-5a5e-5c04-03d012fa4300:1:2
2020-12-07 19:35:37,763 [FABRIC-rpc-event-queue] INFO  c.d.sabot.exec.FragmentExecutors - Received remote fragment start instruction for 20317bf6-1026-5a5e-5c04-03d012fa4300:1:3
2020-12-07 19:35:37,764 [FABRIC-rpc-event-queue] INFO  c.d.sabot.exec.FragmentExecutors - Received remote fragment start instruction for 20317bf6-1026-5a5e-5c04-03d012fa4300:1:4
2020-12-07 19:35:37,764 [FABRIC-rpc-event-queue] INFO  c.d.sabot.exec.FragmentExecutors - Received remote fragment start instruction for 20317bf6-1026-5a5e-5c04-03d012fa4300:1:5
2020-12-07 19:35:37,764 [FABRIC-rpc-event-queue] INFO  c.d.sabot.exec.FragmentExecutors - Received remote fragment start instruction for 20317bf6-1026-5a5e-5c04-03d012fa4300:0:0
2020-12-07 19:35:37,824 [e5 - 20317bf6-1026-5a5e-5c04-03d012fa4300:frag:1:0] INFO  c.d.s.e.f.FragmentStatusReporter - 20317bf6-1026-5a5e-5c04-03d012fa4300:1:0: State to report: FINISHED
2020-12-07 19:35:37,826 [e0 - 20317bf6-1026-5a5e-5c04-03d012fa4300:frag:1:3] INFO  c.d.s.e.f.FragmentStatusReporter - 20317bf6-1026-5a5e-5c04-03d012fa4300:1:3: State to report: FINISHED
2020-12-07 19:35:37,830 [e0 - 20317bf6-1026-5a5e-5c04-03d012fa4300:frag:0:0] INFO  c.d.s.e.f.FragmentStatusReporter - 20317bf6-1026-5a5e-5c04-03d012fa4300:0:0: State to report: FINISHED
2020-12-07 19:35:37,834 [e4 - 20317bf6-1026-5a5e-5c04-03d012fa4300:frag:1:4] INFO  c.d.s.e.f.FragmentStatusReporter - 20317bf6-1026-5a5e-5c04-03d012fa4300:1:4: State to report: CANCELLED
2020-12-07 19:35:37,835 [UserServer-1] INFO  c.d.sabot.rpc.user.UserRPCServer - [USER]: Channel closed /127.0.0.1:31010 <--> /127.0.0.1:42904 (user client)
2020-12-07 19:35:37,835 [UserServer-1] INFO  com.dremio.ConnectionLog - [30c3f4d8-241f-4f3d-a9e6-de7aea93b752] Connection Closed
2020-12-07 19:35:37,835 [e3 - 20317bf6-1026-5a5e-5c04-03d012fa4300:frag:1:5] INFO  c.d.s.e.f.FragmentStatusReporter - 20317bf6-1026-5a5e-5c04-03d012fa4300:1:5: State to report: CANCELLED
2020-12-07 19:35:37,838 [e2 - 20317bf6-1026-5a5e-5c04-03d012fa4300:frag:1:2] INFO  c.d.s.e.f.FragmentStatusReporter - 20317bf6-1026-5a5e-5c04-03d012fa4300:1:2: State to report: CANCELLED
2020-12-07 19:35:37,840 [e6 - 20317bf6-1026-5a5e-5c04-03d012fa4300:frag:1:1] INFO  c.d.s.e.f.FragmentStatusReporter - 20317bf6-1026-5a5e-5c04-03d012fa4300:1:1: State to report: CANCELLED
2020-12-07 19:35:37,844 [out-of-band-observer] INFO  query.logger - {"queryId":"20317bf6-1026-5a5e-5c04-03d012fa4300","queryText":"SELECT * FROM  HDFS.data.\"factTable.txt\" WHERE C > 0.0 LIMIT 100","start":1607369737716,"finish":1607369737841,"outcome":"CANCELED","outcomeReason":"Query cancelled by user 'mpowers'","username":"mpowers"}

请注意,两者之间的区别是:

2020-12-07 19:34:26,943 [UserServer-1] INFO  c.d.sabot.rpc.user.UserRPCServer - [USER]: Channel closed /127.0.0.1:31010 <--> /127.0.0.1:42862 (user client)
2020-12-07 19:34:26,943 [UserServer-1] INFO  com.dremio.ConnectionLog - [b61585f8-5b73-4806-97e8-28e77caa4a74] Connection Closed

在失败的方法上记录查询之前出现,而在成功的方法上出现在记录查询之后。

编辑:为了清楚起见,我不想使用 read_sql,因为我想要运行的查询的读取时间太长,所以我无法准确测量数据库速度。

4

1 回答 1

0

Pyodbc/JDBC 流式传输大型数据集效率低下。而是使用箭头飞行连接器,这是最有效的方式。

https://dremio-client.readthedocs.io/en/stable/dremio_client.flight.html#dremio_client.flight.query

https://www.dremio.com/is-time-to-replace-odbc-jdbc/

于 2021-02-22T15:41:46.993 回答