
The cx_Oracle API was very fast for me until I started working with CLOB values.

Here is what I do:

import time
import cx_Oracle

num_records = 100
con = cx_Oracle.connect('user/password@sid')
cur = con.cursor()
cur.prepare("insert into table_clob (msg_id, message) values (:msg_id, :msg)")
cur.bindarraysize = num_records
msg_arr = cur.var(cx_Oracle.CLOB, arraysize=num_records)
text = '$'*2**20    # 1 MB of text
rows = []

start_time = time.perf_counter()
for id in range(num_records):
    msg_arr.setvalue(id, text)
    rows.append( (id, msg_arr) )    # ???

print('{} records prepared, {:.3f} s'
    .format(num_records, time.perf_counter() - start_time))
start_time = time.perf_counter()
cur.executemany(None, rows)
con.commit()
print('{} records inserted, {:.3f} s'
    .format(num_records, time.perf_counter() - start_time))

cur.close()
con.close()
  1. The main problem worrying me is performance:

    100 records prepared, 25.090 s - far too long for copying 100 MB in memory!
    100 records inserted, 23.503 s - seems too slow for 100 MB over the network.
    

    The problematic step is msg_arr.setvalue(id, text). If I comment it out, the script completes in milliseconds (inserting NULL into the CLOB column, of course).

  2. Secondly, it seems odd to append the same reference to the CLOB variable to the rows list for every row. I found this pattern on the internet, and it works, but am I doing it right?

  3. Are there ways to improve performance in my case?
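At the Python level, the `# ???` line appends the same variable object to every tuple, so `rows` carries no per-row text of its own; all the data lives in the variable's array. The sketch below reproduces that pattern with a hypothetical `HypotheticalArrayVar` stand-in (it only mimics the `setvalue()`/`getvalue()` calls and is not cx_Oracle's class) to make the aliasing visible. How cx_Oracle maps element i of the variable to row i during `executemany()` is not shown here.

```python
class HypotheticalArrayVar:
    """Stand-in for an array bind variable; NOT the real cx_Oracle class."""

    def __init__(self, size):
        self.values = [None] * size

    def setvalue(self, pos, value):
        self.values[pos] = value

    def getvalue(self, pos=0):
        return self.values[pos]


num_records = 3
msg_arr = HypotheticalArrayVar(num_records)
rows = []
for i in range(num_records):
    msg_arr.setvalue(i, "message %d" % i)
    rows.append((i, msg_arr))    # same pattern as the '# ???' line

# Every tuple holds a reference to the one shared variable object...
print(all(row[1] is msg_arr for row in rows))
# ...and the per-row text is reachable only through that object's array.
print([row[1].getvalue(row[0]) for row in rows])
```

The first print shows `True`, and the second lists `message 0` through `message 2`, each fetched from the single shared object.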

UPDATE: I tested network throughput: a 107 MB file copies in 11 s via SMB to the same host. But again, network transfer is not the main problem. Data preparation takes an abnormally long time.
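Putting the quoted timings side by side makes the comparison concrete. This is only arithmetic on the numbers above, assuming each `'$'` character occupies one byte in the database character set and that "107 MB" means 107 × 10^6 bytes (reading it as MiB changes the estimate by well under a second).

```python
MIB = 2 ** 20

# Payload from the question: 100 records, each a 1 MiB string of '$'
# (assumed to be 1 byte per character in the database character set).
payload_bytes = 100 * MIB

# SMB copy from the UPDATE; "107 MB" is assumed to mean 107 * 10**6 bytes.
smb_bytes_per_s = 107e6 / 11

expected_transfer_s = payload_bytes / smb_bytes_per_s   # time at SMB speed
prepare_mib_per_s = 100 / 25.090                        # setvalue() loop
insert_mib_per_s = 100 / 23.503                         # executemany() + commit

print("expected network time:    {:.1f} s".format(expected_transfer_s))
print("setvalue() throughput:    {:.1f} MiB/s".format(prepare_mib_per_s))
print("executemany() throughput: {:.1f} MiB/s".format(insert_mib_per_s))
```

At the measured SMB rate the payload would need roughly 10.8 s, so the insert step takes about twice as long as a raw copy, while the in-memory `setvalue()` loop moves only about 4 MiB/s.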


A strange workaround (thanks to Avinash Nandakumar on the cx_Oracle mailing list), but it really does improve performance dramatically when inserting CLOBs:

import time
import cx_Oracle

num_records = 100
con = cx_Oracle.connect('user/password@sid')
cur = con.cursor()
cur.bindarraysize = num_records
text = '$'*2**20    # 1 MB of text
rows = []

start_time = time.perf_counter()
# Step 1: insert the rows with an empty (not NULL) CLOB in each
cur.executemany(
    "insert into table_clob (msg_id, message) values (:msg_id, empty_clob())",
    [(i,) for i in range(1, num_records + 1)])
print('{} records prepared, {:.3f} s'
      .format(num_records, time.perf_counter() - start_time))

start_time = time.perf_counter()
# Step 2: lock the new rows and fetch their CLOB locators
selstmt = ("select message from table_clob "
           "where msg_id between 1 and :1 for update")
cur.execute(selstmt, [num_records])
for _ in range(num_records):
    # Step 3: write the text into each locked CLOB before fetching the next row
    results = cur.fetchone()
    results[0].write(text)
con.commit()
print('{} records inserted, {:.3f} s'
      .format(num_records, time.perf_counter() - start_time))

cur.close()
con.close()

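Semantically this is not exactly the same as my original post; I wanted to illustrate the principle as simply as possible. The key point is that you should insert empty_clob(), then select the row for update and write the contents into the CLOB.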
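The three steps can be packaged as a reusable helper that writes a different string to each row. This is a minimal sketch, not code from the mailing-list thread: the function name `insert_clobs` and its `first_id` parameter are hypothetical, it assumes a cx_Oracle-style connection (selected CLOB columns come back as LOB objects with a `write()` method, and positional `:1`/`:2` binds work), and it assumes the generated ids are not already present in `table_clob`.

```python
def insert_clobs(con, texts, first_id=1):
    """Insert each string in `texts` into table_clob via the empty_clob() pattern.

    Hypothetical helper. Assumes a cx_Oracle-style connection whose fetched
    CLOB values are LOB objects with write(), and that the ids
    first_id .. first_id + len(texts) - 1 are still unused.
    """
    if not texts:
        return
    last_id = first_id + len(texts) - 1
    cur = con.cursor()
    try:
        # Step 1: insert the rows with an empty (not NULL) CLOB in each.
        cur.executemany(
            "insert into table_clob (msg_id, message) values (:1, empty_clob())",
            [(msg_id,) for msg_id in range(first_id, last_id + 1)])
        # Step 2: lock the new rows and fetch their CLOB locators.
        cur.execute(
            "select msg_id, message from table_clob"
            " where msg_id between :1 and :2 for update",
            [first_id, last_id])
        # Step 3: write each text into its own locator, matched by msg_id,
        # before moving on to the next fetched row.
        for msg_id, lob in cur:
            lob.write(texts[msg_id - first_id])
        con.commit()
    finally:
        cur.close()
```

With the connection from the answer, a call would look like `insert_clobs(con, [text] * num_records)`. Each LOB is written inside the loop instead of collecting all rows with `fetchall()` first, mirroring the answer's `fetchone()` loop; older cx_Oracle versions expected LOB data to be used before the next internal fetch.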

answered 2014-07-23T05:59:15.790