
The following sequence of commands works fine and produces a DataFrame that matches the SQL table:

import io
import pandas as pd

copy_sql = "COPY mytable TO STDOUT WITH CSV HEADER"

conn = myengine.raw_connection()
cur = conn.cursor()
store = io.StringIO()
cur.copy_expert(copy_sql, store)
store.seek(0)

# this is for debugging
# it correctly outputs the CSV string from STDOUT
print(store.read())
store.seek(0)

cur.close()

# this works
df = pd.read_csv(store)

However, I am trying to pipe the output of the COPY command to gzip and then send the gzip output to STDOUT. The following raises pandas.errors.EmptyDataError: No columns to parse from file.

copy_sql = "COPY mytable TO PROGRAM 'gzip -f --stdout' WITH CSV HEADER"

conn = myengine.raw_connection()
cur = conn.cursor()
store = io.StringIO()
cur.copy_expert(copy_sql, store)
store.seek(0)

# this is for debugging
# it should output the compressed string,
# but actually outputs nothing
print(store.read())
store.seek(0)

cur.close()

# this doesn't work as Pandas finds nothing in `store`
df = pd.read_csv(store, compression="gzip")

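Since running echo "hey" | gzip -f --stdout in a terminal correctly writes the compressed string to STDOUT, I assumed that TO PROGRAM 'gzip -f --stdout' would be equivalent to TO STDOUT, except that the output sent to STDOUT would be compressed. Clearly I am missing something.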

conn connects over the network to a PostgreSQL database on a remote machine.

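My real goal is to compress the CSV output before it travels over the network, and then have Pandas read_csv parse the compressed string. Any other way of achieving this would be much appreciated.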
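To make the client side of that goal concrete, here is a minimal sketch of reading gzip-compressed CSV from an in-memory buffer. It does not touch the database: csv_text is a hypothetical stand-in for what COPY ... WITH CSV HEADER would emit, and the compression is done locally with Python's gzip module purely for illustration. As far as I understand the PostgreSQL documentation, the program named in COPY ... TO PROGRAM is run by the server on the server host, so its stdout would not be streamed back through copy_expert. The sketch therefore only shows what the receiving end would need if compressed bytes did arrive. The one point it is meant to illustrate is that gzip output is bytes, so the buffer has to be an io.BytesIO rather than an io.StringIO.

```python
import csv
import gzip
import io

# Hypothetical stand-in for the CSV text that COPY ... WITH CSV HEADER would emit.
csv_text = "id,name\n1,alice\n2,bob\n"

# gzip produces bytes, so the buffer must be binary (io.BytesIO, not io.StringIO).
store = io.BytesIO(gzip.compress(csv_text.encode("utf-8")))
store.seek(0)

# Decompress in text mode and parse the rows with the standard csv module.
with gzip.open(store, mode="rt", encoding="utf-8", newline="") as fh:
    rows = list(csv.reader(fh))

print(rows)
```

The same binary buffer could instead be rewound and handed to pd.read_csv(store, compression="gzip"). The standard csv module is used here only to keep the sketch free of third-party dependencies.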
