python-3.x - Connecting to a PostgreSQL DB through Python psycopg2 using an ANSI driver

Question

I have to transfer data from one PostgreSQL DB (old) to another PostgreSQL DB (new). The old one is encoded in win1252; the new one is encoded in UTF-8.

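I have already tried different approaches (pandas.to_sql, SQLAlchemy, psycopg2 and so on), but they keep failing because of an encoding "problem". I did some research, and the most likely cause seems to be on the driver side. As far as I know, psycopg2 uses a Unicode driver, but with my source database version (PostgreSQL 9.4.20 on x86_64) I have to use ANSI to get around these encoding problems.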

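I used an ETL tool to test whether the affected tables can be exported without encoding problems, and that worked without any issues. Because of this test, I am fairly sure this is not really an encoding problem but a problem with how the driver handles the data.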

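When I tested with just a sample whether loading the data works, I already noticed that pandas is slow. I have to load 1.2 million records, and that takes forever, so the PostgreSQL COPY method is probably the preferred approach. As far as I can tell, psycopg2 uses a standard connection string (https://halvar.at/python/odbc_dsn_connection_strings/), but I have to use the ANSI driver.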
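Since COPY is the approach being considered, here is a minimal sketch of a table-to-table transfer with psycopg2's cursor.copy_expert(). It assumes two already-open psycopg2 cursors, a hypothetical helper name transfer_table, and trusted table/column names (they are interpolated without quoting). PostgreSQL converts COPY output to the connection's client encoding, so setting the source connection's client encoding to UTF8 should let the server do the WIN1252 to UTF-8 conversion; whether that succeeds for this particular data is an assumption to verify.

```python
import io


def transfer_table(src_cur, dst_cur, table, columns):
    """Copy one table from the source to the target database via COPY.

    src_cur / dst_cur are assumed to be psycopg2 cursors; `table` and
    `columns` are assumed to be trusted identifiers (they are not quoted).
    """
    col_list = ", ".join(columns)
    # The whole table is buffered in memory as CSV text.
    buf = io.StringIO()
    # COPY ... TO STDOUT: the server sends the rows in the connection's
    # client encoding.
    src_cur.copy_expert(
        f"COPY {table} ({col_list}) TO STDOUT WITH (FORMAT csv)", buf
    )
    buf.seek(0)
    # COPY ... FROM STDIN: stream the same CSV text into the target table.
    dst_cur.copy_expert(
        f"COPY {table} ({col_list}) FROM STDIN WITH (FORMAT csv)", buf
    )


# Hypothetical usage (old_params / new_params are placeholder dicts):
# src = psycopg2.connect(**old_params)
# src.set_client_encoding("UTF8")
# dst = psycopg2.connect(**new_params)
# transfer_table(src.cursor(), dst.cursor(), "n", ["no", "description"])
# dst.commit()
```

Holding 1.2 million rows in an io.StringIO buffer is simple but memory-hungry; a temporary file would avoid that.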

I tried to pass the driver to the psycopg2 connector through SQLAlchemy, but that did not work:

stage_engine_string = ("{PostgreSQL ANSI}+psycopg2://" + str(stage_user) + ":" + str(stage_password) +  "@" + str(stage_host) + ":"  + str(stage_port) + "/" + str(stage_database))

This is because

conn = psycopg2.connect(**params)

only accepts these parameters:

host = 
database = 
user = 
password = 
port =
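psycopg2 is a wrapper around libpq rather than an ODBC client, so there is no ANSI/Unicode driver to choose. However, psycopg2.connect() is not limited to the five keywords above: it also accepts a libpq connection string, and any libpq connection parameter (for example client_encoding or options) can be passed. A minimal sketch of assembling such a string, using a hypothetical helper build_conninfo and placeholder values:

```python
def build_conninfo(**params):
    """Build a libpq keyword/value connection string.

    Every value is wrapped in single quotes; backslashes and single
    quotes inside a value are escaped with a backslash, as libpq requires.
    """
    parts = []
    for key, value in params.items():
        escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
        parts.append(f"{key}='{escaped}'")
    return " ".join(parts)


# Hypothetical usage with placeholder values:
# conn = psycopg2.connect(
#     build_conninfo(host="old-db.example", dbname="old", client_encoding="WIN1252")
# )
```

Passing client_encoding at connect time has the same effect as running SET client_encoding right after connecting, but psycopg2 knows about it from the start.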

Before trying the above, I tried, for example,

cur.copy_to(open("sql_tmp_export.csv", "w", encoding="utf-8", errors="ignore"), "table", sep=";", columns=("no","description"))

,

conn.decode("win1250").encode("utf8")

and

conn.set_client_encoding("win1250")

But I keep getting an encoding error. According to the PostgreSQL documentation, converting between UTF8 and WIN1250 should never be a problem.
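Two details in the question may matter here: the old database is described as WIN1252 while the attempts use WIN1250, and WIN1252 leaves a few byte values (such as 0x81) unassigned. The following sketch uses Python's cp1252/cp1250 codecs, assumed here to behave like PostgreSQL's WIN1252/WIN1250 for these bytes, to show both effects without a database:

```python
# Byte 0xE8 means "è" in cp1252 (WIN1252) but "č" in cp1250 (WIN1250).
raw = b"caf\xe8"
as_1252 = raw.decode("cp1252")
as_1250 = raw.decode("cp1250")

# Re-encoding to UTF-8 is lossless once the right code page was used.
utf8_bytes = as_1252.encode("utf-8")

# Some bytes (e.g. 0x81) have no mapping in cp1252 and cannot be decoded.
try:
    b"\x81".decode("cp1252")
    undefined_ok = True
except UnicodeDecodeError:
    undefined_ok = False

# errors="ignore" hides the problem by silently dropping such bytes.
lossy = b"a\x81b".decode("cp1252", errors="ignore")
```

Conversion to UTF-8 is only lossless when every byte is valid in the source code page. PostgreSQL typically reports an unconvertible byte with a "has no equivalent in encoding" error, and errors="ignore" in the CSV export merely discards such characters.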

With the ETL tool I ran into a similar problem, but I was able to solve it by sending

set client_encoding="windows-1250"

right after establishing the connection to the database.

But if I try this in psycopg2

cur.execute("set client_encoding=\"windows-1250\";select * from table")

I still get encoding errors.

Is there any way to choose the driver when establishing a psycopg2 connection? I think that should solve my problem.

score 0 · Accepted Answer

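My real problem (getting the data out of the database) is not solved yet because of a follow-up issue. If you want to get into it, I am happy to discuss my next question: Downloading a postgreSQL pg_dump file from a remote server using Python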

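But I was able to solve this problem. If you want to use ANSI, you have to install the latest ODBC driver from https://www.postgresql.org/ftp/odbc/versions/msi/

Then you can switch the psycopg2 connection to a pyodbc connection.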

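import pyodbc

# Placeholder connection settings; replace them with your own values
database = "mydb"
user = "myuser"
password = "secret"
host = "localhost"
port = "5432"  # must be a string for the concatenation below

# The driver name must match the name under which the ANSI psqlODBC
# driver is registered on the machine
conn_str = (
    "DRIVER={PostgreSQL Ansi(x64)};"
    "DATABASE=" + database + ";"
    "UID=" + user + ";"
    "PWD=" + password + ";"
    "SERVER=" + host + ";"
    "PORT=" + port + ";"
)
conn = pyodbc.connect(conn_str)
cur = conn.execute("SELECT 1")  # trivial query to check that the connection works
row = cur.fetchone()
print(row)
cur.close()
conn.close()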
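Building the connection string by plain concatenation breaks if a value contains a semicolon (common in passwords) and raises TypeError if port is an int. A minimal sketch of a helper (hypothetical name build_odbc_conn_str) that always braces DRIVER, wraps other values in braces when they contain special characters, and doubles any closing brace, following the usual ODBC convention. Whether psqlODBC honours braces for every attribute is an assumption worth checking.

```python
def build_odbc_conn_str(**attrs):
    """Join ODBC attributes into a "KEY=value;" connection string.

    DRIVER is always wrapped in braces; other values are wrapped only if
    they contain ";", "{", "}", "=" or leading/trailing spaces. A "}"
    inside a braced value is escaped by doubling it.
    """
    parts = []
    for key, value in attrs.items():
        value = str(value)  # also accepts an int port
        needs_braces = (
            key.upper() == "DRIVER"
            or any(ch in value for ch in ";{}=")
            or value != value.strip()
        )
        if needs_braces:
            value = "{" + value.replace("}", "}}") + "}"
        parts.append(f"{key}={value}")
    return ";".join(parts) + ";"


# Hypothetical usage with the placeholder variables from the answer:
# conn = pyodbc.connect(build_odbc_conn_str(
#     DRIVER="PostgreSQL Ansi(x64)", DATABASE=database, UID=user,
#     PWD=password, SERVER=host, PORT=port))
```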

score 0 · Accepted Answer

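My general problem is now solved as well, but the solution is strange. In case anyone is stuck on something similar: I simply ran the same script twice, the first time with LIMIT and OFFSET.

import pandas
import psycopg2


def any_postgres_method_to_load_data_from_db(params):
    # psycopg2 is used as an example; any DB-API driver works the same way
    conn = psycopg2.connect(**params)
    cur = conn.cursor()

    sql_pre_statement = """\
        set client_encoding = "Windows-1250"
        """
    cur.execute(sql_pre_statement)

    sql_statement = """\
        select * from n
        """
    # read_sql_query executes the query itself; no separate cur.execute() is needed
    df = pandas.read_sql_query(sql_statement, conn)
    df.to_csv("sql_tmp_export.csv", index=False)
    conn.close()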

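The script above returned several encoding errors. After running the slightly adjusted version shown below, I was able to run the original one successfully.

import pandas
import psycopg2


def any_postgres_method_to_load_data_from_db(params):
    # psycopg2 is used as an example; any DB-API driver works the same way
    conn = psycopg2.connect(**params)
    cur = conn.cursor()

    sql_pre_statement = """\
        set client_encoding = "Windows-1250"
        """
    cur.execute(sql_pre_statement)

    # Same query as before, but only 1000 rows starting at offset 500
    sql_statement = """\
        select * from n offset 500 limit 1000
        """
    # read_sql_query executes the query itself; no separate cur.execute() is needed
    df = pandas.read_sql_query(sql_statement, conn)
    df.to_csv("sql_tmp_export.csv", index=False)
    conn.close()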

I really cannot explain this. My only guess is that something odd is going on in the remote database's cache.
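Independent of the cause, exporting in batches is a reasonable fallback when a full SELECT fails or is too slow. Note that LIMIT/OFFSET without ORDER BY returns an unspecified subset of rows in PostgreSQL, so the pages should be ordered by a unique column. A minimal sketch, using an in-memory SQLite database as a stand-in for PostgreSQL, a hypothetical helper export_in_batches, and a hypothetical table n with an id column; with psycopg2 the placeholders would be %s instead of ?:

```python
import csv
import io
import sqlite3


def export_in_batches(conn, table, order_col, batch_size, out):
    """Write all rows of `table` to `out` as CSV, batch_size rows at a time.

    ORDER BY makes the LIMIT/OFFSET pages deterministic; without it the
    database may return rows in any order, so pages could overlap or skip rows.
    Returns the number of rows written.
    """
    writer = csv.writer(out)
    offset = 0
    while True:
        rows = conn.execute(
            f"SELECT * FROM {table} ORDER BY {order_col} LIMIT ? OFFSET ?",
            (batch_size, offset),
        ).fetchall()
        if not rows:
            break
        writer.writerows(rows)
        offset += len(rows)
    return offset


# Demo: an in-memory SQLite database standing in for PostgreSQL.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE n (id INTEGER PRIMARY KEY, description TEXT)")
demo.executemany(
    "INSERT INTO n VALUES (?, ?)", [(i, f"row {i}") for i in range(1, 2501)]
)
buf = io.StringIO()
total = export_in_batches(demo, "n", "id", 1000, buf)
```

For 1.2 million rows, keyset pagination (WHERE id > last_seen_id ORDER BY id LIMIT ...) is usually faster than a growing OFFSET, because PostgreSQL still has to read and discard the skipped rows.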
