python - SQL Server (SQLCMD), Python and encoding problems when using non-ASCII characters

Question

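My Python code runs into an encoding problem when querying data from SQL Server 2005.

(Because I could not compile PyMSSQL-2.0.0b1) I am using this piece of code, and I can run some selects, but now I am stuck because I do not understand what SQLCMD is outputting to me :(

(I have to work with European languages contained in the tables, so I have to deal with accented characters and other encodings.)

For example:

When I read (select) from MS SQL Server Management Studio, I get this country name: 'Ceská republika' (note that the first a has an acute accent).
When I run the same query with SQLCMD from the command line (PowerShell on Windows 7), it is still fine: I can see "Ceská republika" with the a-acute.
Now, when using Python and the os.popen trick from the recipe, that is, with this connection string: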

sqlcmd -U adminname -P password -S servername -d dbname /w 8192 -u

I get this string: 'Cesk\xa0 republika'

Note the \xa0: I do not know what encoding it is, nor how to get from this \xa0 to {a with acute}...

If I test from Python with unicode, I should get '\xe1':

>>> unicode('Cesk\xa0 republika')

Traceback (most recent call last):
  File "<pyshell#13>", line 1, in <module>
    unicode('Cesk\xa0 republika')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 4: ordinal not in range(128)

>>> unicode_a_with_acute = u'\N{LATIN SMALL LETTER A WITH ACUTE}'
>>> unicode_a_with_acute
u'\xe1'
>>> print unicode_a_with_acute
á
>>> print unicode_a_with_acute.encode('cp1252')
á
>>> unicode_a_with_acute.encode('cp1252')
'\xe1'
>>> print 'Cesk\xa0 republika'.decode('cp1252')
Cesk  republika
>>> print 'Cesk\xa0 republika'.decode('utf8')

Traceback (most recent call last):
  File "<pyshell#21>", line 1, in <module>
    print 'Cesk\xa0 republika'.decode('utf8')
  File "C:\Python27\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xa0 in position 4: invalid start byte

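So what is SQLCMD giving me? How should I force it and/or os.popen and the rest to make sure I get intelligible UTF-8 in Python?

(Note that I have tried the os.popen command for SQLCMD both with and without the trailing -u, which is supposed to ask SQLCMD to answer in Unicode, with no effect. I have also tried feeding it the "select" as a Python string encoded in UTF-8, with no more success: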

 sqlstr = unicode('select * from table_pays where country_code="CZ"')
 cu = c.cursor
 lst = cu.execute(sqlstr)
 rows = cu.fetchall()
 for x in rows:
      print x

 ( 'CZ          ', 'Cesk\xa0 republika       ')

)

Another point: from my Google searches about "sqlcmd.exe", there are also these parameters that might help:

[ -f < codepage > | i: < codepage > [ < , o: < codepage > ] ]

But I could not work out the right value to specify, and I do not know what the possible values are. By the way, using (or not using):

[ -u unicode output]

does not help me either...

score 0 · Accepted Answer

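The problem is probably that the console works in ASCII mode by default, and the output is converted through the current code page setting. You can try one of the following. The first option is to write the result to a separate file: -o <file> -u

The resulting file will then have proper UCS-2 encoding, which Python is happy to read. The other option is to set up UTF-8 console output (untested):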

import os

# set up UTF-8 on the Windows console before running the command
cmode = 'mode con: codepage select=65001 > NUL & '
cmd = 'my command'
f = os.popen(cmode + cmd)
out = f.readlines()
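
For the first option (writing to a file with -o <file> -u), reading the result back could look roughly like the sketch below. It is written for Python 3 and assumes the file is UTF-16 with a byte order mark, which is not verified against sqlcmd here. The helper name read_sqlcmd_unicode_file and the sample file contents are made up for illustration; the sample file stands in for real sqlcmd output.

```python
import os
import tempfile


def read_sqlcmd_unicode_file(path):
    """Return the lines of a file assumed to be UTF-16 with a byte order mark."""
    # The "utf-16" codec consumes the BOM and takes the byte order from it.
    with open(path, encoding="utf-16") as handle:
        return handle.read().splitlines()


# Stand-in for real sqlcmd output (assumption): write a UTF-16 file ourselves.
fd, sample_path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with open(sample_path, "w", encoding="utf-16") as handle:
    handle.write("CZ\tCesk\u00e1 republika\n")

lines = read_sqlcmd_unicode_file(sample_path)
os.remove(sample_path)
```

If the real file turns out to have no BOM, an explicit codec such as "utf-16-le" would be needed instead.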

score 0 · Accepted Answer

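It looks like your default code page is 850 or 437. Rather than guessing the code page, run chcp at the command prompt: it will tell you what the system is set to use.

Trying to set the command processor code page with chcp or mode con: is unlikely to help, because they set the output code page for the console, not for pipes or redirection to a file.

To get Unicode (or rather UTF-16) output in a pipe, use cmd /u: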
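
To illustrate the code page point: byte 0xA0 is "a with acute" in both cp850 and cp437, so decoding the piped bytes with either one recovers the expected name. A minimal Python 3 sketch, using the byte string from the question as input (in Python 2 the same .decode call works on a plain str):

```python
# Raw bytes as they arrived through the pipe in the question.
raw = b"Cesk\xa0 republika"

# Decode with each candidate OEM code page; both map 0xA0 to U+00E1.
decoded = {codepage: raw.decode(codepage) for codepage in ("cp850", "cp437")}

# Once decoded to text, re-encoding as UTF-8 is straightforward.
text = decoded["cp850"]
utf8_bytes = text.encode("utf-8")

# In cp1252 the same byte is a no-break space, which explains the
# "Cesk  republika" output seen in the question.
as_cp1252 = raw.decode("cp1252")
```

Which code page actually applies on a given machine is whatever chcp reports; cp850 is only used here as an example.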

>>> import subprocess
>>> subprocess.check_output('''cmd /u /c "echo hello\xe1"''').decode('utf16')
'helloá\r\n'
>>>

But you would almost certainly be better off just installing a real database adapter.
