c# - Encoding problem on SQL Server

Question

I am using a SQL Server database; the collation of the database instance is "SQL_Latin1_General_CP1_CI_AS".

The following code:

UPDATE ...
SET field = CHAR(136)
WHERE...

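puts the following symbol into the field: ˆ

But! In the Latin1 code table, codes 127-159 are not defined! How can it insert this symbol?

Even more confusingly, when I read this field value into a string variable in C# and convert it to char, I get code 710 instead of 136.

I tried using an encoding conversion: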

var latin1Encoding = Encoding.GetEncoding("iso-8859-1");
var test = latin1Encoding.GetBytes(field); // field is a string read from db

But in this case I get code 94, which is ^ (it looks similar, but it is not the same character, and I need exactly the same one).

score 5 · Accepted Answer

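But! In the Latin1 code table, codes 127-159 are not defined!

In ISO-8859-1, character 136 is defined, but it is a rarely used and essentially meaningless control character.

However, SQL_Latin1_General_CP1_CI_AS, despite the "Latin1" in its name, is not ISO-8859-1. It is the Western European ANSI code page 1252, which is similar to ISO-8859-1 but has a set of different symbols in the 128-159 range.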

Character 136 in code page 1252 is U+02C6 MODIFIER LETTER CIRCUMFLEX ACCENT, ˆ; its decimal code point is 710.
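To make the 136 and 710 relationship concrete, here is a minimal sketch that converts in both directions through code page 1252. The class and method names (Cp1252Demo, DecodeByte, EncodeChar) are made up for illustration. It assumes .NET Core or .NET 5+, where CodePagesEncodingProvider has to be registered before code page 1252 is available; the exception fallbacks are a choice made here so that unmappable characters fail loudly instead of being substituted.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of the original question or answer.
public static class Cp1252Demo
{
    private static Encoding GetCp1252()
    {
        // Assumption: running on .NET Core / .NET 5+, where legacy code pages
        // such as 1252 are only available after registering this provider.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // Exception fallbacks: throw instead of silently substituting characters.
        return Encoding.GetEncoding(
            1252,
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);
    }

    // Interprets a single byte as a code page 1252 character and
    // returns the Unicode code point of the resulting char.
    public static int DecodeByte(byte value)
    {
        string text = GetCp1252().GetString(new[] { value });
        return text[0];
    }

    // Encodes a single char back to its code page 1252 byte.
    // Throws EncoderFallbackException if the char is not in code page 1252.
    public static byte EncodeChar(char c)
    {
        return GetCp1252().GetBytes(new[] { c })[0];
    }
}
```

With these helpers, Cp1252Demo.DecodeByte(136) gives 710 and Cp1252Demo.EncodeChar('\u02C6') gives 136, so encoding the string read from the database with code page 1252 rather than iso-8859-1 recovers the original byte value.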

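In this case I get code 94, which is ^

Yes, you asked for a conversion to ISO-8859-1, which does not include the character U+02C6, so you get a "best fit" fallback: a character that looks a bit like the one you wanted. This is usually a bad thing; many of the chosen fallbacks are highly questionable. You can change this behavior with EncoderFallback, for example to throw an exception instead.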
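As a rough sketch of the EncoderFallback suggestion, the following requests ISO-8859-1 with exception fallbacks, so that a character with no exact ISO-8859-1 representation is reported rather than replaced by a lookalike. The class and method names (Latin1Strict, TryEncode) are hypothetical and only serve the example.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of the original answer.
public static class Latin1Strict
{
    // Tries to encode `text` as ISO-8859-1 without any best-fit substitution.
    // Returns false if at least one character has no exact ISO-8859-1 mapping.
    public static bool TryEncode(string text, out byte[] bytes)
    {
        Encoding strictLatin1 = Encoding.GetEncoding(
            "iso-8859-1",
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);

        try
        {
            bytes = strictLatin1.GetBytes(text);
            return true;
        }
        catch (EncoderFallbackException)
        {
            // For example U+02C6, which exists in code page 1252 but not in ISO-8859-1.
            bytes = Array.Empty<byte>();
            return false;
        }
    }
}
```

Under this setup, Latin1Strict.TryEncode("\u02C6", out var bytes) returns false instead of quietly producing the byte for ^.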

score 0 · Accepted Answer

Okay, there are several conversions taking place here.

When you use CHAR(136), the number is treated as a single-byte character code. Since 136 is outside the standard ASCII range, the character you get is the one defined by Windows-1252, the code page of this collation. That character is the modifier letter circumflex accent (ˆ).
In addition to defining the encoding of non-Unicode columns, the collation also establishes the rules for translating non-Unicode characters to Unicode ones when a non-Unicode character is stored in a Unicode field. If no conversion is defined you will tend to get a ?, but in this case you get the character with the Unicode code point U+02C6. The important thing to appreciate is that the collation establishes an equivalence between the two characters because it was decided that they are similar/equivalent. It has nothing to do with their numeric values.
Finally, you used the iso-8859-1 encoding to get the numeric code of the circumflex in that encoding, which is 94.
