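My MySQL server does not distinguish between the characters 'æ' and 'ae' when storing data in the database, and this is causing me problems. I wanted a character set that treats them as different. I found one (utf8mb3), but it is deprecated, and its replacement (utf8mb4) does not treat these characters as different.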

What I tried:

set names 'utf8mb3';
select 'æ' = 'ae';

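This select returns 0 (false), which means this character set treats them as different characters, which is exactly what I need. However, MySQL gives me a warning: 'utf8mb3' is deprecated and will be removed in a future release. Please use utf8mb4 instead.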

But when I do this:

set names 'utf8mb4';
select 'æ' = 'ae';

This select returns 1, which means utf8mb4 treats them as the same character, which is not what I want.

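So my dilemma is: which character set should I use? If I use utf8mb3, it is deprecated and will eventually be removed, which is no good. If I use utf8mb4, the comparison does not work the way I need.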

2 Answers

= and LIKE comparisons in WHERE clauses apply a collation (not just a character set) to determine this kind of equality. This statement returns zero for the first two collations and one for the last two.

SELECT 'æ' = 'ae' COLLATE utf8mb4_unicode_ci,       -- 0
       'æ' = 'ae' COLLATE utf8mb4_general_ci,       -- 0
       'æ' = 'ae' COLLATE utf8mb4_unicode_520_ci,   -- 1
       'æ' = 'ae' COLLATE utf8mb4_german2_ci;       -- 1

It seems likely your default collation is one of the last two, or some other collation that treats 'æ' and 'ae' as equal, which is the behavior you don't want.

You can see your connection's collation setting with this statement. I suspect it is utf8mb4_unicode_520_ci.

SELECT @@collation_connection;
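
The connection setting is only one of several places a collation can come from. As a rough sketch (it assumes a default database is selected, since it uses `DATABASE()`), the server-, database- and column-level settings can be inspected like this:

```sql
-- List the collation variables for the current session
-- (collation_connection, collation_database, collation_server).
SHOW VARIABLES LIKE 'collation%';

-- List the character set and collation of every text column
-- in the currently selected database.
SELECT TABLE_NAME, COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = DATABASE()
  AND COLLATION_NAME IS NOT NULL;
```

When a stored column is compared with a string literal, the column's collation normally takes precedence over the connection collation, so the second query is usually the more informative one for stored data.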

Be sure to define the collation for your columns with one you do want, and set your connection collation to the same thing. utf8mb4_unicode_ci is suitable. Try this.

SET collation_connection = 'utf8mb4_unicode_ci';
SELECT 'æ' = 'ae';   -- 0
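
For the column-level part of that advice, here is a minimal sketch. The table `words` and its column `term` are hypothetical names chosen for illustration; they do not come from the question.

```sql
-- Hypothetical table and column names, for illustration only.
-- Declare the character set and collation explicitly on the column.
CREATE TABLE words (
    id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    term VARCHAR(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL
);

-- Change the collation of a column that already exists.
ALTER TABLE words
    MODIFY term VARCHAR(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL;

-- Confirm what the column ended up with.
SHOW FULL COLUMNS FROM words LIKE 'term';
```

Declaring the collation on the column means comparisons against stored values do not depend on whatever default the server or database happens to use.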

It's hard to give more specific advice without understanding your linguistic requirements better.

More info here: Difference between utf8mb4_unicode_ci and utf8mb4_unicode_520_ci collations in MariaDB/MySQL?

Answered 2022-03-02T12:52:56.133

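Collation 'utf8mb4_unicode_ci' is the current one you want to use. Make sure you also set your client (e.g. PHP, Node, Python) to use the correct character set (in both the DB client object and the environment configuration).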
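
Whichever client library is used, the same thing can be done in plain SQL right after connecting. This is a sketch of one way to do it, not the only one; many drivers also expose a charset option in their connection settings.

```sql
-- Set the client, connection and result character sets to utf8mb4
-- and the connection collation to utf8mb4_unicode_ci in one statement.
SET NAMES utf8mb4 COLLATE utf8mb4_unicode_ci;

-- Verify what the session is now using.
SELECT @@character_set_client,
       @@character_set_connection,
       @@character_set_results,
       @@collation_connection;
```

`SET NAMES` only affects the current session, so it has to be issued on every new connection unless the driver or server configuration already does the equivalent.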

Answered 2022-03-02T12:34:04.107