mysql - How do I correctly handle dakuten and handakuten Japanese characters in MySQL?

Question

score 3 · Accepted Answer

This happens because the collations you used (utf8mb4_unicode_ci, utf8mb4_unicode_520_ci, and utf8mb4_0900_ai_ci) compare only each character's base letter. For example, 'ぺ' = 'へ' + U+309A ◌゚ (the combining handakuten), so 'へ' is the base letter of 'ぺ'. In your case all three characters share the same base letter, 'へ', so returning 1 is the correct result for those collations.
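One way to see this on your own server is to compare the sort weights MySQL assigns to each character. The sketch below assumes the three characters in question are 'へ', 'べ', and 'ぺ' (the question text is not shown here) and that the connection character set is utf8mb4. WEIGHT_STRING() is intended for debugging, so treat its exact output as informational only.

```sql
SET NAMES utf8mb4;

-- Under an accent-insensitive collation the three characters are expected
-- to produce identical weight strings, which is why '=' reports them equal.
SELECT HEX(WEIGHT_STRING('へ' COLLATE utf8mb4_unicode_520_ci)) AS he,
       HEX(WEIGHT_STRING('べ' COLLATE utf8mb4_unicode_520_ci)) AS be,
       HEX(WEIGHT_STRING('ぺ' COLLATE utf8mb4_unicode_520_ci)) AS pe;

-- Under a binary collation the weights follow the code points,
-- so the three values differ.
SELECT HEX(WEIGHT_STRING('へ' COLLATE utf8mb4_bin)) AS he,
       HEX(WEIGHT_STRING('べ' COLLATE utf8mb4_bin)) AS be,
       HEX(WEIGHT_STRING('ぺ' COLLATE utf8mb4_bin)) AS pe;
```

If the first query returns three identical values, the collation has discarded the dakuten and handakuten before comparing.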

The MySQL team is developing a new Japanese collation for the utf8mb4 character set. It will distinguish characters carrying a dakuten or handakuten from their base characters, and it should be available soon.
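Until such a collation is available on your server, one possible workaround is to force a binary comparison, which distinguishes every code point. This is only a sketch, not a recommendation from the MySQL team. Note that utf8mb4_bin also treats hiragana and katakana, and upper and lower case, as different. The SHOW COLLATION statement lists whatever Japanese-specific utf8mb4 collations your version ships, if any.

```sql
SET NAMES utf8mb4;

-- Different code points never compare equal under a binary collation.
SELECT 'へ' = 'ぺ' COLLATE utf8mb4_bin AS he_eq_pe,   -- 0
       'へ' = 'べ' COLLATE utf8mb4_bin AS he_eq_be;   -- 0

-- List Japanese-specific utf8mb4 collations on this server.
-- The result may be empty on older versions.
SHOW COLLATION LIKE 'utf8mb4_ja%';
```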

score 1 · Accepted Answer

SELECT 'へ' = 'ぺ' COLLATE utf8mb4_unicode_ci;     -- 0  (ditto for general_ci)
SELECT 'へ' = 'ぺ' COLLATE utf8mb4_unicode_520_ci; -- 1

The latter implements a newer version of the Unicode Collation Algorithm (5.2.0 rather than 4.0.0), so it is, in theory, more correct.

But what are you really doing? Probably comparing one column to another? If so, are both columns utf8mb4_unicode_520_ci? (When both sides are columns, the database and connection settings don't matter.)

Or is one side of the = a column and the other a literal?

Do you establish the collation when connecting?
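These questions can be answered from the server itself. The sketch below uses a hypothetical table `words` with a column `kana`; substitute your own names.

```sql
-- Collation the connection applies to string literals.
SHOW VARIABLES LIKE 'collation_connection';

-- Collation declared for each column of the (hypothetical) table.
SELECT COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'words';

-- Collation MySQL resolves for a column expression and for a literal.
SELECT COLLATION(kana) AS column_collation,
       COLLATION('へ') AS literal_collation
FROM words
LIMIT 1;

-- Establish the connection collation explicitly when connecting.
SET NAMES utf8mb4 COLLATE utf8mb4_unicode_520_ci;
```

When a column is compared with a plain literal, the column's collation takes precedence, so the connection collation mainly matters for literal-to-literal comparisons like the SELECTs above.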

Addenda

In version 8.0.0, all of these give 1:

utf8mb4_unicode_ci  -- was 0 in 5.6.12; already 1 in 5.7.15?
utf8mb4_unicode_520_ci
utf8mb4_0900_ai_ci
