php - Replacing UTF-8 characters in PHP 5.3

Question

Why doesn't this test case work?

<?php
// cards with Cyrillic indices and suits, in UTF-8 encoding
$a = array('7♠', 'Д♠', 'К♠', '8♦', 'В♦', 'Д♦', '10♣', '10♥', 'В♥', 'Т♥');
foreach ($a as $card) {
        $suit = substr($card, -1);

        $card = preg_replace('/(\d+)♥/', '<span class="red">$1&hearts;</span>', $card);
        $card = preg_replace('/(\d+)♦/', '<span class="red">$1&diams;</span>', $card);
        $card = preg_replace('/(\d+)♠/', '<span class="black">$1&spades;</span>', $card);
        $card = preg_replace('/(\d+)♣/', '<span class="black">$1&clubs;</span>', $card);

        printf("suit: %s, html: %s\n", $suit, $card);
}
?>

Output:

suit: ▒, html: <span class="black">7&spades;</span>
suit: ▒, html: Д♠
suit: ▒, html: К♠
suit: ▒, html: <span class="red">8&diams;</span>
suit: ▒, html: В♦
suit: ▒, html: Д♦
suit: ▒, html: <span class="black">10&clubs;</span>
suit: ▒, html: <span class="red">10&hearts;</span>
suit: ▒, html: В♥
suit: ▒, html: Т♥

So I have two problems in my PHP script:

1. Why isn't the last UTF-8 character extracted correctly?
2. Why does preg_replace only replace the first suit?

I'm using PHP 5.3.3 and PostgreSQL 8.4.12 on CentOS 6.2 to store UTF-8 JSON (containing Russian text and card suits).

If problem 1 is a bug in PHP 5.3.3, is there a good workaround? (I don't want to upgrade the stock packages.)

Update:

<?php
$a = array('7♠', 'Д♠', 'К♠', '8♦', 'В♦', 'Д♦', '10♣', '10♥', 'В♥', 'Т♥');
foreach ($a as $card) {
        $suit = mb_substr($card, -1, 1, 'UTF-8');

        $card = preg_replace('/(\d+)♥/u', '<span class="red">$1&hearts;</span>', $card);
        $card = preg_replace('/(\d+)♦/u', '<span class="red">$1&diams;</span>', $card);
        $card = preg_replace('/(\d+)♠/u', '<span class="black">$1&spades;</span>', $card);
        $card = preg_replace('/(\d+)♣/u', '<span class="black">$1&clubs;</span>', $card);

        printf("suit: %s, html: %s\n", $suit, $card);
}
?>

New output:

suit: ♠, html: <span class="black">7&spades;</span>
suit: ♠, html: Д♠
suit: ♠, html: К♠
suit: ♦, html: <span class="red">8&diams;</span>
suit: ♦, html: В♦
suit: ♦, html: Д♦
suit: ♣, html: <span class="black">10&clubs;</span>
suit: ♥, html: <span class="red">10&hearts;</span>
suit: ♥, html: В♥

score 10 · Accepted Answer

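substr is one of the naive PHP core functions that assume 1 byte = 1 character. substr(..., -1) extracts the last byte of the string, but "♠" is longer than one byte (it takes 3 bytes in UTF-8). You should use mb_substr($card, -1, 1, 'UTF-8') instead.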
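A small check of the byte-versus-character difference on a single card. It assumes the mbstring extension is enabled and that the script file itself is saved as UTF-8:

```php
<?php
// Compare byte-based substr() with character-based mb_substr()
// on the same UTF-8 string. Requires the mbstring extension.
$card = '7♠';

// strlen() counts bytes: '7' is 1 byte, '♠' (U+2660) is 3 bytes (E2 99 A0).
$bytes = strlen($card);              // 4
$chars = mb_strlen($card, 'UTF-8');  // 2

// substr(..., -1) returns only the last byte of '♠' (0xA0),
// which is not a valid UTF-8 sequence on its own.
$lastByte = substr($card, -1);

// mb_substr(..., -1, 1, 'UTF-8') returns the whole last character.
$lastChar = mb_substr($card, -1, 1, 'UTF-8');

printf("bytes=%d chars=%d\n", $bytes, $chars);
printf("substr:    %s\n", bin2hex($lastByte)); // a0
printf("mb_substr: %s\n", bin2hex($lastChar)); // e299a0
```

A lone trailing byte like 0xA0 is most likely what the terminal rendered as ▒ in the first output.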

You need to add the u (PCRE_UTF8) modifier to your regular expressions so that they correctly handle UTF-8 encoded patterns and strings:

preg_replace('/(\d+)♥/u', ...
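Where /u really matters is when a multibyte character appears inside a character class or under a quantifier. The sketch below uses that to replace the four preg_replace calls with one pattern. card_to_html() is a hypothetical helper, not part of the answer, and the rank pattern (.+?) is an assumption: the question's \d+ matches decimal digits only, which is why cards such as Д♠ or В♦ are left unchanged in both outputs. It sticks to array() syntax so it should also run on PHP 5.3:

```php
<?php
// Hypothetical helper: card_to_html() is an illustration, not from the answer.
// Assumes the source file is saved as UTF-8 and mbstring is enabled.
function card_to_html($card)
{
    // Suit character => array(HTML entity, CSS class).
    $suits = array(
        '♥' => array('&hearts;', 'red'),
        '♦' => array('&diams;', 'red'),
        '♠' => array('&spades;', 'black'),
        '♣' => array('&clubs;', 'black'),
    );

    // /u makes [♠♣♥♦] a class of four characters; without it the class
    // would contain the individual bytes of those characters.
    // (.+?) accepts any rank, e.g. "10" or "Д"; \d+ would only accept digits.
    if (!preg_match('/^(.+?)([♠♣♥♦])$/u', $card, $m)) {
        // Not "rank + suit": return the text escaped but otherwise unchanged.
        return htmlspecialchars($card, ENT_QUOTES, 'UTF-8');
    }

    list($entity, $class) = $suits[$m[2]];
    return sprintf('<span class="%s">%s%s</span>',
        $class, htmlspecialchars($m[1], ENT_QUOTES, 'UTF-8'), $entity);
}

// The byte-vs-character difference in isolation:
// '♥' is 3 bytes (E2 99 A5); without /u every one of them is in the class.
$noU   = preg_replace('/[♥♦]/', 'X', '♥');  // "XXX"
$withU = preg_replace('/[♥♦]/u', 'X', '♥'); // "X"

$a = array('7♠', 'Д♠', 'К♠', '8♦', 'В♦', 'Д♦', '10♣', '10♥', 'В♥', 'Т♥');
foreach ($a as $card) {
    printf("suit: %s, html: %s\n", mb_substr($card, -1, 1, 'UTF-8'), card_to_html($card));
}
```

Capturing the suit separately turns four regex passes into one match plus an array lookup. The trade-off is that anything that is not exactly "rank followed by one suit character" is passed through unchanged (escaped) rather than partially rewritten.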
