php - Split an array by first letter when the values start with HTML entities

Question

I have an array of countries like this:

array(249) {
  [0]=>
  array(4) {
    ["country_id"]=>
    string(1) "2"
    ["country_name_en"]=>
    string(19) "&Aring;land Islands"
    ["country_alpha2"]=>
    string(2) "AX"
    ["country_alpha3"]=>
    string(3) "ALA"
  }
  etc.
}

I want to split it by first letter, so that I end up with an array like this:

array(26) {
 'A' => array(10) {
    array(4) {
      ["country_id"]=>
      string(1) "2"
      ["country_name_en"]=>
      string(19) "&Aring;land Islands"
      ["country_alpha2"]=>
      string(2) "AX"
      ["country_alpha3"]=>
      string(3) "ALA"
    }
    etc.
  }
  etc.
}

The problem is that some of the country names start with an HTML entity.

Any ideas how to do this?

Thanks in advance

Peter

score 2 · Accepted Answer

If you want Åland Islands to be filed under A, you need to do a bit more than the already suggested html_entity_decode().

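The intl extension provides Normalizer::normalize(), a function that converts Å (a single code point) into Å (an A followed by a combining ring). Confused? The Unicode symbol U+00C5 can be represented in UTF-8 as 0xC385 (composed) or as 0x41CC8A (decomposed). 0x41 is A, and 0xCC8A is the combining ring above (U+030A).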
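To make the two byte sequences visible, here is a small sketch that prints them. It assumes PHP 7 or later (for the "\u{...}" escape) and only calls the Normalizer when the intl extension is loaded; Normalizer::FORM_D is used here because canonical decomposition is enough to show the effect.

```php
<?php
// Composed form: the single code point U+00C5.
$composed = "\u{00C5}";
echo bin2hex($composed), "\n"; // c385

// Decomposed form: "A" (U+0041) followed by the combining ring above (U+030A).
if (class_exists('Normalizer')) {
    $decomposed = Normalizer::normalize($composed, Normalizer::FORM_D);
    echo bin2hex($decomposed), "\n"; // 41cc8a
}
```

Once the string is decomposed, its first character is a plain A, which is what makes the mb_substr() step work.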

So, to file your islands correctly, you need to do the following:

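$string = "&Aring;land Islands";
// Turn the entity into the real UTF-8 character.
$s = html_entity_decode($string, ENT_QUOTES, 'UTF-8');
// Decompose it into a base letter plus combining marks (needs intl).
$s = Normalizer::normalize($s, Normalizer::FORM_KD);
// Keep only the first character, which is now the base letter.
$s = mb_substr($s, 0, 1);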

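Chances are your environment does not have intl installed. If that is the case, you might look into urlify(), a function that reduces strings to their alphanumeric parts.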

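With the above you should be able to:

1. Loop over the original array
2. Extract the country name
3. Sanitize the country name and extract its first character
4. Build a new array keyed by the character from (3)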

Note: be aware that the countries Armenia, Austria and Australia will all be filed under A.
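The steps above could be put together roughly as follows. This is only a sketch: first_letter() and group_by_first_letter() are hypothetical helper names, the input is assumed to have the shape shown in the question, and the normalization step is skipped when intl is not loaded (in which case Åland Islands ends up under Å rather than A).

```php
<?php
// Hypothetical helper: returns the letter a country name should be filed under.
function first_letter(string $name): string
{
    // Decode entities such as &Aring; into real UTF-8 characters.
    $s = html_entity_decode($name, ENT_QUOTES, 'UTF-8');

    // Decompose accented characters if the intl extension is available.
    if (class_exists('Normalizer')) {
        $decomposed = Normalizer::normalize($s, Normalizer::FORM_KD);
        if ($decomposed !== false) {
            $s = $decomposed;
        }
    }

    // The first character is now the base letter.
    return mb_strtoupper(mb_substr($s, 0, 1, 'UTF-8'), 'UTF-8');
}

// Hypothetical helper: groups the country rows by their first letter.
function group_by_first_letter(array $countries): array
{
    $grouped = [];
    foreach ($countries as $country) {
        $letter = first_letter($country['country_name_en']);
        // Append, so that several countries can share one letter.
        $grouped[$letter][] = $country;
    }
    ksort($grouped);
    return $grouped;
}

// Usage with a few rows in the shape of the question's array.
$countries = [
    ['country_id' => '2', 'country_name_en' => '&Aring;land Islands'],
    ['country_id' => '14', 'country_name_en' => 'Austria'],
    ['country_id' => '21', 'country_name_en' => 'Belgium'],
];
$grouped = group_by_first_letter($countries);
```

The country_id values other than "2" are made up for the example. With intl loaded, $grouped['A'] holds both Åland Islands and Austria, and $grouped['B'] holds Belgium.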

score 1 · Accepted Answer

Loop over the array, decode the HTML entities with html_entity_decode(), then use mb_substr() to take the first character.

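$new_array = array();
foreach ($array as $values) {
    // Decode entities such as &Aring; into UTF-8 characters.
    $values['country_name_en'] = html_entity_decode($values['country_name_en'], ENT_QUOTES, 'UTF-8');
    $index = mb_substr($values['country_name_en'], 0, 1, 'UTF-8');

    // Append, so that countries sharing a first letter do not overwrite each other.
    $new_array[$index][] = $values;
}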

Alternatively, you can use the function suggested by jlcd:

function substr_unicode($str, $s, $l = null) {
    return join("", array_slice(
        preg_split("//u", $str, -1, PREG_SPLIT_NO_EMPTY), $s, $l));
}

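$new_array = array();
foreach ($array as $values) {
    // Decode entities such as &Aring; into UTF-8 characters.
    $values['country_name_en'] = html_entity_decode($values['country_name_en'], ENT_QUOTES, 'UTF-8');
    $index = substr_unicode($values['country_name_en'], 0, 1);

    // Append, so that countries sharing a first letter do not overwrite each other.
    $new_array[$index][] = $values;
}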
