2 Answers
I've left all of my testing echoes in my code block and merely commented them out in case you wanted to see what is being generated throughout the process.
I took some liberties with your code. I didn't like the function calling the function, and I condensed your lookup array into a string with a leading space. The leading space occupies offset 0, so the string has the same effect as your indexed array that starts from 1. Converting the lookup from an array to a string means I can use mb_strpos() instead of array_search().
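As a minimal sketch of why the leading space matters (the shortened `$lookup` string and `$letters` array below are made up for illustration, and the file is assumed to be saved as UTF-8): the space takes offset 0, so each real character's `mb_strpos()` offset equals the key it would have in an array indexed from 1.

```php
<?php
// Hypothetical shortened lookup; the leading space is only a placeholder at offset 0.
$lookup  = " -,.ȝj";
// The same characters as an array whose keys start at 1.
$letters = [1 => '-', ',', '.', 'ȝ', 'j'];

$offset = mb_strpos($lookup, 'ȝ', 0, 'UTF-8');   // character offset in the string
$key    = array_search('ȝ', $letters, true);     // key in the 1-based array

var_dump($offset); // int(4)
var_dump($key);    // int(4)
```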
The crucial point to fix in your code was in the looping, specifically accessing the letters with [$i]. You see, you cannot treat these multibyte characters as single-byte characters -- you must use mb_substr() to access the "whole" letter.
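The difference can be seen in a small sketch (the variable names are illustrative; the letter ṯ is written as the escape `"\u{1E6F}"`, which needs PHP 7 or later, so the result does not depend on how the file is saved):

```php
<?php
// "\u{1E6F}" is ṯ, which occupies three bytes in UTF-8.
$word = "n\u{1E6F}r"; // nṯr

$byteLength = strlen($word);                    // counts bytes
$charLength = mb_strlen($word, 'UTF-8');        // counts characters
$firstByte  = bin2hex($word[1]);                // [1] grabs a single byte of ṯ
$letter     = mb_substr($word, 1, 1, 'UTF-8');  // grabs the whole letter

var_dump($byteLength);             // int(5)
var_dump($charLength);             // int(3)
var_dump($firstByte);              // string(2) "e1"
var_dump($letter === "\u{1E6F}");  // bool(true)
```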
Setting default values for $alphabet and $encoding means you don't have to write a second "helper" function to pass all of the necessary data. uksort() will pass its expected two arguments and everything goes ahead smoothly.
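A stripped-down sketch of that idea, using a hypothetical comparator `by_custom_order()` and a made-up single-byte `$order` string (so plain `strpos()` is enough here): because the third parameter has a default, uksort() only has to supply the two keys.

```php
<?php
// Hypothetical comparator: $order has a default value, so the callback
// works when uksort() calls it with just $a and $b.
function by_custom_order($a, $b, $order = 'cba') {
    return strpos($order, $a) <=> strpos($order, $b);
}

$data = ['a' => 1, 'b' => 2, 'c' => 3];
uksort($data, 'by_custom_order');

var_export(array_keys($data)); // keys are now in the order c, b, a
```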
One final piece of advice: mb_ functions are expensive, so always try to return in your code as soon as possible and leave the mb_ functions farther "downscript" whenever logically possible.
Here is my suggested code: (Demo)
function alphabetize_custom($a, $b, $alphabet = " -,.ȝjʿwbpfmnrhḥḫẖsšqkgtṯdḏ⸗/()[]<>{}'*#I0123456789&@%", $encoding = 'UTF-8') {
    //echo "\n----\n$a =vs= $b";
    $mb_length = max(mb_strlen($a, $encoding), mb_strlen($b, $encoding));
    for ($i = 0; $i < $mb_length; ++$i) {
        //echo "\n";
        $a_char = mb_substr($a, $i, 1, $encoding);
        $b_char = mb_substr($b, $i, 1, $encoding);
        //echo "$a_char -vs- $b_char\n";
        //echo "(" , mb_strlen($a_char, $encoding), " & ", mb_strlen($b_char, $encoding), ")\n";
        if ($a_char === $b_char) {/*echo "identical, continue";*/ continue;}
        if (!mb_strlen($a_char, $encoding)) { /* echo "a is empty -1";*/ return -1;}
        if (!mb_strlen($b_char, $encoding)) { /*echo "b is empty 1";*/ return 1;}
        $a_offset = mb_strpos($alphabet, $a_char, 0, $encoding);
        $b_offset = mb_strpos($alphabet, $b_char, 0, $encoding);
        //echo "[" , $a_offset, " & ", $b_offset, "]\n";
        if ($a_offset == $b_offset) { /*echo "== offsets, continue";*/ continue;}
        if ($a_offset < $b_offset) { /*echo "a offset -1";*/ return -1;}
        //echo "b offset 1";
        return 1;
    }
    //echo "0";
    return 0;
}
$result = [
    "nṯr" => ["Ka.C.Coptite.urkVIII,176b", "Ka.C.Coptite.urkVIII,177,1"],
    "n" => ["Ka.C.Coptite.urkVIII,176c", "Ka.C.Coptite.urkVIII,177,1", "Ka.C.Coptite.urkVIII,177,2"],
    "nḫȝḫȝ" => ["Ka.C.Coptite.urkVIII,176c"],
    "nwj" => ["Ka.C.Coptite.urkVIII,176c"],
    "nfr" => ["Ka.C.Coptite.urkVIII,176c", "Ka.C.Coptite.urkVIII,177,2"],
    "nḥḥ" => ["Ka.C.Coptite.urkVIII,176e", "Ka.C.Coptite.urkVIII,177,1", "Ka.C.Coptite.urkVIII,177,1"],
    "nḏ" => ["Ka.C.Coptite.urkVIII,177,1"]
];
uksort($result, 'alphabetize_custom');
var_export($result);
Output:
array (
  'n' =>
  array (
    0 => 'Ka.C.Coptite.urkVIII,176c',
    1 => 'Ka.C.Coptite.urkVIII,177,1',
    2 => 'Ka.C.Coptite.urkVIII,177,2',
  ),
  'nwj' =>
  array (
    0 => 'Ka.C.Coptite.urkVIII,176c',
  ),
  'nfr' =>
  array (
    0 => 'Ka.C.Coptite.urkVIII,176c',
    1 => 'Ka.C.Coptite.urkVIII,177,2',
  ),
  'nḥḥ' =>
  array (
    0 => 'Ka.C.Coptite.urkVIII,176e',
    1 => 'Ka.C.Coptite.urkVIII,177,1',
    2 => 'Ka.C.Coptite.urkVIII,177,1',
  ),
  'nḫȝḫȝ' =>
  array (
    0 => 'Ka.C.Coptite.urkVIII,176c',
  ),
  'nṯr' =>
  array (
    0 => 'Ka.C.Coptite.urkVIII,176b',
    1 => 'Ka.C.Coptite.urkVIII,177,1',
  ),
  'nḏ' =>
  array (
    0 => 'Ka.C.Coptite.urkVIII,177,1',
  ),
)
Just for comparison's sake, I wrote an alternative code block that uses array_search() as your original code does and, not surprisingly, it appears to be more efficient according to the speed tests on 3v4l.org. This is likely due to the removal of four mb_ function calls per iteration (the two mb_strlen() checks and the two mb_strpos() lookups), which I previously mentioned to be "expensive". The following snippet provides the same output.
Code: (Demo)
function alphabetize_custom($a, $b) {
    $alphabet = [' ', '-', ',', '.', 'ȝ', 'j', 'ʿ', 'w', 'b', 'p', 'f', 'm', 'n', 'r', 'h', 'ḥ', 'ḫ', 'ẖ', 's', 'š', 'q', 'k', 'g', 't', 'ṯ', 'd', 'ḏ', '⸗', '/', '(', ')', '[', ']', '<', '>', '{', '}', "'", '*', '#', 'I', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '&', '@', '%'];
    unset($alphabet[0]); // removes dummy first key, effectively starting the keys from 1
    $encoding = 'UTF-8';
    $mb_length = max(mb_strlen($a, $encoding), mb_strlen($b, $encoding));
    for ($i = 0; $i < $mb_length; ++$i) {
        $a_char = mb_substr($a, $i, 1, $encoding);
        $b_char = mb_substr($b, $i, 1, $encoding);
        if ($a_char === $b_char) continue;
        $a_key = array_search($a_char, $alphabet);
        $b_key = array_search($b_char, $alphabet);
        if ($a_key === $b_key) continue;
        return $a_key - $b_key;
    }
    return 0;
}
$result = [
    "nṯr" => ["Ka.C.Coptite.urkVIII,176b", "Ka.C.Coptite.urkVIII,177,1"],
    "n" => ["Ka.C.Coptite.urkVIII,176c", "Ka.C.Coptite.urkVIII,177,1", "Ka.C.Coptite.urkVIII,177,2"],
    "nḫȝḫȝ" => ["Ka.C.Coptite.urkVIII,176c"],
    "nwj" => ["Ka.C.Coptite.urkVIII,176c"],
    "nfr" => ["Ka.C.Coptite.urkVIII,176c", "Ka.C.Coptite.urkVIII,177,2"],
    "nḥḥ" => ["Ka.C.Coptite.urkVIII,176e", "Ka.C.Coptite.urkVIII,177,1", "Ka.C.Coptite.urkVIII,177,1"],
    "nḏ" => ["Ka.C.Coptite.urkVIII,177,1"]
];
uksort($result, 'alphabetize_custom');
var_export($result);
The charset in the meta tag needs to be UTF-8. That is what the outside world calls it; MySQL calls it utf8mb4.
Inside MySQL, declare the collation of the columns you want to be ordered with COLLATE utf8mb4_unicode_520_ci. With that, MySQL can do the work for you:
SELECT ... ORDER BY col ...