When working in the Moovweb SDK, length("çãêá") is expected to return 4, but instead returns 8. How can I ensure that the length function works correctly when using Unicode characters?
2 Answers
This is a common issue: when a string contains Unicode characters, length() can end up counting with the wrong character set. To fix it, set the charset_determined variable so that the correct character set is in use before you call length(), like so in your Tritium code:
$charset_determined = "utf-8"
# your call to length() here
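The 8 in the question matches the number of UTF-8 bytes in "çãêá", which is what a length function reports when it treats the text as a single-byte charset. A minimal sketch of that effect follows. It is written in Python purely for illustration (it is not Tritium), and the variable names are made up for the example:

```python
# Illustration only: Python, not Tritium. Variable names are hypothetical.
# The string is written with escapes so the example does not depend on
# how this page itself is encoded.
raw = "\u00e7\u00e3\u00ea\u00e1".encode("utf-8")  # UTF-8 bytes of "çãêá"

byte_length = len(raw)                  # counts bytes
char_length = len(raw.decode("utf-8"))  # counts code points after decoding

print(byte_length, char_length)
```

Each of the four letters takes two bytes in UTF-8, so counting before decoding doubles the result.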
In Unicode, there is no such thing as the length of a string or the "number of characters". All of that comes from ASCII thinking.
You can choose one of the following, depending on what exactly you need:
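For cursor movement, text selection and the like, grapheme clusters should be used.
For limiting the length of a string in input fields, file formats, protocols, or databases, the length is measured in code units of some predetermined encoding. The reason is that any length limit derives from the fixed amount of memory allocated for the string at a lower level, be it in memory, on disk, or in a particular data structure.
The size of the string as it appears on screen is unrelated to the number of code points in the string. For that, you have to communicate with the rendering engine. Code points do not occupy one column each, even in monospace fonts and terminals. POSIX takes this into account.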
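To make the three measures concrete, here is a rough Python sketch using only the standard library. The helper names (code_units, approx_graphemes, approx_columns) are hypothetical. The grapheme and column counts are deliberate approximations: they only skip combining marks and double East Asian wide characters, and do not implement full Unicode text segmentation or a real rendering engine's logic.

```python
import unicodedata

def code_units(text, encoding="utf-8"):
    """Length in code units of a fixed encoding (what a storage limit counts)."""
    unit_size = {"utf-8": 1, "utf-16-le": 2, "utf-32-le": 4}[encoding]
    return len(text.encode(encoding)) // unit_size

def approx_graphemes(text):
    """Rough grapheme count: code points that are not combining marks.
    An approximation only, not full grapheme cluster segmentation."""
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)

def approx_columns(text):
    """Rough terminal width: combining marks take 0 columns,
    East Asian wide/fullwidth characters take 2, everything else 1."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width

composed = "\u00e7\u00e3\u00ea\u00e1"                # "çãêá", precomposed letters
decomposed = unicodedata.normalize("NFD", composed)  # base letters + combining marks

print(len(composed), len(decomposed))            # code points differ by normalization
print(code_units(composed), code_units(composed, "utf-16-le"))
print(approx_graphemes(decomposed), approx_columns("\u4e2d\u6587"))
```

The same visible text yields different numbers depending on normalization form and on which measure is asked for, which is why the right "length" depends on the use case.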