
When working in the Moovweb SDK, length("çãêá") is expected to return 4, but instead returns 8. How can I ensure that the length function works correctly when using Unicode characters?


2 Answers


This is a common issue: with Unicode characters, length() can end up using the wrong character set. To fix it, set the $charset_determined variable so that the correct character set is in effect before you call length(), like so in your Tritium code:

$charset_determined = "utf-8"
# your call to length() here
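The result of 8 most likely comes from counting bytes: each of ç, ã, ê and á is a single code point that UTF-8 encodes as two bytes. The sketch below is in Python rather than Tritium and is only an illustration under that assumption; it does not show how Moovweb's length() is actually implemented. It shows how the same bytes give different lengths depending on the character set used to interpret them.

```python
# "çãêá" written as four precomposed code points (U+00E7, U+00E3, U+00EA, U+00E1)
text = "\u00e7\u00e3\u00ea\u00e1"

# The bytes as they might arrive from a server that sends UTF-8
raw = text.encode("utf-8")

byte_count = len(raw)                      # counts bytes: 8
utf8_count = len(raw.decode("utf-8"))      # decoded with the right charset: 4 code points
latin1_count = len(raw.decode("latin-1"))  # decoded with the wrong charset: 8 (mojibake)

print(byte_count, utf8_count, latin1_count)
```

Declaring the correct charset before measuring plays the same role as choosing "utf-8" over "latin-1" in the decode step here.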
answered 2013-03-29T19:33:49.137

In Unicode, there is no such thing as a single "length of a string" or "number of characters". That whole notion comes from ASCII thinking.

Choose one of the following, depending on exactly what you need:

  • For cursor movement, text selection, and the like, use grapheme clusters.

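  • To limit the length of a string in an input field, file format, protocol, or database, measure the length in code units of some predetermined encoding. The reason is that any length limit derives from the fixed amount of memory allocated for the string at a lower level, whether in memory, on disk, or in a particular data structure.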
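To make these options concrete, here is a Python sketch (an assumption; none of it is part of the Moovweb SDK) that measures the same word in code points and in UTF-8 and UTF-16 code units, in both precomposed (NFC) and decomposed (NFD) form. Python's standard library has no grapheme cluster segmentation, so the hypothetical approx_graphemes helper only skips combining marks; a real implementation should follow Unicode Standard Annex #29.

```python
import unicodedata

# Size in bytes of one code unit for each encoding this sketch supports
UNIT_SIZE = {"utf-8": 1, "utf-16-le": 2, "utf-32-le": 4}


def code_units(s: str, encoding: str) -> int:
    """Length of s in code units of the given encoding."""
    return len(s.encode(encoding)) // UNIT_SIZE[encoding]


def approx_graphemes(s: str) -> int:
    """Hypothetical helper: counts code points that are not combining marks.

    Crude approximation only; it ignores ZWJ sequences, Hangul jamo,
    regional-indicator flags and the other rules of UAX #29.
    """
    return sum(1 for ch in s if unicodedata.combining(ch) == 0)


precomposed = "\u00e7\u00e3\u00ea\u00e1"                # çãêá, one code point per letter
decomposed = unicodedata.normalize("NFD", precomposed)  # base letter + combining mark
emoji = "\U0001F600"                                    # outside the BMP: a UTF-16 surrogate pair

for label, s in [("NFC", precomposed), ("NFD", decomposed), ("emoji", emoji)]:
    print(
        label,
        "code points:", len(s),
        "UTF-8 units:", code_units(s, "utf-8"),
        "UTF-16 units:", code_units(s, "utf-16-le"),
        "approx. graphemes:", approx_graphemes(s),
    )
```

The same visible word yields 4 or 8 code points and 8 or 12 UTF-8 bytes depending on normalization, which is why a length limit has to name the encoding it counts in.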

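The size of a string as displayed on screen is unrelated to the number of code points in it. For that, you have to ask the rendering engine. Even with monospace fonts and in terminals, a code point does not necessarily occupy one column. POSIX takes this into account with wcwidth().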
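As a rough illustration of display width, the hypothetical approx_columns helper below approximates what POSIX wcwidth() reports by treating combining marks as zero columns and East Asian wide or fullwidth characters as two. It is a Python sketch under those assumptions only: it ignores zero-width format characters, emoji presentation and locale settings, and the real on-screen size still depends on the renderer and font.

```python
import unicodedata


def approx_columns(s: str) -> int:
    """Hypothetical helper: estimated terminal columns for s."""
    width = 0
    for ch in s:
        if unicodedata.combining(ch):
            continue  # combining marks attach to the previous character: 0 columns
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2  # wide or fullwidth characters take two columns
        else:
            width += 1  # everything else is assumed to take one column
    return width


for s in ["abc", "\u6f22\u5b57", "e\u0301"]:  # ASCII, two CJK ideographs, e + combining acute
    print(repr(s), "code points:", len(s), "columns:", approx_columns(s))
```

Here two CJK ideographs are 2 code points but 4 columns, while "e" plus a combining accent is 2 code points but 1 column.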

More information is available at http://utf8everywhere.org.

answered 2013-03-30T13:14:07.977