php - Alternative to htmlentities($string, ENT_SUBSTITUTE)

Question

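I have a bit of a silly question.

I am currently building a website for a company on a server that runs a rather outdated PHP version (5.2.17). I have a database in which many fields are varchar columns containing characters such as ' é ä è ê ', and I have to display these fields in an HTML page.

Because the PHP version is outdated (and I am not allowed to update it, since some parts of the site must keep working and I have no rights to edit them), I cannot use the htmlentities function with the ENT_SUBSTITUTE flag, because that flag was only added in version 5.4.

So my question is:

Is there an alternative to htmlentities($string, ENT_SUBSTITUTE); or do I have to write my own function covering all sorts of odd characters, which would be incomplete anyway?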

score 2 · Accepted Answer

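Define a function that handles malformed byte sequences, and call it before passing the string to htmlentities. There are several ways to define such a function.

First, if you are not on Windows, try UConverter::transcode.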

http://pecl.php.net/package/intl

If you are willing to handle the bytes directly, see my earlier answer.

https://stackoverflow.com/a/13695364/531320
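As a rough illustration of the byte-level approach (a sketch, not necessarily the code in the linked answer), the function below walks the string with a byte-mode regular expression and replaces every byte that is not part of a well-formed UTF-8 sequence with U+FFFD. The names utf8_scrub and utf8_scrub_callback are hypothetical, and returning an empty string on a PCRE error is an assumption made for this example. It avoids closures and other syntax that PHP 5.2 lacks.

```php
<?php
// Sketch only: utf8_scrub() and utf8_scrub_callback() are hypothetical names.

function utf8_scrub_callback($m)
{
    // Group 1 is present only when a well-formed sequence matched;
    // otherwise the match is a single stray byte.
    return (isset($m[1]) && $m[1] !== '') ? $m[1] : "\xEF\xBF\xBD";
}

function utf8_scrub($string)
{
    // Byte ranges of well-formed UTF-8 (no overlong forms, no surrogates,
    // nothing above U+10FFFF). The pattern runs WITHOUT the /u modifier,
    // so PCRE works on raw bytes.
    $valid = '[\x00-\x7F]+'                        // ASCII run
           . '|[\xC2-\xDF][\x80-\xBF]'             // 2-byte sequences
           . '|\xE0[\xA0-\xBF][\x80-\xBF]'         // 3-byte, excluding overlongs
           . '|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}'  // 3-byte, general
           . '|\xED[\x80-\x9F][\x80-\xBF]'         // 3-byte, excluding surrogates
           . '|\xF0[\x90-\xBF][\x80-\xBF]{2}'      // 4-byte, excluding overlongs
           . '|[\xF1-\xF3][\x80-\xBF]{3}'          // 4-byte, general
           . '|\xF4[\x80-\x8F][\x80-\xBF]{2}';     // 4-byte, up to U+10FFFF

    $result = preg_replace_callback(
        '/(' . $valid . ')|./s',
        'utf8_scrub_callback',
        $string
    );

    // Assumption: fall back to an empty string if PCRE reports an error.
    return $result === null ? '' : $result;
}
```

It would be called just before escaping, for example htmlentities(utf8_scrub($value), ENT_QUOTES, 'UTF-8'). This version emits one replacement character per invalid byte, which may differ from how ENT_SUBSTITUTE groups bytes in newer PHP versions.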

The last option is to develop a PHP extension. Thanks to php_next_utf8_char, this is not hard. Here is a code sample. The name "scrub" comes from Ruby 2.1 (see "Equivalent of Iconv.conv("UTF-8//IGNORE",...) in Ruby 1.9.X?").

// header file
// PHP_FUNCTION(utf8_scrub);

#include "ext/standard/html.h"
#include "ext/standard/php_smart_str.h"

const zend_function_entry utf8_string_functions[] = {
    PHP_FE(utf8_scrub, NULL)
    PHP_FE_END
};

PHP_FUNCTION(utf8_scrub)
{
    char *str = NULL;
    int len, status;
    size_t pos = 0, old_pos;
    unsigned int code_point;
    smart_str buf = {0};

    if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "s", &str, &len) == FAILURE) {
        return;
    }

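    /* Walk the input one UTF-8 sequence at a time. */
    while (pos < (size_t) len) {
        old_pos = pos;
        code_point = php_next_utf8_char((const unsigned char *) str, len, &pos, &status);

        if (status == FAILURE) {
            /* Malformed sequence: emit U+FFFD (REPLACEMENT CHARACTER) instead. */
            smart_str_appendl(&buf, "\xEF\xBF\xBD", 3);
        } else {
            /* Well-formed sequence: copy its bytes through unchanged. */
            smart_str_appendl(&buf, str + old_pos, pos - old_pos);
        }
    }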

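    smart_str_0(&buf);

    /* Empty input leaves buf.c NULL, so return an empty string explicitly. */
    if (!buf.c) {
        RETURN_EMPTY_STRING();
    }

    /* dup = 0 hands ownership of buf.c to the return value, so it must not be freed here. */
    RETURN_STRINGL(buf.c, buf.len, 0);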
}

score 0 · Accepted Answer

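ENT_SUBSTITUTE is not needed if your encoding is handled correctly.

If the characters in your database are UTF-8, are stored as UTF-8, are read as UTF-8, and are displayed to the user as UTF-8, there should be no problem.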
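A minimal sketch of what "UTF-8 all the way through" could look like on the PHP side. The helper name render_field is hypothetical, and the commented-out mysql_set_charset call assumes the old mysql extension with an open connection in $link; neither comes from the question.

```php
<?php
// Sketch only: render_field() is a hypothetical helper name.

// Tell the browser the page is UTF-8 (skipped if output has already started).
if (!headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}

// Assumption: with the old mysql extension (PHP 5.2.3+), the connection
// charset would be set right after connecting, for example:
//     mysql_set_charset('utf8', $link);

function render_field($value)
{
    // Naming the charset explicitly matters on PHP < 5.4, where the
    // default for htmlentities() is ISO-8859-1 rather than UTF-8.
    return htmlentities($value, ENT_QUOTES, 'UTF-8');
}

echo render_field("caf\xC3\xA9 <b>"), "\n"; // prints "caf&eacute; &lt;b&gt;"
```

If any link in that chain uses a different encoding (for instance a Latin-1 connection charset), the bytes reaching htmlentities may no longer be valid UTF-8, which is the situation ENT_SUBSTITUTE is meant to cover.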

score 0 · Accepted Answer

Just add

if (!defined('ENT_SUBSTITUTE')) define('ENT_SUBSTITUTE', 0);

and you will be able to use ENT_SUBSTITUTE with htmlentities.
