php - What kind of characters are these, and how do I strip them with PHP?

Question

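This may be a silly question. I run a website that lets its users submit their own content.

Some users are playing around with characters that look strange (to me) and that I don't want to see. Here are some of them: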

▄ █ ▄ █ ▄ █ ▄ █ ▄

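What kind of characters are these, and how can I strip them? I have already tried a few approaches, but how can I do this without losing HTML special characters such as © ® ... etc.?

Thanks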

score 1 · Accepted Answer

You can strip characters based on their Unicode properties, like this:

// strip out symbols
echo preg_replace('/[\p{S}]+/u', '', 'Hello ▄ █ ▄ █ ▄ █ ▄ █ ▄ World');
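// "Hello" and "World" remain; the spaces between the removed symbols are left in place
// (a browser collapses them, so the page shows: Hello World)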

You can read more about the Unicode features of regular expressions in the manual.
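To see what kind of characters these are, each one can be tested against a few Unicode general categories. The following is a minimal sketch, not part of the original answer: `describe_chars` is a hypothetical helper name, and it assumes the input string and the source file are both UTF-8.

```php
<?php
// Hypothetical helper: map each distinct character of a UTF-8 string to the
// first Unicode general category (letter, number, ...) that it matches.
function describe_chars(string $text): array
{
    $categories = [
        'L' => 'letter',
        'N' => 'number',
        'P' => 'punctuation',
        'S' => 'symbol',
        'Z' => 'separator',
    ];

    // With the /u modifier the string is split into characters, not bytes.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return []; // invalid UTF-8 input
    }

    $result = [];
    foreach ($chars as $char) {
        foreach ($categories as $letter => $name) {
            if (preg_match('/^\p{' . $letter . '}$/u', $char) === 1) {
                $result[$char] = $name;
                break;
            }
        }
    }
    return $result;
}

print_r(describe_chars('Hi ▄ █ ©'));
```

Both block characters and © are reported as `symbol`, which is why a plain `\p{S}` match removes all of them.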

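Unfortunately, the code above also strips your copyright and trademark symbols; you may want to make an exception for those characters, for example: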

echo preg_replace('/[^\p{L}\p{Z}©®]+/u', '', 'Hello ▄ █ ▄ █ ▄ █ ▄ █ ▄ World © ®');
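Note that the pattern above keeps only letters, separators, © and ®, so digits and punctuation are removed as well. If that is too aggressive, one possible alternative is a negative lookahead that removes symbols except the ones you allow. This is a sketch under assumptions: `strip_symbols_except_marks` is a hypothetical name, the allowed characters are hard-coded, and the whitespace clean-up is an optional extra.

```php
<?php
// Hypothetical helper: remove every symbol (\p{S}) except © and ®,
// then tidy up the whitespace left behind.
function strip_symbols_except_marks(string $text): string
{
    // (?![©®]) is a negative lookahead, checked before each symbol,
    // so © and ® are never consumed by \p{S}.
    $stripped = preg_replace('/(?:(?![©®])\p{S})+/u', '', $text);
    if ($stripped === null) {
        return $text; // assumption: leave invalid UTF-8 input unchanged
    }

    // Collapse runs of whitespace into one space and trim the ends.
    $collapsed = preg_replace('/\s+/u', ' ', $stripped);
    return trim($collapsed ?? $stripped);
}

echo strip_symbols_except_marks('Hello ▄ █ ▄ World 2024! © ®');
// Hello World 2024! © ®
```

Digits and punctuation survive here because only characters in the symbol category are targeted.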

score 0 · Accepted Answer

You can use htmlentities() or htmlspecialchars().

htmlentities()

This function is identical to htmlspecialchars() in all ways, except with htmlentities(), all characters which have HTML character entity equivalents are translated into these entities.

htmlspecialchars()

Certain characters have special significance in HTML, and should be represented by HTML entities if they are to preserve their meanings. This function returns a string with some of these conversions made; the translations made are those most useful for everyday web programming. If you require all HTML character entities to be translated, use htmlentities() instead.

The difference is in how much gets encoded: htmlentities() converts every character that has an HTML entity equivalent, while htmlspecialchars() converts only the "special" characters: ampersand, double and single quotes, less than, and greater than.
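A short comparison may make the difference concrete. This is an illustrative sketch with a made-up input string; the `ENT_QUOTES` flag and the `'UTF-8'` encoding are passed explicitly so the result does not depend on the PHP version's defaults.

```php
<?php
// Made-up sample input containing special characters and a copyright sign.
$input = 'Tom & "Jerry" © <b>';

// Only &, quotes, < and > are converted; © is left as-is.
echo htmlspecialchars($input, ENT_QUOTES, 'UTF-8'), "\n";
// Tom &amp; &quot;Jerry&quot; © &lt;b&gt;

// Every character with a named entity is converted, including ©.
echo htmlentities($input, ENT_QUOTES, 'UTF-8'), "\n";
// Tom &amp; &quot;Jerry&quot; &copy; &lt;b&gt;
```

Both functions encode characters rather than remove them, so on their own they do not strip anything from the submitted text.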
