php - Regex: remove non-alphanumeric characters, multiple spaces, and trim() all at once

Question

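I have a $text from which I want to strip all non-alphanumeric characters, replace runs of spaces and newlines with a single space, and remove leading and trailing whitespace.

Here is my solution so far.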

$text = '
some-    text!! 

for testing?
'; // $text to format

//strip off all non-alphanumeric chars
$text = preg_replace("/[^a-zA-Z0-9\s]/", "", $text);

//Replace multiple white spaces by single space 
$text = preg_replace('/\s+/', ' ', $text);

//eliminate beginning and ending space
$finalText = trim($text);
/* result: $finalText ="some text for testing";
without non-alphanumeric chars, newline, extra spaces and trim()med */

Is it possible to combine all of this into a single regular expression, so that I get the desired result in one line, like this?

$finalText = preg_replace(some_reg_expression, $replaceby, $text);

Thanks

Edit: clarified with a test string

score 3 · Accepted Answer

Sure, you can. It's easy.

The regex looks like this:

((?<= )\s*)|[^a-zA-Z0-9\s]|(\s*$)|(^\s*)

I don't have PHP at hand, so I used Perl (just to test the regex and show that it works; you can play with my code here):

$ cat test.txt 
         a       b       c    d
a b c e f g             fff  f

$ cat 1.pl 
while(<>) {
    s/((?<= )\s*)|[^a-zA-Z0-9\s]|(\s*$)|(^\s*)//g;
    print $_,"\n";
}

$ cat test.txt | perl 1.pl 
a b c d
a b c e f g fff f

For PHP it will be the same.
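As a rough check of that claim, here is a minimal PHP sketch that applies the same pattern to the question's test string. The `/` delimiters and the explicit `\n` escapes in the test string are my additions; the pattern itself is unchanged. Note that the lookbehind checks only for a literal space, so a lone newline sitting directly between two words would be left as a newline rather than turned into a space.

```php
<?php
// The combined pattern from this answer, wrapped in PHP's / delimiters.
$pattern = '/((?<= )\s*)|[^a-zA-Z0-9\s]|(\s*$)|(^\s*)/';

// Test string from the question, written with explicit \n escapes.
$text = "\nsome-    text!! \n\nfor testing?\n";

// Every match (symbols, extra whitespace, edge whitespace) is deleted.
$finalText = preg_replace($pattern, '', $text);

var_dump($finalText); // string(21) "some text for testing"
```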

What does the RE do?

((?<= )\s*)       # all spaces that have at least one space before them
|
[^a-zA-Z0-9\s]    # all non-alphanumeric characters
|
(\s*$)            # all spaces at the end of string
|
(^\s*)            # all spaces at the beginning of string

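The only tricky part here is the lookbehind assertion in ((?<= )\s*). A run of whitespace is removed if and only if it is preceded by a space.

If you want to learn how lookahead/lookbehind assertions work, see http://www.regular-expressions.info/lookaround.html.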

Update from the discussion:

What happens when $text = 'some ? ! ? text';? The resulting string then contains multiple spaces between "some" and "text".

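Fixing this is not so easy, because it requires a variable-length positive lookbehind assertion, which is currently not possible. You cannot simply check for a preceding space, because the preceding character may be not a space but a non-alphanumeric character that will be removed anyway (for example, in " !" the "!" gets removed, but the RE knows nothing about that). You would need something like (?<=[^a-zA-Z0-9\s]* )\s*, but unfortunately that will not work, because PCRE does not support variable-length lookbehind assertions.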
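The problem case can be reproduced with a short PHP sketch. It runs the combined pattern on the string from the discussion and, for comparison, the two-pass approach from the question; the variable names `$single` and `$twoPass` are only illustrative.

```php
<?php
$pattern = '/((?<= )\s*)|[^a-zA-Z0-9\s]|(\s*$)|(^\s*)/';
$text = 'some ? ! ? text';

// Single pass: each space is preceded by a letter or a symbol in the
// original subject, so the lookbehind never fires and all four spaces stay.
$single = preg_replace($pattern, '', $text);
var_dump($single); // string(12) "some    text"

// Two passes, as in the question: strip symbols first, then collapse.
$stripped = preg_replace('/[^a-zA-Z0-9\s]/', '', $text);
$twoPass = trim(preg_replace('/\s+/', ' ', $stripped));
var_dump($twoPass); // string(9) "some text"
```

The second pass sees the string after the symbols are already gone, which is exactly the information the single pattern lacks.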

score 1 · Accepted Answer

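I don't think you can do this with a single regex. You essentially need an if/else condition, and that is not possible with a regex alone.

You basically need one regex to remove the non-alphanumeric characters and another to collapse the whitespace, which is what you are already doing.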
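If the goal is mainly a one-line call site, one option is to wrap the two passes in a small function. This is only a sketch: `normalize_text()` is a hypothetical helper name, not a PHP built-in, and the type declarations assume PHP 7 or later.

```php
<?php
// normalize_text() is a hypothetical helper, not part of PHP.
function normalize_text(string $text): string
{
    // Pass 1: drop everything that is neither alphanumeric nor whitespace.
    $text = preg_replace('/[^a-zA-Z0-9\s]/', '', $text);
    // Pass 2: collapse each run of whitespace (including newlines) to one space.
    $text = preg_replace('/\s+/', ' ', $text);
    // Finally remove leading and trailing whitespace.
    return trim($text);
}

$finalText = normalize_text("\nsome-    text!! \n\nfor testing?\n");
var_dump($finalText); // string(21) "some text for testing"
```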

score 1 · Accepted Answer

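Check this, if it is what you are looking for:

$patterns = array('/[^a-zA-Z0-9\s]/', '/\s+/');
$replace = array('', ' ');
// patterns are applied in order; assign the result so it is not discarded
$finalText = trim(preg_replace($patterns, $replace, $text));

It may need some modification. Let me know if this is what you wanted to do.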

score 0 · Accepted Answer

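For your own sanity, you will want to keep regular expressions that you can still understand and edit later :)

$text = trim(preg_replace(array(
    '/[^a-zA-Z0-9\s]/', // remove all non-space, non-alphanumeric characters
    '/\s+/',            // replace each run of whitespace with a single space
), array(
    '',
    ' ',
), $originalText)); // trim last, since stripping symbols can expose edge spaces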

score 0 · Accepted Answer

$text =~ s/([^a-zA-Z0-9\s].*?)//g;

It doesn't have to be any harder than that.
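The line above is Perl syntax. A rough PHP counterpart, sketched below under the assumption that `preg_replace` with the same pattern is what was intended, shows that it strips the non-alphanumeric characters but leaves the whitespace runs and the leading and trailing newlines in place, so it covers only the first of the three steps.

```php
<?php
$text = "\nsome-    text!! \n\nfor testing?\n";

// PHP counterpart of the Perl s///g above. The lazy .*? matches the
// empty string here, so only the single offending character is removed.
$stripped = preg_replace('/([^a-zA-Z0-9\s].*?)/', '', $text);

// Whitespace is untouched: runs of spaces and the newlines are still there.
var_dump($stripped === "\nsome    text \n\nfor testing\n"); // bool(true)
```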

Answered 2013-09-30T20:20:39.887
