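After parsing various websites with the PHP DOM parser, I end up with multi-line strings full of blank lines, unexpected carriage returns, multiple spaces, tabs and other surprises:
Input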
Partner Company
Firstname Lastname
Street. 152
12345 City
Tel: 01234 567898
Fax: 01234 567899
Mobile: 0123 567899
I have been trying to clean up the string with preg_replace():
Code
$lineToOutput = preg_replace("/(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+/", "\n", $lineToOutput); // remove all blank (empty) lines
$lineToOutput = preg_replace("/[\t]/", " ", $lineToOutput); // convert tabs to spaces
$lineToOutput = preg_replace("/[ ]{2,}/", " ", $lineToOutput); // convert multiple spaces to single spaces
$lineToOutput = preg_replace("/[\n] /", "\n", $lineToOutput); // remove spaces at beginning of lines
$lineToOutput = preg_replace("/ [\n]/", "\n", $lineToOutput); // remove spaces at end of lines
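To see where the five replacements above fall short, here is a minimal reproduction on a hypothetical sample string (the sample is an assumption, not the actual scraped data). Tracing it by hand, the first pattern already turns each "\r\n" into "\n", but three cases survive: a leading space on the very first line (/[\n] / needs a newline before the space), a trailing space at the very end of the string (/ [\n]/ needs a newline after it), and a non-breaking space. PHP's DOM extension turns &nbsp; into U+00A0, i.e. the UTF-8 bytes "\xC2\xA0", which neither [ ], \t nor \s match in the default (non-/u) mode, so it is a likely suspect for the lines that still start with a "space".

```php
<?php
// Hypothetical sample: leading space on line 1, CRLF line endings,
// an NBSP ("\xC2\xA0") before "Street" and a trailing space at the very end.
$lineToOutput = " Partner Company\r\nFirstname Lastname \r\n\r\n\xC2\xA0Street. 152 ";

// The five replacements from the question, unchanged.
$lineToOutput = preg_replace("/(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+/", "\n", $lineToOutput);
$lineToOutput = preg_replace("/[\t]/", " ", $lineToOutput);
$lineToOutput = preg_replace("/[ ]{2,}/", " ", $lineToOutput);
$lineToOutput = preg_replace("/[\n] /", "\n", $lineToOutput);
$lineToOutput = preg_replace("/ [\n]/", "\n", $lineToOutput);

// Only "Lastname " gets trimmed; the leading space on line 1,
// the NBSP before "Street" and the final trailing space all remain.
var_dump($lineToOutput);
```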
Unfortunately, this does not remove the spaces at the beginning and end of some lines. Any suggestions?
Output
Partner Company <-- unwanted space at beginning of line
Firstname Lastname <-- unwanted space at end of line (not visible)
Street. 152 <-- unwanted space at beginning of line
12345 City
Tel: 01234 567898
Fax: 01234 567899
Mobile: 0123 567899
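One possible fix, sketched below as a hypothetical helper cleanLines() and assuming the input is valid UTF-8 (which is what the DOM extension returns), is to normalise line endings first, collapse all horizontal whitespace with \h (which, with the u modifier, also covers U+00A0), and then trim every line using the m (multiline) modifier, so that ^ and $ match at each line boundary, including the first and last line of the string.

```php
<?php
// Hypothetical helper; assumes $text is valid UTF-8.
function cleanLines(string $text): string
{
    // 1. Normalise line endings: CRLF and lone CR become LF.
    $text = preg_replace('/\r\n?/', "\n", $text);

    // 2. Collapse runs of horizontal whitespace (space, tab, NBSP, ...) into one space.
    //    \h never matches "\n", so line breaks are preserved.
    $text = preg_replace('/\h+/u', ' ', $text);
    if ($text === null) {
        // preg_replace() returns null when /u sees invalid UTF-8.
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }

    // 3. Trim each line: with /m, ^ and $ match at every line start/end,
    //    not only at the start/end of the whole string.
    $text = preg_replace('/^ +| +$/m', '', $text);

    // 4. Drop blank lines, then leading/trailing newlines.
    $text = preg_replace('/\n{2,}/', "\n", $text);
    return trim($text, "\n");
}

// Usage with a hypothetical sample containing CRLF, a tab, NBSP and a whitespace-only line.
echo cleanLines(" Partner Company\r\n\tFirstname   Lastname \r\n\r\n\xC2\xA0Street. 152 \r\n  \r\n12345 City"), "\n";
```

A line-by-line alternative (explode("\n", ...), array_map('trim', ...), array_filter(), implode()) also works, but note that trim() strips only " \t\n\r\0\x0B" by default, so a non-breaking space would survive unless it is replaced first.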