php - Looking for a regular expression in PHP

Question

I'm using the preg_match function in PHP to extract some values from an RSS feed. The feed content contains something like this:

<li><strong>Something:</strong> A text with non alphanumeric characters (more text), more text with non alphanumeric characters (more text)</li>

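I need to get "A text with non alphanumeric characters" and "more text with non alphanumeric characters" so I can save them in a database. I don't know whether a regular expression is the best way to do this.

Thanks a lot.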

score 1 · Accepted Answer

If you want to use a regular expression (quick and dirty, and not very maintainable), this will give you the text:

$input = '<li><strong>Something:</strong> A text with non alphanumeric characters (more text), more text with non alphanumeric characters (more text)</li>';

// Match between tags
preg_match("#</strong>(.*?)</li>#", $input, $matches);
// Remove the text inside brackets
echo trim(preg_replace("#\s*\(.*?\)\s*#", '', $matches[1]));

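However, this may fail with nested parentheses, because the lazy .*? stops at the first closing parenthesis it finds.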
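
To illustrate the nesting caveat, here is a minimal sketch that compares the lazy pattern with a recursive PCRE pattern that matches balanced parentheses. The input string and the variable names are made up for the demonstration, and the recursive pattern is one possible workaround rather than part of the original answer:

```php
<?php
// Hypothetical input with one level of nesting (not taken from the question).
$text = 'A text (more (nested) text), more text (x)';

// Lazy pattern from the answer: it stops at the first ")" it sees,
// so it cuts the nested group in the middle.
$naive = trim(preg_replace('#\s*\(.*?\)\s*#', '', $text));

// Recursive variant: group 1 matches "(", then any mix of non-parenthesis
// characters or a nested copy of group 1 via (?1), then ")".
$balanced = trim(preg_replace('#\s*(\((?:[^()]++|(?1))*\))\s*#', '', $text));

echo $naive, PHP_EOL;    // A texttext), more text
echo $balanced, PHP_EOL; // A text, more text
```

The recursive version only helps when the parentheses are balanced; an unmatched "(" is left in place.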

score 0 · Accepted Answer

Given that the structure is always the same, you can use this regular expression:

</strong>([^,]*),([^<]*)</li>

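Group 1 will hold the first fragment and group 2 the second, each still including its parenthesized part and surrounding whitespace.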
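
As a rough usage sketch, the pattern can be passed to preg_match like this. The # delimiters, the variable names, and the trim() calls are my own additions, not part of the answer:

```php
<?php
// Sample input copied from the question.
$input = '<li><strong>Something:</strong> A text with non alphanumeric characters (more text), more text with non alphanumeric characters (more text)</li>';

// The pattern from the answer, wrapped in # delimiters so the slashes need no escaping.
$pattern = '#</strong>([^,]*),([^<]*)</li>#';

$first = null;
$second = null;
if (preg_match($pattern, $input, $m) === 1) {
    // Both groups start with a space, so trim them.
    $first  = trim($m[1]);
    $second = trim($m[2]);
}

echo $first, PHP_EOL;  // A text with non alphanumeric characters (more text)
echo $second, PHP_EOL; // more text with non alphanumeric characters (more text)
```

Note that the parenthesized text is still part of each group, and that the pattern assumes the first fragment contains no comma.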

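Once you start parsing HTML/XML with regular expressions, it quickly becomes apparent that a full-fledged parser is a better fit. For small or one-off solutions, a regular expression is fine.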
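
For comparison, a parser-based version might look roughly like the sketch below. It assumes PHP's bundled DOM extension is available and that the fragments never contain a comma themselves; the variable names are illustrative only:

```php
<?php
// Sample input copied from the question.
$input = '<li><strong>Something:</strong> A text with non alphanumeric characters (more text), more text with non alphanumeric characters (more text)</li>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($input);

// Take the first <li> and keep only its direct text nodes,
// which skips the <strong> label.
$li = $doc->getElementsByTagName('li')->item(0);
$text = '';
foreach ($li->childNodes as $node) {
    if ($node->nodeType === XML_TEXT_NODE) {
        $text .= $node->nodeValue;
    }
}

// Split on the comma and trim each fragment.
$parts = array_map('trim', explode(',', $text));
print_r($parts);
```

The fragments still carry their "(more text)" suffix here; removing it would need an extra step such as the preg_replace shown in the other answers.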

score 0 · Accepted Answer

$str = '<li><strong>Something:</strong> A text with non alphanumeric characters (more text), more text with non alphanumeric characters (more text)</li>';
$str = preg_replace('~^.*?</strong>~', '', $str); // Remove leading markup
$str = preg_replace('~</li>$~', '', $str); // Remove trailing markup
$str = preg_replace('~\([^)]++\)~', '', $str); // Remove text within parentheses
$str = trim($str); // Clean up whitespace
$arr = preg_split('~\s*,\s*~', $str); // Split on the comma
