php - RegEx matches all text that applies to only some of the query?

Question

I'm working on an HTML parser for a client, and I have just started experimenting with RegEx. I'm quite new to it but am learning quickly! In this part, I need to extract all of the text that is set at 18.0pt within the document. Here is the first RegEx I tried (using a real-time RegEx tester):

<p.*?><span.*?style='.*?font-size:1

Here is my test text:

<p class=MsoNormal><span style='font-size:14.0pt;font-family:"Comic Sans MS"'>3<sup>rd</sup>
Sunday in Lent - 2013c<o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:14.0pt;font-family:"Comic Sans MS"'>Old
Testament – Isaiah 55:1-9<o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:14.0pt;font-family:"Comic Sans MS"'>New
Testament – Luke 13:1-9<o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:18.0pt;font-family:"Comic Sans MS"'><o:p>&nbsp;</o:p>
</span></p>

It works correctly and highlights each line separately, up to the 1. The problem is that as soon as I change 1 to 18, instead of highlighting just the line with font-size:18, it highlights everything from the first line all the way to the 18. I would like to grab only the line with the 18pt font. Thank you, and any help is appreciated! :)

score 2 · Accepted Answer

Here's a better regexp:

<p[^>]*>[ \t\r\n]*<span[^>]* style='[^']*font-size:18

Yours is doing exactly what you told it to: it finds <p, then any number of arbitrary characters, then ><span, then more arbitrary characters, then font-size:18. So it starts at the first <p and consumes arbitrary characters until it reaches font-size:18. You were just lucky in the first example that every span had a font-size beginning with 1, so each match could end inside its own tag.

This version is less permissive: [^>]* stops at the first >, so a match cannot run past the end of a tag. To make it more robust, I also allowed whitespace between the <p> and <span>.
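
To see the difference concretely, here is a minimal PHP sketch that runs both patterns against two paragraphs from the question's test text. The s modifier on the original pattern is an assumption: it mimics a tester in which . also matches newlines, which is what the described behaviour suggests. The variable names are illustrative only.

```php
<?php
// Two paragraphs taken from the question's test text: one 14.0pt, one 18.0pt.
$html = <<<'HTML'
<p class=MsoNormal><span style='font-size:14.0pt;font-family:"Comic Sans MS"'>New
Testament – Luke 13:1-9<o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:18.0pt;font-family:"Comic Sans MS"'><o:p>&nbsp;</o:p>
</span></p>
HTML;

// Original pattern. The s modifier is an assumption that lets "." match newlines.
$original = '~<p.*?><span.*?style=\'.*?font-size:18~s';
// Pattern from the answer: [^>]* and [^']* cannot leave the current tag or attribute value.
$fixed = '~<p[^>]*>[ \t\r\n]*<span[^>]* style=\'[^\']*font-size:18~';

preg_match($original, $html, $m1, PREG_OFFSET_CAPTURE);
preg_match($fixed, $html, $m2, PREG_OFFSET_CAPTURE);

// With PREG_OFFSET_CAPTURE, $m[0] is [matched text, byte offset].
$originalStart = $m1[0][1];
$fixedStart    = $m2[0][1];
$fixedText     = $m2[0][0];

echo "original pattern match starts at byte $originalStart\n";
echo "fixed pattern match starts at byte $fixedStart: $fixedText\n";
```

The original pattern's match begins at the very first <p, while the fixed pattern's match begins at the paragraph that actually contains font-size:18.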

score 0 · Accepted Answer

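If you match "any character except a newline" rather than "any character" (the dot), you make sure the match never runs past the end of the line:

<p.*?><span[^\n]*?style='[^\n]*?font-size:18

Normally . does not match a newline unless a particular flag is set (this depends on your environment), specifically the s flag. Could that be the default in your regex tester?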
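
As a small illustration of that flag in PHP's preg functions, the sketch below runs the same pattern with and without the s modifier. The sample string is made up for the demonstration.

```php
<?php
// Hypothetical sample: a span whose content is split across two lines.
$text = "<span>first\nsecond</span>";

// Without the s modifier, "." stops at the newline, so the closing tag is never reached.
$plain = preg_match('~<span>.*</span>~', $text);

// With the s modifier, "." also matches "\n", so the match spans both lines.
$dotall = preg_match('~<span>.*</span>~s', $text);

var_dump($plain, $dotall);
```

If a tester behaves like the second call by default, a pattern built on .*? can run across line breaks even though it is lazy.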

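Another idea is to limit the number of characters you are willing to match, using {}. For example:

<p.{0,20}>

This works as long as your opening <p> tag has no more than 20 characters between <p and >. The lower bound is written out as 0 because some PCRE versions treat {,20} as literal text rather than as a quantifier.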
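
A quick sketch of how the length limit behaves in PHP follows. Both sample tags are hypothetical and chosen only to fall on either side of the 20-character limit.

```php
<?php
// Hypothetical sample tags, used only to illustrate the length limit.
$shortTag = '<p class=MsoNormal>';                             // 16 characters between "<p" and ">"
$longTag  = "<p class=MsoNormal style='margin-left:36.0pt'>";  // more than 20 characters

$pattern = '~<p.{0,20}>~';

$shortMatches = preg_match($pattern, $shortTag); // a ">" is found within 20 characters
$longMatches  = preg_match($pattern, $longTag);  // no ">" within 20 characters

var_dump($shortMatches, $longMatches);
```

The limit is a blunt tool: it silently skips any opening tag with longer attributes, so the negated character class [^>]* is usually the more robust choice.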
