
I'm using

(?<=Activities</h3>)[\w\s\/\,\-\.]*

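to extract text from the HTML below, but I only get the first line. I want to capture all the lines up to the next "h3 style". It doesn't matter whether the "br" tags are captured or not.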

  <h3 style="margin: 10px 0px 0px;">Beach Type</h3> sand <h3 style="margin: 10px 0px 0px;">Facilities</h3> Cafes/restaurant<br>Toilets<br>Disabled toilets<br> <h3 style="margin: 10px 0px 0px;">Activities</h3> Swimming<br>Fishing<br>Snorkeling<br> <h3 style="margin: 10px 0px 0px;">Nature and Wildlife</h3> Grandes Rocques is located at the start of Guernsey's 14km west coast footpath and cycle route. Port Soif Nature Trail and the Saumarez Nature trail are also located nearby. There is a diverse range of wildlife here. The first live Green Turtle to be rec <h3 style="margin: 10px 0px 0px;">Parking</h3> 200 spaces are available <h3 style="margin: 10px 0px 0px;">Water Quality</h3> Excellent <h3 style="margin: 10px 0px 0px;">Lifeguard</h3> No <h3 style="margin: 10px 0px 0px;">Cleaning and Litter</h3> The beach is cleaned daily by hand in the summer and twice a week in winter. There are litter and dog bins present. <h3 style="margin: 10px 0px 0px;">Awards and Recommendations</h3> Marine Conservation Society Recommended<br>

Any help would be appreciated. Thanks for your attention.
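Assuming Python (the question does not say which language is used), a quick reproduction shows why only the first item comes back: the character class contains no "<", so the match ends at the first <br>. The HTML string here is a shortened copy of the sample above.

```python
import re

# Shortened copy of the question's HTML, around the "Activities" heading.
html = ('<h3 style="margin: 10px 0px 0px;">Activities</h3> Swimming<br>Fishing<br>Snorkeling<br> '
        '<h3 style="margin: 10px 0px 0px;">Nature and Wildlife</h3>')

# The asker's pattern (heading spelled "Activities").
pattern = r'(?<=Activities</h3>)[\w\s\/\,\-\.]*'

match = re.search(pattern, html)
# "<" is not in the character class, so matching stops right before the first <br>.
print(repr(match.group()))  # ' Swimming'
```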

欧米茄


2 Answers

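  1. In which language?

  2. It is usually better to use an HTML/DOM parser to get data out of HTML. I'm fairly sure that is the case here.

  3. Your character class contains no < or > characters. How do you expect it to match the <br> tags?

  4. Where do you tell the pattern to stop at the next <h3 style?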
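Following point 2, here is a minimal sketch of the parser approach, again assuming Python and using only the standard library's html.parser module. The class name SectionExtractor is hypothetical. It collects the text fragments that follow a given <h3> heading until the next <h3> starts, so the <br> tags simply act as separators.

```python
from html.parser import HTMLParser


class SectionExtractor(HTMLParser):
    """Hypothetical helper: collects text after a given <h3> heading, up to the next <h3>."""

    def __init__(self, heading):
        super().__init__()
        self.heading = heading
        self.in_h3 = False      # currently inside an <h3> element
        self.h3_text = ""       # text of the current <h3>
        self.capturing = False  # inside the wanted section
        self.items = []         # collected text fragments

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            # Any new <h3> ends the section being captured.
            self.capturing = False
            self.in_h3 = True
            self.h3_text = ""

    def handle_endtag(self, tag):
        if tag == "h3":
            self.in_h3 = False
            # Start capturing only if this heading is the one we want.
            self.capturing = self.h3_text.strip() == self.heading

    def handle_data(self, data):
        if self.in_h3:
            self.h3_text += data
        elif self.capturing and data.strip():
            self.items.append(data.strip())


# Shortened copy of the question's HTML.
html = ('<h3 style="margin: 10px 0px 0px;">Facilities</h3> Cafes/restaurant<br>Toilets<br>Disabled toilets<br> '
        '<h3 style="margin: 10px 0px 0px;">Activities</h3> Swimming<br>Fishing<br>Snorkeling<br> '
        '<h3 style="margin: 10px 0px 0px;">Nature and Wildlife</h3> Grandes Rocques is located at the start of '
        "Guernsey's 14km west coast footpath and cycle route.")

parser = SectionExtractor("Activities")
parser.feed(html)
parser.close()
print(parser.items)  # ['Swimming', 'Fishing', 'Snorkeling']
```

Unlike a regex, this does not depend on the exact attribute text of the <h3> tags, and the same class can pull out any other section by changing the heading argument.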

answered 2012-10-23T07:06:42.200

This is a fairly vague question, but something like this does what you asked:

(?<=Activities</h3>)(.*?)<h3

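You could make the .*? more restrictive if you need to. The .* matches any run of characters (except newlines, unless your engine's DOTALL/single-line flag is on), and the ? makes it non-greedy, so it stops at the first <h3 it finds rather than the last one. The captured text is in group 1.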
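As an illustration, assuming Python, the answer's pattern can be applied with re.search and the captured group then split on <br> to get the individual items. The re.DOTALL flag and the <br> split are additions for this sketch, not part of the answer: DOTALL only matters if the real page spreads the list over several lines.

```python
import re

# Shortened copy of the question's HTML.
html = ('<h3 style="margin: 10px 0px 0px;">Activities</h3> Swimming<br>Fishing<br>Snorkeling<br> '
        '<h3 style="margin: 10px 0px 0px;">Nature and Wildlife</h3>')

# Lookbehind anchors right after the heading; the lazy (.*?) stops at the first following "<h3".
match = re.search(r'(?<=Activities</h3>)(.*?)<h3', html, re.DOTALL)

section = match.group(1)  # ' Swimming<br>Fishing<br>Snorkeling<br> '

# Split on <br>, <br/> or <br /> and drop empty or whitespace-only pieces.
items = [part.strip() for part in re.split(r'<br\s*/?>', section) if part.strip()]
print(items)  # ['Swimming', 'Fishing', 'Snorkeling']
```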

answered 2012-10-23T07:10:26.223