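I need help with HTML parsing. I searched for an answer before posting here but could not find one. I have stored the complete HTML of blog pages in a database table, and now I want to extract text and images from that HTML. However, I only need the text and images of one specific paragraph, not of the whole page.
See the example below, which contains many tags. It has three paragraphs. I need to extract the text and images only from paragraph 2, the one relevant to my requirement. (I have a keyword I can search for, so I can identify which paragraph I need to extract.)
How can I extract a specific paragraph's text and its images from any blog? The keyword to search for in the HTML is PRODUCT ABC. I am using PHP.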
<html>
<!-- Javascript: tags go here -->
<!-- Head: tags go here -->
<!-- Meta: tags go here -->
<!-- Title: tag goes here -->
<!-- Links: tags go here -->
<!-- Javascript: tags go here -->
<body>
<!-- Lots of other code goes here: links, javascript, headings, etc. -->
<!-- DIV: tags go here -->
<p> PARAGRAPH 1, This paragraph contains only some text. </p>
<!-- Script: tags go here -->
<p> PARAGRAPH 2, It has some information about PRODUCT ABC...</p>
<img /> <!-- images related to this paragraph go here -->
<img /> <!-- images related to this paragraph go here -->
<img /> <!-- images related to this paragraph go here -->
<!-- Script: tags go here -->
<p> PARAGRAPH 3, This paragraph contains only some text. </p>
<img /> <!-- images related to this paragraph go here -->
<!-- Links: tags go here -->
<!-- Javascript: tags go here -->
</body>
</html>
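For reference, here is a minimal sketch of one way this could be approached with PHP's built-in DOM extension. It rests on assumptions that are not in the sample above: that the related images follow the matching `<p>` as siblings (directly or wrapped in another element) until the next `<p>`, and that each `<img>` carries a `src` attribute. The function name `extractParagraphWithImages` and the file names in the usage part are hypothetical.

```php
<?php
// Hypothetical helper: finds the first <p> whose text contains $keyword and
// returns its text plus the src values of the <img> elements that follow it,
// stopping at the next <p>. Returns null when no paragraph matches.
function extractParagraphWithImages(string $html, string $keyword): ?array
{
    $doc = new DOMDocument();
    // Blog markup is rarely valid; buffer parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('p') as $p) {
        // Case-insensitive search (for ASCII keywords) in the paragraph text.
        if (stripos($p->textContent, $keyword) === false) {
            continue;
        }

        $images = [];
        // Images placed inside the paragraph itself.
        foreach ($p->getElementsByTagName('img') as $img) {
            $images[] = $img->getAttribute('src');
        }
        // Assumption: images after the paragraph belong to it until the next <p>.
        for ($node = $p->nextSibling; $node !== null; $node = $node->nextSibling) {
            if (!($node instanceof DOMElement)) {
                continue; // skip comments and whitespace text nodes
            }
            $name = strtolower($node->nodeName);
            if ($name === 'p') {
                break;
            }
            if ($name === 'img') {
                $images[] = $node->getAttribute('src');
            } else {
                // Images wrapped in another element, for example a <div>.
                foreach ($node->getElementsByTagName('img') as $img) {
                    $images[] = $img->getAttribute('src');
                }
            }
        }

        return ['text' => trim($p->textContent), 'images' => $images];
    }

    return null;
}

// Usage with a reduced version of the sample (src values are made up).
$html = '<html><body>'
    . '<p>PARAGRAPH 1, This paragraph contains only some text.</p>'
    . '<p>PARAGRAPH 2, It has some information about PRODUCT ABC...</p>'
    . '<img src="abc-1.jpg" /><img src="abc-2.jpg" />'
    . '<p>PARAGRAPH 3, This paragraph contains only some text.</p>'
    . '<img src="other.jpg" />'
    . '</body></html>';

$result = extractParagraphWithImages($html, 'PRODUCT ABC');
print_r($result);
```

Real blogs often wrap paragraphs and images in different containers, so the sibling walk would likely need adjusting per site. Also note that, as far as I know, `loadHTML` falls back to ISO-8859-1 when the page declares no charset, so UTF-8 content stored in the database may need an explicit encoding hint before parsing.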