I want to extract some data from a set of HTML pages I have and then store that data in a database.
Each HTML file contains a list of blogs, organized like this:
<div class="breadlist"></div>
<h3 class="list"><a href="http://test1.com">Title 1</a></h3>
<p><strong>Description:</strong> Description 1.<br>
<strong>Author:</strong> Author1<br>
<strong>XML:</strong> <a href="http://test1.com/feed">Title 1</a><br>
<strong>Language:</strong> Language1</p>
<h3 class="list"><a href="http://test2.com">Title 2</a></h3>
<p><strong>Description:</strong>Description 2. <br>
<strong>Author:</strong> Author1<br>
<strong>XML:</strong> <a href="http://test2.com/feed">Title 2</a>
<strong>Language:</strong> Español</p>
<div class="breadlist"></div>
This example has 2 blogs, but sometimes there are 10 or even 100. Each file has a different number. I want to get the following data for each blog:
Website Address, Title, Description, Author, Feed, Language.
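I tried to do this with PHP Simple HTML DOM Parser, but today is the first time I have used it and I am getting nowhere. I think I need to loop over something, but I don't know how. Does anyone know how to do this in PHP? Thanks!
----EDIT---- Here is what I have tried so far: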
$str = <<<HTML
<div class="breadlist"></div>
<h3 class="list"><a href="http://test1.com">Title 1</a></h3>
<p><strong>Description:</strong> Description 1.<br>
<strong>Author:</strong> Author1<br>
<strong>XML:</strong> <a href="http://test1.com/feed">Title 1</a><br>
<strong>Language:</strong> Language1</p>
<h3 class="list"><a href="http://test2.com">Title 2</a></h3>
<p><strong>Description:</strong>Description 2. <br>
<strong>Author:</strong> Author1<br>
<strong>XML:</strong> <a href="http://test2.com/feed">Title 2</a>
<strong>Language:</strong> Español</p>
<div class="breadlist"></div>
HTML;
$html = str_get_html($str);
foreach($html->find('h3[class=list]') as $title){
    echo "Title: " . $title->innertext . "<br />";
}
foreach($html->find('h3[class=list] a') as $address){
    echo "Address: " . $address->href . "<br />";
}
foreach($html->find('p') as $description){
    echo "Description: " . $description->childNodes(3)->plaintext . "<br />"; // doesn't work
}
foreach($html->find('p a') as $feed){
    echo "Feed: " . $feed->href . "<br />";
}
foreach($html->find('h3[class=list] a') as $language){
    echo "Language: " . $language->innertext . "<br />"; // doesn't work
}
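For reference, here is a minimal sketch of the kind of single loop that seems to be needed. It swaps in PHP's built-in DOM extension (`DOMDocument` and `DOMXPath`) in place of Simple HTML DOM, and it assumes every `h3.list` is immediately followed by its `<p>` and that the values sit as plain text right after each `<strong>` label. The function name `parseBlogs` and the array keys are hypothetical, chosen only for illustration.

```php
<?php
// Hypothetical helper: returns one associative array per blog entry.
function parseBlogs(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);              // tolerate imperfect markup
    // The XML prolog is a common trick to make libxml read the input as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $blogs = [];

    // One iteration per blog: the <h3> carries title and address,
    // the <p> that follows it carries the remaining fields.
    foreach ($xpath->query('//h3[@class="list"]') as $h3) {
        $link = $xpath->query('.//a', $h3)->item(0);
        $p    = $xpath->query('following-sibling::p[1]', $h3)->item(0);
        if ($link === null || $p === null) {
            continue;                              // skip malformed entries
        }

        $blog = [
            'address'     => $link->getAttribute('href'),
            'title'       => $link->textContent,
            'description' => '',
            'author'      => '',
            'feed'        => '',
            'language'    => '',
        ];

        // Walk the <p> children; each <strong> label switches the current field.
        $field = null;
        foreach ($p->childNodes as $node) {
            if ($node->nodeName === 'strong') {
                $field = strtolower(rtrim(trim($node->textContent), ':'));
                continue;
            }
            if ($field === null) {
                continue;
            }
            if ($field === 'xml') {
                // The feed URL lives in the href of the link after "XML:".
                if ($node->nodeName === 'a') {
                    $blog['feed'] = $node->getAttribute('href');
                }
            } elseif ($node->nodeType === XML_TEXT_NODE && array_key_exists($field, $blog)) {
                $blog[$field] .= $node->textContent;
            }
        }

        $blogs[] = array_map('trim', $blog);
    }

    return $blogs;
}
```

Calling `parseBlogs($str)` on the sample above should yield one array per blog, regardless of how many blogs a file contains. Each array could then be written to the database, for example with a prepared `INSERT` statement.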