
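I believe the page's markup is part of the problem I'm running into, so I figured I should post the source: a JSFiddle and the original GIS page.

I'm trying to get information such as Name: and Address: from the table at the bottom.

Attempted solution:

I wrote the code below hoping to see all of the table data, but the table I want data from returns nothing.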

 <?php
 $k=0;
 $num=1000;
 var_dump(libxml_use_internal_errors(true));
 $domOb = new DOMDocument();
 $html = @$domOb->loadHTMLFile('http://www.gis.catawba.nc.us/website/Parcel/parcel_main.asp?Cmd=query&key=372215634301&type=P');
 $domOb->preserveWhiteSpace = false; 
 $items = $domOb->getElementsByTagName('td'); 
 while ($k<(int)$num){
 echo $items->item($k++)->nodeValue.'<br>'; 
 };
 ?>

All it returns is:

bool(false) Real Estate Search - Legacy Map Layers visible FAQ's Help GIS Home

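So I'm hoping someone can tell me what I'm doing wrong that makes me miss all the data I'm looking for. How can I extract just the name and address as easily and simply as possible?

I also tried the following using XPath, but I get a lot of warnings...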

 $dom = new DOMDocument;
 $dom->load('http://www.gis.catawba.nc.us/website/Parcel/parcel_main.asp?Cmd=query&key=372215634301&type=P');
 $s = simplexml_import_dom($dom);

 echo $name = $s->xpath('//table[@class="words13]/td[contains(text(), "Name:")]');
 echo $add = $s->xpath('//table[@class="words13]/td[contains(text(), Address:)]');

Using user2518542's code combined with hakre's code, I get the following:

 $ch = curl_init();  
 curl_setopt($ch, CURLOPT_URL,"http://www.gis.catawba.nc.us/website/Parcel/parcel_main.asp?Cmd=QUERY&key=372215634301&type=P&width=1280&height=923");
 curl_setopt($ch, CURLOPT_TIMEOUT, 30); //timeout after 30 seconds
 curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
 $result=curl_exec ($ch);
 curl_close ($ch);
 $doc = new DOMDocument(); // not shown in the original snippet; needed before loadHTML()
 $doc->loadHTML($result);

 $tds = $doc->getElementsByTagname('td');
 foreach($tds as $td) {
 printf(" * %s\n", $td->textContent);
 echo '<br>';
 }

The code above successfully prints out all the tags.


3 Answers


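The table cells you are looking for are not part of that HTML document. You first need to understand the basics of web scraping; I suggest you borrow some books on the subject and read them.

Library time ;)


If the table cells are in the document (this seems to vary: sometimes they are, sometimes they are not), the original example shows them. It also demonstrates how to iterate over a DOMNodeList: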

$doc = new DOMDocument();

libxml_use_internal_errors(true);
$doc->loadHTMLFile('Catawba County Legacy Map Server.html');

$tds = $doc->getElementsByTagname('td');
foreach($tds as $td) {
    printf(" * %s\n", $td->textContent);
}

Example output:

php "test.php" (in directory: /home/hakre/php/test)
 *
 * Real Estate Search - Legacy
 *
 *
 *
 *
 *
 *
 *
 *
 *
 * Map Layers
 * visible
 *
 *
 * Parcels
 *
 * Parcel Annotation
 *
 * Address Points
 *
 * Misc. Lines
 *
 * Structures
 *
 * Contour Lines
 *
 * Soils
 *
 * Townships
 *
 * Water Features
 *
 * Tiles
 *
 * Flood Zone
 *
 * Agricultural District
 *
 * Aerial 2009
 *
 * Aerial 2005
 *
 * Aerial 2002
 *
 * Cities
 *
 * Print the Map  
 * Print Map and Parcel Report  
 * Print the Parcel Report  
 * Assessment Report  
 * List all Owners  
 * Deed History Report
 * Parcel Information:
 * Owner Information:
 * Parcel ID: 372215634301
 * Name: PENLEY TREASURE B
 * Parcel Address: 3152 7TH AV SE 
 * Name2:  
 * City: CONOVER 28613
 * Address: 5508 SWINGING BRIDGE RD
 * LRK(REID): 57186
 * Address2:  
 * Deed Book/Page: 1906/0741 Deed Image
 * City: CONOVER
 * Subdivision: FOREST HGTS
 * State/Zip: NC 28613-7415
 * Lots: 1-4
 *
 * Block: C
 *
 * Last Sale:
 * School Information:
 * Plat Book/Page: 8/119 Plat Image
 * School District: COUNTY
 * Calculated Acreage: 0.31
 * Elementary School: WEBB A MURRAY
 * Tax Map: 167H  04006A
 * Middle School: ARNDT
 * State Road:  
 * High School: ST STEPHENS
 * Township: HICKORY
 * School Map
 *  
 *  
 * Tax/Value Information:  Tax Rates(pdf)
 * Zoning Information:
 * Municipal Tax District:  
 * Zoning District: HICKORY
 * Fire District: HICKORY RURAL
 * Zoning1: OI
 * Tax Account Number:  
 * Zoning2:  
 * Market Building(s) Value: $55,400
 * Zoning3:  
 * Market Land Value: $20,300
 * Zoning Overlay:  
 * Market Total Value: $75,700
 * Small Area:  
 * Year Built/Remodeled: 1959  
 * Split Zoning District 1/2: 0/0
 * Current Tax Bill
 * Zoning Agency Phone Numbers
 * Miscellaneous:
 *  
 * Voter Precinct:P35
 * Firm Panel Date: 9/5/2007
 * Building Permits for this parcel
 * Firm Panel #: 3710372200J
 * WaterShed:  
 * 2010 Census Tract: 011000
 * WaterShed Split:  
 * 2010 Census Block: 3035
 * Parcel Report Data Descriptions
 * Agricultural District:  
 * FAQ's
 * Help
 * GIS Home
Compilation finished successfully.
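
Once the cells are present, picking out just the name and address is a matter of filtering on the label each cell starts with. The following is only a minimal sketch: `findLabelledCell` is a hypothetical helper, and the inline `$html` is stand-in markup modeled on the output above, not the real page. It matches on the start of the cell text so that "Parcel Address:" is not mistaken for "Address:".

```php
<?php
// Hypothetical helper: return the text that follows $label in the first
// <td> whose trimmed text starts with $label, or null if none matches.
function findLabelledCell(DOMDocument $doc, string $label): ?string
{
    foreach ($doc->getElementsByTagName('td') as $td) {
        // Turn non-breaking spaces into plain spaces before trimming.
        $text = trim(str_replace("\u{00A0}", ' ', $td->textContent));
        if (strncmp($text, $label, strlen($label)) === 0) {
            return trim(substr($text, strlen($label)));
        }
    }
    return null;
}

// Stand-in markup (an assumption), modeled on the example output above.
$html = '<table>'
      . '<tr><td>Parcel Address: 3152 7TH AV SE</td><td>Name: PENLEY TREASURE B</td></tr>'
      . '<tr><td>City: CONOVER 28613</td><td>Address: 5508 SWINGING BRIDGE RD</td></tr>'
      . '</table>';

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);

echo findLabelledCell($doc, 'Name:'), "\n";
echo findLabelledCell($doc, 'Address:'), "\n";
```

If the real page nests tables inside cells, an outer cell's textContent will include the text of every inner cell, so the first match may need an extra check.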
answered 2013-06-25T04:28:35.643

Use XPath to look for:

//table[@class="words13"]//td[contains(text(), 'Name:')]
//table[@class="words13"]//td[contains(text(), 'Address:')]
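
To run an XPath query like this against HTML, DOMXPath on top of DOMDocument is one option. What follows is a rough sketch under a few assumptions: the inline `$html` is stand-in markup rather than the real page, `cellsStartingWith` is a hypothetical helper, and the class name `words13` is taken from the question. It swaps `contains(text(), ...)` for `starts-with(normalize-space(.), ...)` so that "Parcel Address:" does not match a search for "Address:", and it uses `//td` because cells normally sit inside rows rather than directly under the table.

```php
<?php
// Stand-in markup (an assumption): the real page's structure may differ.
$html = '<table class="words13">'
      . '<tr><td>Parcel Address: 3152 7TH AV SE</td><td>Name: PENLEY TREASURE B</td></tr>'
      . '<tr><td>City: CONOVER 28613</td><td>Address: 5508 SWINGING BRIDGE RD</td></tr>'
      . '</table>';

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Hypothetical helper: cells in the table whose normalized text starts with $label.
// $label is assumed to be a plain literal without double quotes.
function cellsStartingWith(DOMXPath $xpath, string $label): DOMNodeList
{
    $query = sprintf(
        '//table[@class="words13"]//td[starts-with(normalize-space(.), "%s")]',
        $label
    );
    return $xpath->query($query);
}

$nameCells = cellsStartingWith($xpath, 'Name:');
$addressCells = cellsStartingWith($xpath, 'Address:');

foreach ([$nameCells, $addressCells] as $cells) {
    foreach ($cells as $td) {
        echo trim($td->textContent), "\n";
    }
}
```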

answered 2013-06-25T04:00:16.970

Try this:

$ch = curl_init();  
curl_setopt($ch, CURLOPT_URL,"http://www.gis.catawba.nc.us/website/Parcel/parcel_main.asp?Cmd=QUERY&key=372215634301&type=P&width=1280&height=923");
curl_setopt($ch, CURLOPT_TIMEOUT, 30); //timeout after 30 seconds
curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
$result=curl_exec ($ch);
curl_close ($ch);
echo $result;exit;

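You will get the full page source, and from there you can pull out the data you need with a regular expression function such as preg_replace.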
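
As a rough illustration of the regular expression route, the sketch below uses preg_match (rather than preg_replace) to capture the text after a label inside a td. `extractField` is a hypothetical helper, and `$result` here is a hard-coded stand-in for the string curl_exec() would return, so none of it is verified against the real page. Regular expressions are fragile against HTML: this pattern fails if the label is wrapped in its own tag or if cells are nested.

```php
<?php
// Hypothetical helper: capture whatever follows $label inside the first
// <td> whose content begins with $label; returns null when nothing matches.
function extractField(string $html, string $label): ?string
{
    $pattern = '/<td[^>]*>\s*' . preg_quote($label, '/') . '(.*?)<\/td>/is';
    if (preg_match($pattern, $html, $m) !== 1) {
        return null;
    }
    // Drop any inner tags and surrounding whitespace from the captured value.
    return trim(strip_tags($m[1]));
}

// Stand-in for the curl_exec() result (an assumption about the page's shape).
$result = '<tr><td class="x">Parcel Address: 3152 7TH AV SE</td>'
        . '<td>Name: PENLEY TREASURE B</td></tr>'
        . '<tr><td>Address: 5508 SWINGING BRIDGE RD</td></tr>';

echo extractField($result, 'Name:'), "\n";
echo extractField($result, 'Address:'), "\n";
```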

answered 2013-06-25T04:00:21.297