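I am trying to save data from a website into a MySQL database. I can save most of what I want, but I have one specific problem. The links I extract are being saved, but I want each link to be in the same row as the other attributes. Below is the cURL and MySQL query code I use to extract the information and save it to the database.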

$target_url = "http://www.ucc.ie/modules/descriptions/BM.html";
$codeS = "BM";
$html = file_get_contents("http://www.ucc.ie/modules/descriptions/BM.html"); 
@$doc = new DomDocument(); 
@$doc->loadHtml($html); 
//discard white space   
@$doc->preserveWhiteSpace = false; 
$xpath = new DomXPath($doc);

//Read through dd tags
$options = $doc->getElementsByTagName('dd');

//Go into dd tags and look for all the links with class modnav 
$links = $xpath->query('//dd //a[@class = "modnav"]'); 

//Loop through and display the results for links
foreach($links as $link){    
echo $link->getAttribute('href'), '<br><br>';
}   

foreach ($options as $option) { 

    $option->nodeValue;
    echo "Node Value (Module name/title)= $option->nodeValue <br /><br /> <br />"; 

      // save both for each results into database
$query3 = sprintf("INSERT INTO all_modulenames(code,module_name,description_link,gathered_from) 
     VALUES ('%s','%s','%s','%s')",
     mysql_real_escape_string ($codeS),
     mysql_real_escape_string($option->nodeValue),
     mysql_real_escape_string($link->getAttribute('href')),
     mysql_real_escape_string($target_url));
     mysql_query($query3) or die(mysql_error()."<br />".$query3); 

    } 
    echo "<br /> <br /> <br />";


Here is the table
-- ----------------------------
-- Table structure for `all_modulenames`
-- ----------------------------
DROP TABLE IF EXISTS `all_modulenames`;
CREATE TABLE `all_modulenames` (
`code` varchar(255) NOT NULL,
`module_name` varchar(255) NOT NULL,
`description_link` varchar(255) NOT NULL,
`gathered_from` varchar(255) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

-- ----------------------------
-- Records of all_modulenames
-- ----------------------------

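So the problem is that "$link->getAttribute('href')" is saved separately from the other values I am trying to save. The links are saved first and then the rest of the data, which leaves some rows empty. What I want is to save everything at once: fill each row completely, then move on to the next row, until the foreach statement is finished. How can I do that? Any help would be greatly appreciated!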

1 Answer

Untested (so it will need debugging), but I would take an approach like this:

...etc
@$doc->preserveWhiteSpace = false;  

//Read through dd tags 
$options = $doc->getElementsByTagName('dd'); 

foreach ($options as $option) {  

    // Get the links and find the one with the right class
    $href = '';
    $links = $option->getElementsByTagName('a');
    foreach ($links as $link) {
        if ($link->hasAttribute('class') && $link->hasAttribute('href')) {
            $aClasses = explode(' ', $link->getAttribute('class'));
            if (in_array('modnav', $aClasses)) {
                  $href=$link->getAttribute('href');
            }
        }
    }

    // Insert into SQL here etc.; $href is the link href belonging to this dd ...
}
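
The per-dd lookup above can also be written as a relative XPath query, so that each row is built from a single dd before anything is inserted. The following is only a minimal sketch under some assumptions: the sample HTML, the function names extractModules and saveModules, and the connection details in the comments are made up for illustration and do not come from the original post. It also swaps in PDO prepared statements for the mysql_* functions, which were removed in PHP 7, and it takes the first matching link in each dd rather than the last one.

```php
<?php
// Hypothetical sample markup standing in for the real page (an assumption).
$sampleHtml = <<<'HTML'
<dl>
  <dd><a class="modnav" href="#BM1001">BM1001</a> Intro to Biomedicine</dd>
  <dd><a class="other" href="#skip">x</a> No matching link here</dd>
  <dd><a class="nav modnav" href="#BM2002">BM2002</a> Advanced Topics</dd>
</dl>
HTML;

// Return one row per <dd>: its text plus the href of its own "modnav" link.
function extractModules($html)
{
    $doc = new DOMDocument();
    $doc->preserveWhiteSpace = false;

    // Collect parser warnings internally instead of silencing them with @.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $rows = array();

    foreach ($doc->getElementsByTagName('dd') as $dd) {
        // The leading "." keeps the search inside the current <dd>;
        // the concat/normalize-space idiom matches "modnav" as a whole class token.
        $links = $xpath->query(
            './/a[contains(concat(" ", normalize-space(@class), " "), " modnav ")]',
            $dd
        );

        // Use the first matching link, or an empty string if the <dd> has none.
        $href = $links->length > 0 ? $links->item(0)->getAttribute('href') : '';

        $rows[] = array(
            'module_name'      => trim($dd->nodeValue),
            'description_link' => $href,
        );
    }

    return $rows;
}

// Insert each row in one statement so name and link land in the same record.
// (Hypothetical helper; the table and column names are taken from the question.)
function saveModules(PDO $pdo, array $rows, $code, $sourceUrl)
{
    $stmt = $pdo->prepare(
        'INSERT INTO all_modulenames (code, module_name, description_link, gathered_from)
         VALUES (?, ?, ?, ?)'
    );
    foreach ($rows as $row) {
        $stmt->execute(array($code, $row['module_name'], $row['description_link'], $sourceUrl));
    }
}

$rows = extractModules($sampleHtml);
print_r($rows);

// With a real connection (DSN and credentials below are placeholders):
// $pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');
// saveModules($pdo, $rows, 'BM', 'http://www.ucc.ie/modules/descriptions/BM.html');
```

Keeping the extraction separate from the insert makes it possible to inspect the rows (for example with print_r) before anything touches the database.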
answered 2012-06-25T02:36:05.237