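I'm writing the last piece of the web crawler I've been working on.
The crawler scrapes BBC News and inserts each link, along with its title, description and so on, into a database. All of that works, but I also have an array of starting URLs, and only links that begin with one of them should be inserted.
I'm using a foreach to loop over every entry in the array of links, checking whether each one meets the criteria, adding it to a new array, imploding that array into a string, and then inserting the string into a MySQL database.
However, I get an error related to my implode call, and I'm stuck.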
$bbc_values = array('http://www.bbc.co.uk/news/health-', 'http://www.bbc.co.uk/news/politics-', 'http://www.bbc.co.uk/news/uk-', 'http://www.bbc.co.uk/news/technology-', 'http://www.bbc.co.uk/news/world-', 'http://www.bbc.co.uk/news/england-', 'http://www.bbc.co.uk/news/northern_ireland-', 'http://www.bbc.co.uk/news/scotland-', 'http://www.bbc.co.uk/news/wales-', 'http://www.bbc.co.uk/news/business-', 'http://www.bbc.co.uk/news/education-', 'http://www.bbc.co.uk/news/science_and_enviroment-', 'http://www.bbc.co.uk/news/entertainment_and_arts-', 'http://edition.cnn.com/');
foreach ($links as $link) {
    $output = array(
        "title" => Titles($link), // don't know what Titles is: variable or string?
        "description" => getMetas($link),
        "keywords" => getKeywords($link),
        "link" => $link
    );
    if (empty($output["description"])) {
        $output["description"] = getWord($link);
    }
    foreach ($output as $new_array) {
        if (in_array($new_array['link'], $bbc_values)) {
            $news_stories[] = $new_array;
        }
    }
    // The implode() call below is the one the error refers to.
    $data = '"' . implode('" , "', $news_stories) . '"';
    $result = mysql_query("INSERT INTO news_story (`title`, `description`, `keywords`, `link`) VALUES (" . $data . ")");
}
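
For reference, here is a minimal sketch of what the filtering step seems to be aiming for, under some assumptions. In the code above, the inner foreach walks over the values of `$output`, which are strings, so `$new_array['link']` is not the link, and `in_array()` tests for an exact match rather than a prefix. That would probably leave `$news_stories` unset by the time `implode()` runs. The helper `startsWithAny()`, the shortened prefix list, and the sample rows are hypothetical and only stand in for `Titles()`, `getMetas()` and the other functions from the post.

```php
<?php
// Hypothetical helper (not in the original post): returns true when $link
// begins with at least one of the given prefixes.
function startsWithAny($link, $prefixes)
{
    foreach ($prefixes as $prefix) {
        // strncmp() compares only the first strlen($prefix) bytes.
        if (strncmp($link, $prefix, strlen($prefix)) === 0) {
            return true;
        }
    }
    return false;
}

// Shortened prefix list, for illustration only.
$bbc_values = array(
    'http://www.bbc.co.uk/news/health-',
    'http://www.bbc.co.uk/news/technology-',
);

// Made-up stand-in rows; in the post these values come from Titles(),
// getMetas(), getKeywords() and $link.
$rows = array(
    array('title' => 'A', 'description' => 'd1', 'keywords' => 'k1',
          'link' => 'http://www.bbc.co.uk/news/health-123'),
    array('title' => 'B', 'description' => 'd2', 'keywords' => 'k2',
          'link' => 'http://www.bbc.co.uk/sport/456'),
);

$statements = array();
foreach ($rows as $output) {
    if (!startsWithAny($output['link'], $bbc_values)) {
        continue; // skip links outside the allowed sections
    }
    // $output is a flat array of strings, so implode() can join it directly.
    $data = '"' . implode('", "', $output) . '"';
    $statements[] = "INSERT INTO news_story (`title`, `description`, `keywords`, `link`) VALUES ($data)";
}

print_r($statements);
```

The sketch only builds the SQL strings and does not run them. The values are not escaped here, so in real use they would need escaping or, preferably, a prepared statement (for example through PDO or mysqli) instead of string concatenation.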