
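I'm building a web crawler. It finds all the links on a page, along with their titles, meta descriptions and so on, and it does that well. I then wrote an array holding the starting URLs of the links I want. So if it crawls a link whose URL starts with any of the values in that array, the link should be inserted into $news_stories.

The only problem is that it doesn't seem to insert them. The page comes back blank, and now it says that array_intersect expects an array, as though I hadn't given it the arrays I have.

In short, I'm struggling to understand where my code goes wrong and why the URLs I want aren't being inserted.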

$bbc_values = array(
    'http://www.bbc.co.uk/news/health-', 
    'http://www.bbc.co.uk/news/politics-', 
    'http://www.bbc.co.uk/news/uk-', 
    'http://www.bbc.co.uk/news/technology-',  
    'http://www.bbc.co.uk/news/england-', 
    'http://www.bbc.co.uk/news/northern_ireland-', 
    'http://www.bbc.co.uk/news/scotland-', 
    'http://www.bbc.co.uk/news/wales-', 
    'http://www.bbc.co.uk/news/business-', 
    'http://www.bbc.co.uk/news/education-', 
    'http://www.bbc.co.uk/news/science_and_enviroment-',         
    'http://www.bbc.co.uk/news/entertainment_and_arts-', 
    'http://edition.cnn.com/'
);

// BBC Algorithm
foreach ($links as $link) {
    $output = array(
        "title"       => Titles($link), // don't know what Titles is: variable or string?
        "description" => getMetas($link),
        "keywords" => getKeywords($link), 
        "link"        => $link                 
    );

    if (empty($output["description"])) {
        $output["description"] = getWord($link);
    }
}

$new_stories = array();

foreach ($output as $new_array) {
    if (array_intersect($output['link'], $bbc_values) == true) {
        $news_stories[] = $new_array;
    }

    print_r($news_stories);
}

3 Answers


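Hmm, I don't think array_intersect is what you need for this comparison. It takes arrays and returns the values they have in common, while $output['link'] is a single string: http://php.net/manual/en/function.array-intersect.php

Maybe you want in_array instead: http://php.net/manual/en/function.in-array.php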
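Note that in_array, like array_intersect, looks for exact values, so a full article URL will never equal one of the prefixes in $bbc_values. A minimal sketch of a "starts with any of these" check using strpos(); the helper name startsWithAny and the sample URL are made up for illustration:

```php
<?php
// Hypothetical helper (not from the question): returns true when $url
// begins with any of the strings in $prefixes.
function startsWithAny($url, array $prefixes)
{
    foreach ($prefixes as $prefix) {
        // strpos() returns 0 when $prefix sits at the very start of $url.
        // Use === because a miss returns false, and false == 0 in PHP.
        if (strpos($url, $prefix) === 0) {
            return true;
        }
    }
    return false;
}

$bbc_values = array(
    'http://www.bbc.co.uk/news/uk-',
    'http://edition.cnn.com/',
);
$link = 'http://www.bbc.co.uk/news/uk-12345678'; // made-up article URL

var_dump(in_array($link, $bbc_values));               // bool(false): exact match only
var_dump(array_intersect(array($link), $bbc_values)); // empty array: no identical values
var_dump(startsWithAny($link, $bbc_values));          // bool(true)
```

On PHP 8 and later, str_starts_with($url, $prefix) does the same job as the strpos() test.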

answered 2012-12-20T13:54:50.780

You named the array $new_stories but print $news_stories. The difference is the 's'.

Check whether the code ever enters this block; I don't think it does:

if (array_intersect($output['link'], $bbc_values) == true) {
    echo 'here';
}
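If the block is never reached, a minimal sketch of the loop with one consistent name, $news_stories, might look like the following. Two other things in the question's code are worth noting: $output is reassigned on every pass of the first loop, so only the last link survives, and the second loop then iterates over that one row's fields, so $output['link'] handed to array_intersect is a single string rather than an array. The sketch collects rows into $outputs (a name introduced here, not in the original), uses made-up URLs in place of the crawler's $links, leaves out the asker's Titles()/getMetas()/getKeywords() helpers, and swaps array_intersect for a strpos() prefix check.

```php
<?php
// Assumption: $links normally comes from the crawler; these URLs are made up.
$links = array(
    'http://www.bbc.co.uk/news/uk-12345678',
    'http://www.bbc.co.uk/sport/0/football',
    'http://edition.cnn.com/2012/12/20/world/example',
);

$bbc_values = array(
    'http://www.bbc.co.uk/news/uk-',
    'http://edition.cnn.com/',
);

// Keep one row per link instead of overwriting a single $output.
$outputs = array();
foreach ($links as $link) {
    $outputs[] = array(
        'link' => $link,
        // 'title' => Titles($link), ... (the asker's own helpers, omitted here)
    );
}

// One name for the result array throughout: $news_stories.
$news_stories = array();
foreach ($outputs as $row) {
    foreach ($bbc_values as $prefix) {
        // Keep the row if its link starts with this prefix.
        if (strpos($row['link'], $prefix) === 0) {
            $news_stories[] = $row;
            break; // one matching prefix is enough
        }
    }
}

print_r($news_stories); // print once, after filtering
```

Here print_r() runs once after the filter instead of on every iteration, so the output shows only the final list.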
answered 2012-12-20T13:52:15.963

When the return parameter is used, this function uses internal output buffering, so it cannot be used inside an ob_start() callback function.
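This paraphrases the PHP manual's note on print_r(). A minimal sketch of the return parameter it refers to, with made-up example data:

```php
<?php
$news_stories = array('http://www.bbc.co.uk/news/uk-12345678'); // example data only

// With the second argument set to true, print_r() returns the text
// instead of printing it (internally this uses output buffering,
// hence the restriction on ob_start() callbacks).
$dump = print_r($news_stories, true);

// The returned string can be logged or echoed later.
echo $dump;
```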

answered 2018-03-03T13:46:51.003