2

I am extracting all the URL links from HTML content with this code:

$doc = new DOMDocument();
$doc->loadHTML($string);
$anchorTags = $doc->getElementsByTagName('a');
$links = array();
foreach ($anchorTags as $url) {
    $source = parse_url($url->getAttribute('href'));
    $source = preg_replace('/^www\./', '', $source['host']);
    $links[$source][$url->getAttribute('href')] = $url->nodeValue;
}

Output of the above code:

Array
(
    [Facebook] => Array
                (
                    [facebook.com] => https://www.facebook.com/
                )

    [Google] => Array
                (
                    [google.com] => https://www.google.com/
                )

    [] => Array
        (
            [] =>
         )

    [yahoo] => Array
            (
                [yahoo.com] => https://www.yahoo.com/
            )

)

I just want to remove the null/blank elements (keys) from the array. For this I am using array_filter(), but it does not remove them.

print_r(array_filter($links));

4 Answers

2

Just add a condition that checks the values:

$links = array();
foreach ($anchorTags as $url) {
    $source = parse_url($url->getAttribute('href'));
    $source = preg_replace('/^www\./', '', $source['host']);
    if($source != null && $source != "" && $url->nodeValue != null && $url->nodeValue != ""){
         $links[$source][$url->getAttribute('href')] = $url->nodeValue;
    }
}
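
To try the idea end to end, here is a minimal self-contained sketch. The sample HTML in $string is made up for illustration (the question does not show the real input), and this version only checks the host, not the link text. It reads the host with parse_url($href, PHP_URL_HOST), which returns null when the URL has no host, so relative or fragment links can be skipped before they create a blank key.

```php
<?php
// Hypothetical sample input; the question does not show the original HTML.
$string = '<a href="https://www.facebook.com/">Facebook</a>'
        . '<a href="#top">Top</a>'
        . '<a href="https://www.google.com/">Google</a>';

$doc = new DOMDocument();
$doc->loadHTML($string);

$links = array();
foreach ($doc->getElementsByTagName('a') as $url) {
    $href = $url->getAttribute('href');
    // Ask parse_url() for the host only: null if there is none, false if the URL is malformed.
    $host = parse_url($href, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        continue; // relative link, fragment, or anchor without an href
    }
    $source = preg_replace('/^www\./', '', $host);
    $links[$source][$href] = $url->nodeValue;
}

print_r($links);
```

With this sample input, the "#top" anchor is skipped and only the facebook.com and google.com entries end up in $links.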
answered 2013-08-22T07:34:47.710
1

Or, a bit more elegantly, don't push the result into your array at all if the host is empty:

if ($source != "") $links[$source][$url->getAttribute('href')] = $url->nodeValue;
answered 2013-08-22T07:33:21.300
1

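You can try this:

// Remove entries whose host key is empty
foreach ($links as $key => $val) {
    if ($key === '') {
        // unset($val) would only destroy the loop copy, so remove the entry from $links itself
        unset($links[$key]);
    }
}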
answered 2013-08-22T07:37:25.880
0

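You can check with strlen. Because each value in $links is itself an array, apply the check to the keys (ARRAY_FILTER_USE_KEY requires PHP 5.6 or later), like this:

print_r(array_filter($links, 'strlen', ARRAY_FILTER_USE_KEY));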
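
A small sketch of why the plain array_filter($links) call from the question leaves the blank entry in place, and how filtering on keys differs. The sample array below is hypothetical, shaped like what the question's loop builds (host, then href, then link text).

```php
<?php
// Hypothetical sample data shaped like the array built in the question.
$links = array(
    'facebook.com' => array('https://www.facebook.com/' => 'Facebook'),
    ''             => array('' => ''),
    'yahoo.com'    => array('https://www.yahoo.com/' => 'yahoo'),
);

// Without a callback nothing is removed: array('' => '') is a non-empty array, hence truthy.
$unchanged = array_filter($links);

// Filter on the keys instead (ARRAY_FILTER_USE_KEY needs PHP 5.6 or later).
$cleaned = array_filter($links, 'strlen', ARRAY_FILTER_USE_KEY);

print_r($cleaned);
```

Here $unchanged still has all three entries, while $cleaned keeps only the facebook.com and yahoo.com keys.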

answered 2013-08-22T07:33:25.353