
I'm trying to get the contact information from this site http://www.internic.net/registrars/registrar-967.html using PHP. I was able to get the e-mail address from the href links by doing this:

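$contactStr = "http://www.internic.net/registrars/registrar-967.html";
$contact_string = file_get_contents($contactStr);
// Capture the href and the link text of every <a href="...">...</a> on the page
preg_match_all('/<a href="(.*)">(.*)<\/a>/i', $contact_string, $contactInfo);
// On this particular page the mailto: link happens to be the 7th match (index 6)
$email = str_replace("mailto:", "", $contactInfo[1][6]);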
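Relying on `$contactInfo[1][6]` breaks as soon as the page gains or loses a link. A minimal sketch (not run against the live page) that instead picks the first link whose `href` starts with `mailto:`, using DOMDocument and DOMXPath; `extract_email` is a hypothetical helper name:

```php
<?php
// Sketch: find the e-mail via any mailto: link instead of by link position.
// extract_email() is a hypothetical helper, not part of any library.
function extract_email(string $html): ?string
{
    $dom = new DOMDocument();
    // Collect libxml warnings about sloppy real-world HTML instead of printing them
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Every <a> element whose href attribute starts with "mailto:"
    $links = $xpath->query('//a[starts-with(@href, "mailto:")]');
    if ($links->length === 0) {
        return null; // no mailto: link on the page
    }
    $href = $links->item(0)->getAttribute('href');
    return substr($href, strlen('mailto:'));
}

// Offline usage with a made-up HTML fragment:
$sample = '<p><a href="/index.html">Home</a> <a href="mailto:someone@example.com">someone@example.com</a></p>';
echo extract_email($sample), "\n"; // someone@example.com
```

Against the real page you would call `extract_email(file_get_contents($contactStr))`.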

However, I'm having a hard time getting the address and the phone number, since there's no HTML element (like a <p>) I can hook onto. I just need "1800 SW First Ave., Suite 440 Portland OR 97201 United States" and "310-467-2549" from this page. How can I do this with preg_match_all, or some other way? Thanks!




As others have said in the comments, don't use regular expressions for this; try DOMDocument instead.
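A minimal sketch of the DOMDocument approach, tested only on a made-up fragment. It assumes the contact details sit in a `<td width="420">` cell with one item per text node (separated by `<br>`); `contact_lines_from_html` is a hypothetical name, and taking the HTML as a string keeps the parsing testable without a network call:

```php
<?php
// Sketch: return the non-empty text lines inside every <td width="420">.
// Assumption: the contact details live in that cell, one item per text node.
function contact_lines_from_html(string $html): array
{
    $dom = new DOMDocument();
    // Keep libxml quiet about invalid markup, then restore the old setting
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $lines = array();
    // All text nodes below the matching cells, in document order
    foreach ($xpath->query('//td[@width="420"]//text()') as $textNode) {
        $line = trim($textNode->nodeValue);
        if ($line !== '') {
            $lines[] = $line;
        }
    }
    return $lines;
}

$sample = '<table><tr><td width="420">Example Registrar<br>1 Main St.<br>'
        . 'Springfield<br><a href="mailto:info@example.com">info@example.com</a></td></tr></table>';
print_r(contact_lines_from_html($sample));
```

Walking text nodes avoids re-serializing the cell and splitting on `<br/>`, but if a single line were split across several inline tags it would come back as several entries.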

Here is an example (a bit hacky, though). Hope it helps:

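function get_register_by_id($id){
    $site = file_get_contents('http://www.internic.net/registrars/registrar-'.$id.'.html');
    if ($site === false) {
        return array(); // download failed
    }
    $dom = new DOMDocument();
    // Suppress libxml warnings about the page's invalid HTML
    libxml_use_internal_errors(true);
    $dom->loadHTML($site);
    libxml_clear_errors();
    $result = array();
    // The contact block is the <td width="420"> cell
    foreach($dom->getElementsByTagName('td') as $td) {
        if($td->getAttribute('width')=='420'){
            // Rebuild the cell's inner HTML so it can be split on <br/>
            $innerHTML= '';
            $children = $td->childNodes;
            foreach ($children as $child) {
                $innerHTML .= trim($child->ownerDocument->saveXML($child));
            }
            // One entry per <br/>-separated line, trimmed and with tags stripped
            $fixed = array_map('strip_tags', array_map('trim', explode("<br/>",trim($innerHTML))));
            foreach($fixed as $val){
                if(empty($val)){continue;}
                // Page-specific cleanup: remove any "! " sequences
                $result[] = str_replace(array('! '),'',$val);
            }
        }
    }
    return $result;
}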


print_r(get_register_by_id(965));
/*Array
(
    [0] => Domain Central Australia Pty Ltd.
    [1] => Level 27
    [2] => 101 Collins Street
    [3] => Melbourne Victoria 3000
    [4] => Australia
    [5] => +64 300 4192
    [6] => robert.rolls@domaincentral.com.au
)*/
print_r(get_register_by_id(966));
/*
Array
(
    [0] => Web Business, LLC
    [1] => PO Box 1417
    [2] => Golden CO 80402
    [3] => United States
    [4] => +1.303.524.3469
    [5] => support@webbusiness.biz
)*/

print_r(get_register_by_id(967));
/*
Array
(
    [0] => #1 Host Australia, Inc.
    [1] => 1800 SW First Ave., Suite 440
    [2] => Portland OR 97201
    [3] => United States
    [4] => 310-467-2549
    [5] => registry-operations@moniker.com
)*/
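To get just the address and phone number the question asks for, the flat array can be split into named fields. This is a sketch based only on the three sample outputs above (first line = company name, last = e-mail, second-to-last = phone, everything in between = address); `split_contact` is a hypothetical helper and other registrar pages may not follow that layout:

```php
<?php
// Sketch: turn the flat array from get_register_by_id() into named fields.
// Assumption (from the sample outputs only): first line = name, last = e-mail,
// second-to-last = phone, the lines in between = address.
function split_contact(array $lines): array
{
    if (count($lines) < 4) {
        return array(); // not the layout this sketch expects
    }
    $name    = array_shift($lines);
    $email   = array_pop($lines);
    $phone   = array_pop($lines);
    $address = implode(' ', $lines);

    return array(
        'name'    => $name,
        'address' => $address,
        'phone'   => $phone,
        'email'   => $email,
    );
}

// Using the sample output for registrar 967 shown above:
$lines = array(
    '#1 Host Australia, Inc.',
    '1800 SW First Ave., Suite 440',
    'Portland OR 97201',
    'United States',
    '310-467-2549',
    'registry-operations@moniker.com',
);
$contact = split_contact($lines);
echo $contact['address'], "\n"; // 1800 SW First Ave., Suite 440 Portland OR 97201 United States
echo $contact['phone'], "\n";   // 310-467-2549
```

With live data this would be `split_contact(get_register_by_id(967))`.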
answered 2013-01-15T02:15:08.980