I have a function that resolves an address to its correct URL: example.com becomes http://example.com, www.example.org becomes https://example.org, and so on.

function startsWith($haystack, $needle) {
    return !strncmp($haystack, $needle, strlen($needle));
}

function properUrl($url) {
    $urls = array();
    if (startsWith($url, "https://") || startsWith($url, "http://")) {
        $urls[] = $url;
    } else if (startsWith($url, "www.")) {
        $url = substr($url, 4);
        $urls[] = "http://$url";
        $urls[] = "http://www.$url";
        $urls[] = "https://$url";
        $urls[] = "https://www.$url";
    } else {
        $urls[] = "http://$url";
        $urls[] = "http://www.$url";
        $urls[] = "https://$url";
        $urls[] = "https://www.$url";
    }
    foreach ($urls as $u) {         
        if (@file_get_contents($u)) {
            $url = $u;
            break;
        }
    }
    return $url;
}

Is there a faster approach than file_get_contents? I only want the correct URL, not to read the whole page. Thanks.

1 Answer

Use PHP's parse_url(): http://php.net/manual/en/function.parse-url.php

Example:

<?php
$url = '//www.example.com/path?googleguy=googley';

// Prior to 5.4.7 this would show the path as "//www.example.com/path"
var_dump(parse_url($url));
?>

gives you:

array(3) {
  ["host"]=>
  string(15) "www.example.com"
  ["path"]=>
  string(5) "/path"
  ["query"]=>
  string(17) "googleguy=googley"
}

while:

<?php
$url = 'http://username:password@hostname/path?arg=value#anchor';

print_r(parse_url($url));

echo parse_url($url, PHP_URL_PATH);
?>

gives you:

Array
(
    [scheme] => http
    [host] => hostname
    [user] => username
    [pass] => password
    [path] => /path
    [query] => arg=value
    [fragment] => anchor
)

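As you can see, you only need to check the array keys for the values you need and build the rest of the URL from there, which is easy and saves a lot of string comparison.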
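As a rough sketch of that idea, the candidate list from the question could be built from parse_url() output instead of prefix comparisons. The function name candidateUrls and the trick of prefixing scheme-less input with "//" are assumptions made for illustration; they are not part of the original code.

```php
<?php
// Hypothetical helper: builds the candidate URLs to try for a user-supplied address.
function candidateUrls(string $url): array
{
    // Input that already names an http(s) scheme is kept unchanged.
    $scheme = parse_url($url, PHP_URL_SCHEME);
    if (is_string($scheme) && in_array(strtolower($scheme), ['http', 'https'], true)) {
        return [$url];
    }

    // Prefix "//" so parse_url() treats the first segment as the host (PHP >= 5.4.7).
    $parts = parse_url('//' . $url);
    if ($parts === false || !isset($parts['host'])) {
        return [];
    }

    // Drop a leading "www." so both host variants can be generated.
    $host = preg_replace('/^www\./i', '', $parts['host']);
    $rest = (isset($parts['port']) ? ':' . $parts['port'] : '')
          . ($parts['path'] ?? '')
          . (isset($parts['query']) ? '?' . $parts['query'] : '');

    // Same order as the question: http before https, bare host before "www.".
    $urls = [];
    foreach (['http', 'https'] as $s) {
        foreach (['', 'www.'] as $prefix) {
            $urls[] = "$s://$prefix$host$rest";
        }
    }
    return $urls;
}
```

Called with 'www.example.org', this sketch returns the same four candidates the question builds by hand, and any path or query string in the input is carried over to each candidate.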

To check whether a URL exists, you should check only the headers instead of fetching the whole file (which is slow). PHP's get_headers() does that for you:

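$file = 'http://www.domain.com/somefile.jpg';
$file_headers = @get_headers($file);
// get_headers() returns false when the request itself fails (DNS error, refused connection, ...).
if ($file_headers === false || preg_match('#^HTTP/\S+\s+404\b#', $file_headers[0])) {
    $exists = false;
} else {
    $exists = true;
}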
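Note that get_headers() issues a GET request by default; a stream context can switch it to HEAD so the server is not asked for a body. Below is a minimal sketch of the question's loop rewritten that way. The names statusCode, headStatus, and firstReachableUrl are hypothetical, as are the 5-second timeout and the decision to treat any status below 400 as "exists".

```php
<?php
// Hypothetical helper: extracts the numeric status from a line such as "HTTP/1.1 200 OK".
function statusCode(string $statusLine): ?int
{
    return preg_match('#^HTTP/\S+\s+(\d{3})#', $statusLine, $m) ? (int) $m[1] : null;
}

// Sends a HEAD request and returns the status code of the first response, or null on failure.
function headStatus(string $url): ?int
{
    $context = stream_context_create([
        'http' => ['method' => 'HEAD', 'timeout' => 5],
    ]);
    // The context argument of get_headers() requires PHP >= 7.1.
    $headers = @get_headers($url, false, $context);
    return $headers === false ? null : statusCode($headers[0]);
}

// Returns the first candidate whose probe reports a status below 400, or null if none does.
// $probe defaults to headStatus(); passing another callable allows testing without network access.
function firstReachableUrl(array $candidates, ?callable $probe = null): ?string
{
    $probe = $probe ?? 'headStatus';
    foreach ($candidates as $url) {
        $status = $probe($url);
        if ($status !== null && $status < 400) {
            return $url;
        }
    }
    return null;
}
```

For example, firstReachableUrl(['http://example.com', 'https://example.com']) probes each URL in order and stops at the first one that answers. Some servers reject HEAD with 405 Method Not Allowed, so a fallback to GET may be needed for those.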

Good luck!

answered 2013-06-08T19:23:24.327