I wrote a simple function to validate URLs submitted through a textarea (one link per line):

function validate_urls($value)
{
    //final array of links
    $links = array();

    $value = array_map(function($a) use (&$links){
        $a = trim($a);
        if(strlen($a) !== 0 and (strpos($a, 'http') !== 0 or strpos($a, 'https') !== 0)){
            $a = 'http://'.$a;
        }
        $url = parse_url($a,PHP_URL_HOST);
        if($url != null and !in_array($a, $links) and filter_var($a, FILTER_VALIDATE_URL) !== false and checkdnsrr($a)){
            $links[] = $a;
        }
        return false;
    }, explode("\n",$value));
    return $links;
}

var_dump(validate_urls($_POST['links']));

It is meant to check that

  • the URL is valid
  • the URL is active
  • the URL is not a duplicate

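The question is: why doesn't it work (it returns an empty array)? I have gone through each check and it should work, but it doesn't. Sorry if the code is messy, I'm still learning.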

1 Answer

if(strlen($a) !== 0 and (strpos($a, 'http') !== 0 or strpos($a, 'https') !== 0)){

A and B or C is not evaluated as (A and B) or (A and C): AND has higher precedence than OR, so it is read as (A and B) or C. So you want to change it to A and (B or C).
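
A minimal sketch of the precedence point, followed by one possible way to write the scheme check. The helper name ensure_scheme() is hypothetical and not from the question. One thing to note as well: with "or" between the two "!== 0" tests, a line that starts with http:// still satisfies the condition, because strpos($a, 'https') returns false for it, so the prefix is added a second time. A single anchored pattern avoids that.

```php
<?php
// "and" binds tighter than "or", so the ungrouped form is read as
// ($a and $b) or $c.
$a = false;
$b = true;
$c = true;
$ungrouped = ($a and $b or $c);    // true:  (false and true) or true
$grouped   = ($a and ($b or $c));  // false: false and (true or true)

// Hypothetical helper: prepend "http://" unless the trimmed line
// already starts with http:// or https:// (case-insensitive).
function ensure_scheme(string $line): string
{
    $line = trim($line);
    if ($line === '') {
        return '';
    }
    if (preg_match('#^https?://#i', $line) === 1) {
        return $line;
    }
    return 'http://' . $line;
}
```

Inside the question's callback this would stand in for the strlen()/strpos() condition, e.g. $a = ensure_scheme($a);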


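The documentation on FILTER_VALIDATE_URL states »Note that the function will only find ASCII URLs to be valid;«. So that is a pretty restrictive choice. It follows the URL specification given in RFC 2396, which has been superseded by RFC 3986.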

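Without having looked into this filter more thoroughly, these two pieces of information are (to me) enough to write the filter off as completely useless.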
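
To illustrate the ASCII-only restriction quoted from the documentation, here is a small sketch. Both URLs are made-up examples; the second one has a non-ASCII character in the host.

```php
<?php
// Plain ASCII URL: filter_var() returns the URL string itself.
$ascii = filter_var('http://example.com/path?x=1', FILTER_VALIDATE_URL);

// Host containing a non-ASCII character: the filter rejects it
// and returns false.
$idn = filter_var('http://müller.example/', FILTER_VALIDATE_URL);
```

Whether that restriction matters depends on the kind of links you expect users to paste.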


checkdnsrr($a)

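is testing the entire URL, not just the host. Even if you passed only the host, you would be looking for an MX record (i.e. whether that host is reachable by mail), because MX is the default record type of checkdnsrr(). A checks whether the host has an IP address set, CNAME checks whether the host is an alias of another DNS record, and so on. You are probably looking for NS, which checks whether the host has any DNS records.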

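So if you change the check to checkdnsrr($url, "NS") you verify that the host component of the URL is actually known to DNS. You are not checking whether that host is actually listening on the specified port, and you are not checking whether the given resource (e.g. /foo/bar.html) exists.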
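
A minimal sketch of a host-only DNS check. The helper host_resolves() and its injectable $resolver parameter are hypothetical additions, there only so the function can be exercised without network access; by default it calls PHP's checkdnsrr(). The record type is left to the caller: "NS" follows the suggestion above, but NS records are usually published only at the top of a zone, so for a subdomain such as www.example.com "A" may be the more useful type.

```php
<?php
// Hypothetical helper: run the DNS check against the host part of a
// URL only, never against the full URL string.
function host_resolves(string $url, string $type, ?callable $resolver = null): bool
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return false; // no host component, nothing to look up
    }
    $resolver = $resolver ?? 'checkdnsrr';
    return (bool) $resolver($host, $type);
}
```

In the question's callback this would replace checkdnsrr($a), for example host_resolves($a, 'NS').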


If you wanted to make sure a URL actually points to something useful, you'd have to make a HEAD request and check the response. You can do that easily with curl. If curl is not available, you could implement a simple HTTP client yourself using fsockopen(), with the disadvantage of not being able to speak HTTPS (HTTP through SSL) and having to implement redirect following and similar stuff yourself. In short: you don't want to go down that road.
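
A rough sketch of such a HEAD check, assuming the curl extension is loaded. The helper name url_responds(), the 5-second default timeout and the "2xx means useful" rule are illustrative assumptions, not something the question prescribes.

```php
<?php
// Hypothetical helper: send a HEAD request and report whether the
// final response (after following redirects) has a 2xx status.
function url_responds(string $url, int $timeoutSeconds = 5): bool
{
    $ch = curl_init($url);
    if ($ch === false) {
        return false;
    }
    curl_setopt_array($ch, [
        CURLOPT_NOBODY         => true,  // HEAD instead of GET
        CURLOPT_RETURNTRANSFER => true,  // never echo the response
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects
        CURLOPT_MAXREDIRS      => 5,
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds,
        CURLOPT_TIMEOUT        => $timeoutSeconds,
    ]);
    $ok     = curl_exec($ch) !== false;
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    return $ok && $status >= 200 && $status < 300;
}
```

Some servers answer HEAD differently from GET (or refuse it), so a failed HEAD is a hint rather than proof that the resource is missing.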

That said, there is also a performance problem up ahead. The HTTP requests are done synchronously. Should a host be failing to reply in an acceptable time frame, your script might time out - or at least take ages to finish, depending on the number of URLs you're checking and the quality of service behind them.
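
One way to soften the synchronous-request problem, again assuming the curl extension, is to run the HEAD requests in parallel with the curl_multi functions and to cap every transfer with a timeout. The function name head_status_codes() and the default timeout are hypothetical; treat this as a sketch rather than a tuned implementation.

```php
<?php
// Hypothetical helper: HEAD-request all URLs concurrently and return
// an array of url => HTTP status code (0 if no response arrived).
function head_status_codes(array $urls, int $timeoutSeconds = 5): array
{
    $multi   = curl_multi_init();
    $handles = [];
    foreach ($urls as $url) {
        $ch = curl_init($url);
        curl_setopt_array($ch, [
            CURLOPT_NOBODY         => true,
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_FOLLOWLOCATION => true,
            CURLOPT_CONNECTTIMEOUT => $timeoutSeconds,
            CURLOPT_TIMEOUT        => $timeoutSeconds,
        ]);
        curl_multi_add_handle($multi, $ch);
        $handles[$url] = $ch;
    }

    // Drive all transfers until none is still running.
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running) {
            curl_multi_select($multi, 1.0); // wait for socket activity
        }
    } while ($running && $status === CURLM_OK);

    $codes = [];
    foreach ($handles as $url => $ch) {
        $codes[$url] = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
        curl_multi_remove_handle($multi, $ch);
    }
    curl_multi_close($multi);
    return $codes;
}
```

With this shape the total wait is bounded by roughly one timeout instead of one timeout per URL.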

answered 2012-07-06T11:11:14.977