php - Why does get_headers() return 400 Bad Request while CLI curl returns 200 OK?

Question

I am trying to fetch HTTP headers using the native get_headers() function:

$headers = get_headers('https://www.grammarly.com');

The result is:

HTTP/1.1 400 Bad Request
Date: Fri, 27 Apr 2018 12:32:34 GMT
Content-Type: text/plain; charset=UTF-8
Content-Length: 52
Connection: close

However, if I do the same with the curl command-line tool, the result is different:

curl -sI https://www.grammarly.com/

HTTP/1.1 200 OK
Date: Fri, 27 Apr 2018 12:54:47 GMT
Content-Type: text/html; charset=UTF-8
Content-Length: 25130
Connection: keep-alive

What causes this difference in responses? Is it some kind of poorly implemented security feature on Grammarly's server side, or something else?

score 4 · Accepted Answer

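This happens because get_headers() uses the default stream context, which means almost no HTTP headers are sent with the request, and many remote servers object to that. The missing header most likely to cause problems is User-Agent. You can set it manually with stream_context_set_default() before calling get_headers(). Here is an example that worked for me: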

$headers = get_headers('https://www.grammarly.com');

print_r($headers);

// has [0] => HTTP/1.1 400 Bad Request

stream_context_set_default(
    array(
        'http' => array(
            'user_agent'=>"php/testing"
        ),
    )
);

$headers = get_headers('https://www.grammarly.com');

print_r($headers);

// has [0] => HTTP/1.1 200 OK
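
If changing the default context for the whole script is undesirable, a per-call context may work as well. This is only a minimal sketch, assuming PHP 7.1 or later, where get_headers() accepts a stream context as its third argument. The helper names buildUaContext and headersWithUserAgent are hypothetical, not part of PHP.

```php
<?php
// Hypothetical helper: build a stream context that sends a User-Agent header.
function buildUaContext(string $userAgent)
{
    return stream_context_create([
        'http' => [
            'user_agent' => $userAgent,
        ],
    ]);
}

// Hypothetical helper: fetch headers with a context used for this call only,
// leaving the script-wide default context untouched (assumes PHP >= 7.1).
function headersWithUserAgent(string $url, string $userAgent)
{
    return get_headers($url, false, buildUaContext($userAgent));
}
```

Calling print_r(headersWithUserAgent('https://www.grammarly.com', 'php/testing')); should then behave like the second get_headers() call above, provided the server only objects to the missing User-Agent.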

score 0 · Accepted Answer

Just use PHP's curl functions:

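function getMyHeaders($url)
{
    $options = array(
        CURLOPT_RETURNTRANSFER => true,     // return the response as a string
        CURLOPT_HEADER         => true,     // include response headers in the output
        CURLOPT_FOLLOWLOCATION => true,     // follow redirects
        CURLOPT_USERAGENT      => "spider", // send a User-Agent header
        CURLOPT_AUTOREFERER    => true,     // set Referer automatically on redirects
        CURLOPT_SSL_VERIFYPEER => false,    // skips TLS certificate verification (insecure)
        CURLOPT_NOBODY         => true      // HEAD request: headers only, no body
    );
    $ch = curl_init($url);
    curl_setopt_array($ch, $options);
    $content = curl_exec($ch);              // raw header text, or false on failure
    curl_close($ch);
    return $content;
}
print_r(getMyHeaders('https://www.grammarly.com'));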
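
Unlike get_headers(), the function above returns the headers as one raw string, and with CURLOPT_FOLLOWLOCATION enabled that string can contain one header block per response in the redirect chain. The following is a rough sketch of how it might be split into arrays. The helper name splitHeaderBlocks and the sample input are made up for illustration and are not real output from grammarly.com.

```php
<?php
// Hypothetical helper: split raw header text (as returned by curl_exec()
// with CURLOPT_HEADER enabled) into one array of lines per response.
function splitHeaderBlocks(string $raw): array
{
    $blocks = [];
    // Header blocks are separated by a blank line; redirects yield several blocks.
    foreach (preg_split('/\r?\n\r?\n/', trim($raw)) as $block) {
        $blocks[] = preg_split('/\r?\n/', $block);
    }
    return $blocks;
}

// Made-up sample input: a redirect followed by the final response.
$raw = "HTTP/1.1 301 Moved Permanently\r\nLocation: https://example.com/\r\n\r\n"
     . "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n";
$blocks = splitHeaderBlocks($raw);
print_r($blocks);
```

The last element of the returned array holds the headers of the final response, which is the closest equivalent to what get_headers() reports for a URL that does not redirect.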
