When I open http://en.wikipedia.org/wiki/Category:Births_by_year in any browser, I see the category page with many subcategories and a single member page, http://en.wikipedia.org/wiki/Park_Sung-Baek.
But when I fetch the same page from Java, I get the category page with different content: instead of the member page mentioned above, it lists http://en.wikipedia.org/wiki/User:Mijotoba/Ruth_Stella_Correa_Palacio.
How is that possible? Why does Wikipedia serve a different page?
Setting the User-Agent did not help.
Request headers that return the "normal" content (browser):
GET http://en.wikipedia.org/wiki/Category:Births_by_year HTTP/1.1
Accept: text/html, application/xhtml+xml, */*
Accept-Language: ru-RU,zh-CN;q=0.5
User-Agent: Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)
Accept-Encoding: gzip, deflate
Connection: Keep-Alive
Host: en.wikipedia.org
Request headers that return the "modified" content (Java):
GET http://en.wikipedia.org/wiki/Category:Births_by_year HTTP/1.1
User-Agent: Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)
Host: en.wikipedia.org
Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2
Connection: keep-alive
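To rule the headers in or out, one option is to send exactly the browser's headers from Java and compare the results. The following is a minimal sketch, assuming Java 11+ and the java.net.http client. The question does not say which client was used, though the Accept value in the second listing looks like the default that HttpURLConnection sends. Host and Connection are left out because java.net.http manages them itself and rejects them as restricted headers. The class name BrowserLikeRequest and the --send flag are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.zip.GZIPInputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical helper: reproduces the browser's request headers from Java.
final class BrowserLikeRequest {
    static final String PAGE = "http://en.wikipedia.org/wiki/Category:Births_by_year";

    // Same headers as the browser listing. Host and Connection are omitted:
    // java.net.http sets them itself and rejects them as restricted.
    static HttpRequest build() {
        return HttpRequest.newBuilder(URI.create(PAGE))
                .GET()
                .header("Accept", "text/html, application/xhtml+xml, */*")
                .header("Accept-Language", "ru-RU,zh-CN;q=0.5")
                .header("User-Agent",
                        "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)")
                .header("Accept-Encoding", "gzip, deflate")
                .build();
    }

    // HttpClient does not decompress bodies, so undo Content-Encoding by hand.
    static InputStream decode(InputStream raw, String contentEncoding) throws IOException {
        switch (contentEncoding.toLowerCase(Locale.ROOT)) {
            case "gzip":
                return new GZIPInputStream(raw);
            case "deflate":
                // HTTP "deflate" is zlib-wrapped, the format InflaterInputStream expects by default.
                return new InflaterInputStream(raw);
            default:
                return raw;
        }
    }

    // Sends the request and returns the page as text (assumes a UTF-8 body).
    static String fetch(HttpRequest request) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpResponse<InputStream> response =
                client.send(request, HttpResponse.BodyHandlers.ofInputStream());
        String encoding = response.headers().firstValue("Content-Encoding").orElse("");
        try (InputStream in = decode(response.body(), encoding)) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws Exception {
        HttpRequest request = build();
        // Print the headers this request will carry, for comparison with the browser's.
        request.headers().map().forEach((name, values) ->
                System.out.println(name + ": " + String.join(", ", values)));
        if (args.length > 0 && args[0].equals("--send")) {
            String html = fetch(request);
            System.out.println("Contains Park_Sung-Baek: " + html.contains("Park_Sung-Baek"));
        }
    }
}
```

Run it with --send to fetch the page and check whether Park_Sung-Baek appears in the HTML. Because Accept-Encoding advertises gzip and deflate, the sketch decodes the body itself: HttpClient hands back compressed bytes unchanged.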