I'm using HTMLUnit in Java to connect to a remote URL and extract some information from the page it returns.

I'm using the following code:

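import java.net.URL;
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.DefaultCredentialsProvider;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

final WebClient webClient = new WebClient(BrowserVersion.INTERNET_EXPLORER_6_0, "companyproxy.server", 8080);
final DefaultCredentialsProvider scp = new DefaultCredentialsProvider();
scp.addProxyCredentials("username", "password", "companyproxy.server", 8080);
webClient.setCredentialsProvider(scp);

final URL url = new URL("http://htmlunit.sourceforge.net");
final HtmlPage page = (HtmlPage) webClient.getPage(url);
System.out.println(page.asXml());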

After supplying the proxy server details, I get this error:

SEVERE: Credentials cannot be used for NTLM authentication:
org.apache.commons.httpclient.UsernamePasswordCredentials
org.apache.commons.httpclient.auth.InvalidCredentialsException: Credentials cannot be used for NTLM authentication: org.apache.commons.httpclient.UsernamePasswordCredentials
    at org.apache.commons.httpclient.auth.NTLMScheme.authenticate(NTLMScheme.java:332)
    at org.apache.commons.httpclient.HttpMethodDirector.authenticateProxy(HttpMethodDirector.java:320)
    at org.apache.commons.httpclient.HttpMethodDirector.authenticate(HttpMethodDirector.java:232)
    at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:170)
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:346)
    at com.gargoylesoftware.htmlunit.HttpWebConnection.getResponse(HttpWebConnection.java:97)
    at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseFromWebConnection(WebClient.java:1477)
    at com.gargoylesoftware.htmlunit.WebClient.loadWebResponse(WebClient.java:1435)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:327)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:388)
    at com.test.Test.main(Test.java:25)
Jun 5, 2009 9:28:35 AM org.apache.commons.httpclient.HttpMethodDirector processProxyAuthChallenge
INFO: Failure authenticating with NTLM <any realm>@companyproxy.server:8080
Jun 5, 2009 9:28:35 AM com.gargoylesoftware.htmlunit.WebClient printContentIfNecessary
INFO: statusCode=[407] contentType=[text/html]
Jun 5, 2009 9:28:35 AM com.gargoylesoftware.htmlunit.WebClient printContentIfNecessary
INFO: <HTML><HEAD>
<TITLE>Access Denied</TITLE>
</HEAD>

……

Exception in thread "main" com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException: 407 Proxy Authentication Required for http://htmlunit.sourceforge.net/
    at com.gargoylesoftware.htmlunit.WebClient.throwFailingHttpStatusCodeExceptionIfNecessary(WebClient.java:535)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:332)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:388)
    at com.test.Test.main(Test.java:25)

Can anyone shed some light on this?

5 Answers

I had the same problem and found a solution online. Forget setCredentialsProvider() and use this instead:

String userAndPassword = username + ":" + password;
String userAndPasswordBase64 = Base64.encodeBase64String(userAndPassword.getBytes());
webClient.addRequestHeader("Proxy-Authorization", "Basic "+userAndPasswordBase64);

The Base64 class here comes from Apache Commons Codec.
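The header value above is just the word Basic followed by Base64("username:password"), i.e. the HTTP Basic scheme. A minimal sketch of building the same value with the JDK's own java.util.Base64 (Java 8 and later, so newer than this answer) instead of Commons Codec; the class name ProxyAuthHeader and method basic are hypothetical, and encoding the credentials as UTF-8 is an assumption. It only helps if the proxy also accepts Basic; the log in the question shows HttpClient negotiating NTLM.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: builds a Basic "Proxy-Authorization" header value
// using the JDK encoder rather than Commons Codec.
final class ProxyAuthHeader {
    private ProxyAuthHeader() {}

    /** Returns "Basic " followed by Base64 of "username:password". */
    static String basic(String username, String password) {
        String userAndPassword = username + ":" + password;
        // Assumption: UTF-8 bytes; only matters for non-ASCII credentials.
        byte[] bytes = userAndPassword.getBytes(StandardCharsets.UTF_8);
        // The basic encoder emits a single line with no CR/LF.
        return "Basic " + Base64.getEncoder().encodeToString(bytes);
    }
}
```

It would be passed exactly as in the answer: webClient.addRequestHeader("Proxy-Authorization", ProxyAuthHeader.basic(username, password)). One practical difference: in Commons Codec 1.4, encodeBase64String produced chunked output ending in CRLF (changed to single-line output in 1.5), whereas the JDK's basic encoder never inserts line breaks.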

I pass the host and port like this, but your way is probably fine too.

webClient.getProxyConfig().setProxyHost(proxyHost);
webClient.getProxyConfig().setProxyPort(proxyPort);
answered 2012-02-06T16:20:39.267

You haven't posted the full stack trace, but I'd guess the error is thrown at this line:

final HtmlPage page = (HtmlPage)webClient.getPage(url);

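This is because the getPage call is returning an UnexpectedPage rather than an HtmlPage. Judging by the UnexpectedPage documentation, the content type returned for the request isn't text/html, so HtmlUnit doesn't know how to handle it. Turn on debugging and look at what is actually being returned to track down the error.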
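The answer doesn't say how to turn on debugging. One option, assuming the HtmlUnit version in use sits on Commons HttpClient 3.x with commons-logging (as the org.apache.commons.httpclient frames in the stack trace suggest), is to route commons-logging to its SimpleLog and enable HttpClient's wire log, as described in the HttpClient 3.x logging guide. The class name HttpClientDebugLogging is hypothetical, and the properties must be set before the first WebClient or HttpClient is created, because loggers are looked up when those classes initialize.

```java
// Hypothetical helper: enables Commons HttpClient 3.x debug output through
// commons-logging's SimpleLog. Assumes commons-logging is on the classpath;
// this replaces whatever logger implementation it would otherwise pick.
final class HttpClientDebugLogging {
    private HttpClientDebugLogging() {}

    static void enable() {
        System.setProperty("org.apache.commons.logging.Log",
                "org.apache.commons.logging.impl.SimpleLog");
        System.setProperty("org.apache.commons.logging.simplelog.showdatetime", "true");
        // Full wire log: request/response headers (including the 407 challenge) and bodies.
        System.setProperty("org.apache.commons.logging.simplelog.log.httpclient.wire", "debug");
        // HttpClient internals, e.g. which authentication scheme was selected.
        System.setProperty("org.apache.commons.logging.simplelog.log.org.apache.commons.httpclient",
                "debug");
    }
}
```

Call HttpClientDebugLogging.enable() as the first statement in main, before new WebClient(...).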

answered 2009-06-04T14:56:32.520

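I couldn't get NTLM authentication against the proxy working with HtmlUnit. When I used HttpClient directly (HtmlUnit is built on top of it) and configured the proxy with NTLM authentication, it worked. Here is the code: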

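import java.util.ArrayList;
import java.util.List;
import org.apache.commons.httpclient.DefaultHttpMethodRetryHandler;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.NTCredentials;
import org.apache.commons.httpclient.auth.AuthPolicy;
import org.apache.commons.httpclient.auth.AuthScope;
import org.apache.commons.httpclient.methods.GetMethod;
import org.apache.commons.httpclient.params.HttpMethodParams;

HttpClient client = new HttpClient();
client.getHostConfiguration().setProxy("companyproxy.server", 8080);
// Answer the proxy's 407 challenge with NTLM only
List<String> authPrefs = new ArrayList<String>();
authPrefs.add(AuthPolicy.NTLM);
client.getParams().setParameter(AuthPolicy.AUTH_SCHEME_PRIORITY, authPrefs);

// NTCredentials(user, password, workstation host, domain); a null host in AuthScope matches any host
client.getState().setProxyCredentials(
    new AuthScope(null, 8080, null),
    new NTCredentials("username", "pwd", "", "DOMAIN"));

GetMethod method = new GetMethod(url); // url: the page to fetch
method.getParams().setParameter(HttpMethodParams.RETRY_HANDLER,
        new DefaultHttpMethodRetryHandler(3, false));
try {
    System.out.println(client.executeMethod(method)); // prints the HTTP status code
} finally {
    method.releaseConnection();
}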
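For comparison only, here is a different technique that this answer does not use: the JDK's own HttpURLConnection can also respond to an NTLM proxy challenge, provided a java.net.Authenticator supplies the credentials. This is a sketch under several assumptions: the proxy is configured through the standard http.proxyHost/http.proxyPort system properties, the domain is passed as DOMAIN\username, and whether NTLM actually succeeds depends on the proxy and the JDK platform. The class name ProxyAuthenticator is hypothetical, and it has no effect on how HtmlUnit connects.

```java
import java.net.Authenticator;
import java.net.PasswordAuthentication;

// Hypothetical helper: hands proxy credentials to the JDK's built-in HTTP
// stack, which performs the NTLM handshake itself.
final class ProxyAuthenticator extends Authenticator {
    private final String user;      // assumed format for NTLM: "DOMAIN\\username"
    private final char[] password;

    ProxyAuthenticator(String user, char[] password) {
        this.user = user;
        this.password = password.clone();
    }

    @Override
    protected PasswordAuthentication getPasswordAuthentication() {
        // Only answer proxy (407) challenges; return null for origin-server (401) ones.
        if (getRequestorType() == RequestorType.PROXY) {
            return new PasswordAuthentication(user, password);
        }
        return null;
    }
}
```

Register it once with Authenticator.setDefault(new ProxyAuthenticator("DOMAIN\\username", "pwd".toCharArray())) and set http.proxyHost to companyproxy.server and http.proxyPort to 8080 before opening the connection.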
answered 2009-06-05T14:04:23.133

As Rob said, HtmlUnit can't detect that this is an HTML page.

Please post a sample to the user mailing list so we can investigate further.

answered 2009-06-05T01:28:24.660

With HTMLUnit 2.14, this works for me:

    DefaultCredentialsProvider cp = (DefaultCredentialsProvider) client.getCredentialsProvider();
    cp.addNTLMCredentials(proxyUser, proxyPassword, proxyHost, proxyPort, null, domain);
answered 2014-11-24T18:46:03.703