java - Getting information from a page that requires login (Java)

Question

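I'm writing a small script that should take about 300 links from a page and turn them into shortcuts (all saved in one folder).

I can get all the links I need from some pages, but others are part of a website that requires me to log in first.

I tried HttpUnit, but it failed every time. So far I have only been putting the HTML page into an InputStream and reading from it (going line by line looking for what I need), but I don't know how to connect to the website or how to do anything else once I'm inside the logged-in section.

Here is the HttpUnit code, in case it helps anyone: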

final WebClient webClient = new WebClient();

// Get the first page
final HtmlPage page1 = webClient.getPage("mywebsite");

ArrayList<HtmlForm> f;
f = (ArrayList<HtmlForm>) page1.getForms();

System.out.println(f);

// Get the form that we are dealing with and within that form, 
// find the submit button and the field that we want to change.
final HtmlForm form = page1.getFirstByXPath("//form[@id='login']");

final HtmlSubmitInput button = form.getFirstByXPath("//input[@value='Login']");
final HtmlTextInput username = form.getFirstByXPath("//input[@id='username']");

// Change the value of the text field
username.setValueAttribute("username");

final HtmlPasswordInput passField = form.getFirstByXPath("//input[@id='password']");

// Change the value of the text field
passField.setValueAttribute("pass");

// Now submit the form by clicking the button and get back the second page.
final HtmlPage page2 = button.click();

webClient.closeAllWindows();

Please excuse my poor variable naming :p This is a script I wrote only for myself, so I didn't really bother.

I get a NullPointerException on "final HtmlPage page2 = button.click();"

Thanks in advance.

score 0 · Accepted Answer

Your search for the button seems to be failing. After this line

final HtmlSubmitInput button = form.getFirstByXPath("//submit[@value='Login']");

I would add

assert(button != null) : "Could not find the button";

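and run your application with assertions enabled (the -ea JVM argument). The assertion failure will then be reported at that point, where the lookup returned null, rather than later as a NullPointerException on button.click().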
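
To illustrate why the lookup can come back null, here is a minimal sketch that uses the JDK's built-in javax.xml.xpath API rather than the library from the question, so it runs without extra dependencies. The PAGE markup is a made-up stand-in for the real login page, and find and requireFound are hypothetical helper names. With this API, an XPath that matches nothing yields null; the lookup in the question appears to behave the same way, which would explain the exception. The explicit check is an alternative to assert that also fires when the JVM is started without -ea.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class ButtonLookupSketch {

    // Hypothetical stand-in for the login page; the real markup is unknown.
    static final String PAGE =
        "<html><body><form id='login'>"
        + "<input id='username' type='text'/>"
        + "<input id='password' type='password'/>"
        + "<input type='submit' value='Login'/>"
        + "</form></body></html>";

    // Returns the first node matching the XPath, or null when nothing matches.
    static Node find(String xpath) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(PAGE)));
        return (Node) XPathFactory.newInstance().newXPath()
            .evaluate(xpath, doc, XPathConstants.NODE);
    }

    // Explicit null check; unlike assert, it does not depend on the -ea flag.
    static Node requireFound(Node node, String what) {
        if (node == null) {
            throw new IllegalStateException("Could not find " + what);
        }
        return node;
    }

    public static void main(String[] args) throws Exception {
        // There is an <input> element with value='Login', so this matches.
        Node button = requireFound(find("//input[@value='Login']"), "the button");
        System.out.println("found: " + button.getNodeName());

        // There is no <submit> element, so the lookup returns null
        // and the check fails with a readable message.
        try {
            requireFound(find("//submit[@value='Login']"), "the button");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If the check fails for the real page, the next step would be to print the page source and compare the button's actual tag and attributes against the XPath expression.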
