java - Possible redirect using HTMLUnit

Question

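I am writing a small program that searches Google for a song you want and prints its lyrics. For this I am using HTMLUnit with Java. I search for the target text and then click the first Google result. However, the page I get is different from the one I see when I check the result in my browser.

My mistake may be in the XPath, but I am not sure, because I used Google Chrome's XPath viewer as well as 2 Firefox extensions.

In Chrome, I right-click the element whose XPath I want to see, then right-click the anchor element in the bottom window and select Copy XPath. Then I change the appropriate " characters to '.

This is my source code so far. For now I have hard-coded a random song.

Thank you very much.

Source code:

(I tried a lot of things, so I apologize for the messy source code. I did not delete lines, so that you can see what I have tried so far. Thanks again.)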

import java.io.IOException;
import java.net.MalformedURLException;

import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlAnchor;
import com.gargoylesoftware.htmlunit.html.HtmlPage;


public class dsa {
    public static void main(String args[]) throws FailingHttpStatusCodeException, MalformedURLException, IOException {

        WebClient webClient = new WebClient(BrowserVersion.FIREFOX_3_6);
        webClient.setThrowExceptionOnScriptError(false);
        //webClient.setJavaScriptEnabled(false);

        String address = "http://www.google.com/search?q=";
        String searchString = "Metallica - Whiskey In The Jar";
        //String searchString = "testtesttest";
        String someString = address.concat(searchString);
        String lastString = someString.concat(" site:randomlyricswebpageblabla.com");

        // site:anotherrandomlyricswebpage.com

        HtmlPage currentPage = webClient.getPage(lastString);
/*
        HtmlTextInput searchBox = (HtmlTextInput) currentPage.getElementById("search_input");
        searchBox.setTextContent("Amorphis - From The Heaven Of My Heart");

        HtmlButtonInput button = (HtmlButtonInput) currentPage.getElementById("search_button");

        HtmlPage newPage = button.click();
*/      
        //System.out.println(currentPage.asText());

        //

        //

        //HtmlElement element = (HtmlElement)currentPage.getByXPath("//h3").get(0);
        //DomNode result = element.getChildNodes().get(0);
        //HtmlAnchor hede = (HtmlAnchor) element.getFirstChild();
        //HtmlPage newPage = hede.click();

        //HtmlElement firstGoogleResult = (HtmlElement) currentPage.getByXPath("//*[@id='rso']/li[1]/div/h3/a").get(0);
        //HtmlAnchor testAnchor = (HtmlAnchor) firstGoogleResult.getFirstChild();

        HtmlAnchor firstGoogleResult = (HtmlAnchor) currentPage.getByXPath("//*[@id='rso']/li[1]/div/h3/a").get(0);

        HtmlPage newPage = firstGoogleResult.click();

        //HtmlAnchor linkTest = (HtmlAnchor) newPage.getByXPath("//*[@id='contentdiv_left']/div/div[3]/text()[1]");



        //HtmlDivision divContent = (HtmlDivision) newPage.getByXPath("\\div[contains(@class, 'contentdiv_leftbox_data')]");
        //System.out.println(divContent.asText());

        //System.out.print("*************\n\n\n" + newPage.asText());
        System.out.println(newPage.asText());
    }
}

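I get

Tweet Button

Tweet

in the console after the program is executed.

So, is my XPath for the first Google result wrong, or am I making a mistake somewhere else?

Thank you very much.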

score 0 · Accepted Answer

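You are getting the wrong data because of the user agent.

When Google receives a request, it looks in its database for earlier searches that match this data: IP + web browser + your PC data.

I don't know what HTMLUnit's default user agent is, but if you set it to the same one your browser sends, you should get the same response.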
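To make the user-agent idea concrete, here is a minimal sketch that uses only the JDK's HttpURLConnection, not HTMLUnit. It only shows which header carries the browser identity; it sends nothing over the network. The class name, the method name and the user-agent string are made up for illustration, so substitute the string your own browser sends. Note that the WebClient in the question is built with BrowserVersion.FIREFOX_3_6, so it is probably already presenting itself as that browser rather than as the Chrome you compared against.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

final class UserAgentSketch {
    // Example value only (an assumption): replace it with the string your own browser sends.
    static final String EXAMPLE_USER_AGENT =
            "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0";

    // Prepares, but does not send, a request that carries the given User-Agent header.
    static HttpURLConnection prepare(String url, String userAgent) throws IOException {
        HttpURLConnection connection =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        connection.setRequestProperty("User-Agent", userAgent);
        return connection;
    }

    public static void main(String[] args) throws IOException {
        HttpURLConnection connection =
                prepare("http://www.google.com/search?q=test", EXAMPLE_USER_AGENT);
        // Nothing has been sent yet; this prints the header value that would go out.
        System.out.println(connection.getRequestProperty("User-Agent"));
    }
}
```

If the page still differs after the user agents match, the user agent was probably not the only factor.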

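Also, I would try searching on a proper lyrics website directly instead of going through Google. I don't know any American lyrics sites, but they should be easy to find.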
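Whichever site you end up querying, note that the code in the question builds the search URL by plain concatenation, so the spaces and the colon in the query go into the URL unencoded. Some clients may tolerate that, but encoding the query explicitly removes the doubt. Below is a small sketch using the JDK's URLEncoder (the Charset overload needs Java 10 or later). The class and method names are hypothetical; the address, song and site values are the ones from the question.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class SearchUrlSketch {
    // Same base address as in the question.
    static final String SEARCH_ADDRESS = "http://www.google.com/search?q=";

    // Builds the search URL with the whole query form-encoded
    // (spaces become '+', ':' becomes "%3A").
    static String buildSearchUrl(String searchString, String site) {
        String query = searchString + " site:" + site;
        return SEARCH_ADDRESS + URLEncoder.encode(query, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(
                buildSearchUrl("Metallica - Whiskey In The Jar", "randomlyricswebpageblabla.com"));
    }
}
```

The resulting string can be passed to webClient.getPage(...) in place of lastString.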

Hope that helps!
