
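I am trying to run a crawler from my office. It is a very basic one, available on the internet, and it runs fine on my home computer. However, when I run the same program on my office PC, I get a connection timeout error. I assumed it was a proxy issue and tried visiting a few sites from Eclipse's internal browser, which also worked fine.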

 Document doc = Jsoup.connect("http://flipkart.com/").timeout(0).get(); 

Here is my stack trace:

Exception in thread "main" java.net.ConnectException: Connection timed out: connect
at java.net.DualStackPlainSocketImpl.connect0(Native Method)
at java.net.DualStackPlainSocketImpl.socketConnect(Unknown Source)
at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
at java.net.AbstractPlainSocketImpl.connect(Unknown Source)
at java.net.PlainSocketImpl.connect(Unknown Source)
at java.net.SocksSocketImpl.connect(Unknown Source)
at java.net.Socket.connect(Unknown Source)
at sun.net.NetworkClient.doConnect(Unknown Source)
at sun.net.www.http.HttpClient.openServer(Unknown Source)
at sun.net.www.http.HttpClient.openServer(Unknown Source)
at sun.net.www.http.HttpClient.<init>(Unknown Source)
at sun.net.www.http.HttpClient.New(Unknown Source)
at sun.net.www.http.HttpClient.New(Unknown Source)
at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(Unknown Source)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(Unknown Source)
at sun.net.www.protocol.http.HttpURLConnection.connect(Unknown Source)
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:449)
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:434)
at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:181)
at org.jsoup.helper.HttpConnection.get(HttpConnection.java:170)
at org.syntel.crawler.Crawler.processPage(Crawler.java:44)
at org.syntel.crawler.Crawler.main(Crawler.java:20)

How can I fix this?


2 Answers


@alkis suggested:

Try setting a user agent. If you are behind a proxy, see this other question: How to add proxy support to Jsoup (HTML parser)?
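
One way to apply the proxy suggestion without touching the Jsoup call itself is to set the standard JVM proxy properties before connecting. The sketch below is only an illustration under assumptions: `proxy.example.com` and `8080` are hypothetical placeholders for the office proxy, and the class and method names are made up for this example. It uses only the JDK and asks the default `ProxySelector` which proxy it would pick, so it opens no connection.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.URI;
import java.util.List;

class ProxySettingsSketch {

    // Sets the standard JVM-wide proxy properties for HTTP and HTTPS traffic.
    public static void useProxy(String host, int port) {
        System.setProperty("http.proxyHost", host);
        System.setProperty("http.proxyPort", Integer.toString(port));
        System.setProperty("https.proxyHost", host);
        System.setProperty("https.proxyPort", Integer.toString(port));
    }

    // Asks the default ProxySelector which proxies it would use for the URL.
    // Nothing is connected or resolved here.
    public static List<Proxy> proxiesFor(String url) {
        return ProxySelector.getDefault().select(URI.create(url));
    }

    public static void main(String[] args) {
        // Hypothetical proxy address: replace with the real office proxy.
        useProxy("proxy.example.com", 8080);

        for (Proxy proxy : proxiesFor("http://flipkart.com/")) {
            if (proxy.type() == Proxy.Type.DIRECT) {
                System.out.println("DIRECT (no proxy)");
            } else {
                InetSocketAddress address = (InetSocketAddress) proxy.address();
                System.out.println(proxy.type() + " via "
                        + address.getHostString() + ":" + address.getPort());
            }
        }
    }
}
```

The stack trace in the question shows Jsoup going through `HttpURLConnection`, which reads these properties, so calling something like `useProxy(...)` before `Jsoup.connect(...)` should route the request through the proxy. The same properties can also be passed on the command line, for example `-Dhttp.proxyHost=... -Dhttp.proxyPort=...`.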

Answered 2015-04-30T20:26:30.003

Try using:

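import java.net.InetSocketAddress;
import java.net.Proxy;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

System.out.println("Testing JSOUP\n--------------");
// Replace the placeholder host and port with your office proxy's address.
Proxy proxy = new Proxy(
        Proxy.Type.HTTP,
        InetSocketAddress.createUnresolved("www.yourPROXY.com", 80)
);
// Route the request through the proxy, then print the selected headline links.
Document doc = Jsoup.connect("http://en.wikipedia.org/").proxy(proxy).get();
Elements newsHeadlines = doc.select("#mp-itn b a");
System.out.println(newsHeadlines.html());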
Answered 2017-07-19T05:00:04.583