我正在使用 htmlunit 开发一个网络爬虫,我已经添加了所有必需的超时,但是我注意到当我使用 Java VisualVM 进行线程转储时,当某个网站的服务器被爬网时应用程序挂起:
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at java.net.SocksSocketImpl.readSocksReply(SocksSocketImpl.java:88)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:429)
at java.net.Socket.connect(Socket.java:525)
at com.gargoylesoftware.htmlunit.SocksSocketFactory.connectSocket(SocksSocketFactory.java:89)
at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:148)
at org.apache.http.impl.conn.AbstractPoolEntry.open(AbstractPoolEntry.java:149)
at org.apache.http.impl.conn.AbstractPooledConnAdapter.open(AbstractPooledConnAdapter.java:121)
at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:573)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:425)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:776)
at com.gargoylesoftware.htmlunit.HttpWebConnection.getResponse(HttpWebConnection.java:152)
at app.plugin.core.net.QHttpWebConnection.getResponse(QHttpWebConnection.java:30)
at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseFromWebConnection(WebClient.java:1439)
at com.gargoylesoftware.htmlunit.WebClient.loadWebResponse(WebClient.java:1358)
at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:307)
at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:373)
at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:358)
这真的很令人沮丧,因为我无法控制这些服务器。这个问题严重影响了我的应用程序的性能。
问题:
- 我该如何解决这个问题?
- 有没有办法获取Java应用程序打开的套接字连接列表并使用它来终止套接字,例如模拟服务器关闭连接?