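I am trying to fetch the page source of the following URL using HtmlUnit's get method (WebClient.getPage in the dump below).
It gets stuck somewhere. I tried to find out why, but I could not work it out. I also checked whether the threads created by HtmlUnit are BLOCKED or WAITING, but that is not the case.
Following is the log generated by HtmlUnit.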
18 Jan 2013 04:14:47,832 - main - ERROR - com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter.runtimeError(StrictErrorReporter.java:79) - runtimeError: message=[The data necessary to complete this operation is not yet available.] sourceName=[http://ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.min.js] line=[16] lineSource=[null] lineOffset=[0]
18 Jan 2013 04:14:47,924 - main - WARN - com.gargoylesoftware.htmlunit.javascript.host.html.HTMLDocument.jsxFunction_getElementById(HTMLDocument.java:1049) - getElementById(script1358500487923) did a getElementByName for Internet Explorer
18 Jan 2013 04:14:49,498 - main - ERROR - com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter.runtimeError(StrictErrorReporter.java:79) - runtimeError: message=[The data necessary to complete this operation is not yet available.] sourceName=[http://code.jquery.com/jquery-latest.js] line=[911] lineSource=[null] lineOffset=[0]
18 Jan 2013 04:14:49,565 - main - WARN - com.gargoylesoftware.htmlunit.javascript.host.html.HTMLDocument.jsxFunction_getElementById(HTMLDocument.java:1049) - getElementById(sizzle-1358500489525) did a getElementByName for Internet Explorer
18 Jan 2013 04:14:53,047 - main - WARN - com.gargoylesoftware.htmlunit.javascript.host.ActiveXObject.jsConstructor(ActiveXObject.java:128) - Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash.7'.
18 Jan 2013 04:14:53,048 - main - ERROR - com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter.runtimeError(StrictErrorReporter.java:79) - runtimeError: message=[Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash.7'.] sourceName=[http://www.google-analytics.com/ga.js] line=[18] lineSource=[null] lineOffset=[0]
18 Jan 2013 04:14:53,060 - main - WARN - com.gargoylesoftware.htmlunit.javascript.host.ActiveXObject.jsConstructor(ActiveXObject.java:128) - Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash.6'.
18 Jan 2013 04:14:53,061 - main - ERROR - com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter.runtimeError(StrictErrorReporter.java:79) - runtimeError: message=[Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash.6'.] sourceName=[http://www.google-analytics.com/ga.js] line=[18] lineSource=[null] lineOffset=[0]
18 Jan 2013 04:14:53,061 - main - WARN - com.gargoylesoftware.htmlunit.javascript.host.ActiveXObject.jsConstructor(ActiveXObject.java:128) - Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash'.
18 Jan 2013 04:14:53,062 - main - ERROR - com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter.runtimeError(StrictErrorReporter.java:79) - runtimeError: message=[Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash'.] sourceName=[http://www.google-analytics.com/ga.js] line=[18] lineSource=[null] lineOffset=[0]
18 Jan 2013 04:14:53,829 - main - ERROR - com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter.runtimeError(StrictErrorReporter.java:79) - runtimeError: message=[The data necessary to complete this operation is not yet available.] sourceName=[http://chat.livechatinc.net/licence/1051689/script.cgi?lang=en&groups=0] line=[60] lineSource=[null] lineOffset=[0]
18 Jan 2013 04:14:54,878 - main - ERROR - com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter.runtimeError(StrictErrorReporter.java:79) - runtimeError: message=[The data necessary to complete this operation is not yet available.] sourceName=[http://platform.twitter.com/widgets.js] line=[5] lineSource=[null] lineOffset=[0]
18 Jan 2013 04:14:56,215 - main - WARN - com.gargoylesoftware.htmlunit.javascript.host.html.HTMLDocument.jsxFunction_getElementById(HTMLDocument.java:1049) - getElementById(sizzle-1358500496196) did a getElementByName for Internet Explorer
18 Jan 2013 04:14:56,458 - main - WARN - com.gargoylesoftware.htmlunit.javascript.host.html.HTMLDocument.jsxFunction_execCommand(HTMLDocument.java:1590) - Nothing done for execCommand(BackgroundImageCache, ...) (feature not implemented)
18 Jan 2013 04:14:58,086 - main - WARN - com.gargoylesoftware.htmlunit.javascript.host.html.HTMLDocument.jsxFunction_getElementById(HTMLDocument.java:1049) - getElementById(sizzle-1358500489525) did a getElementByName for Internet Explorer
Following is the thread dump of my process (taken with jstack).
2013-01-18 04:17:46
Full thread dump Java HotSpot(TM) 64-Bit Server VM (22.1-b02 mixed mode):
"Attach Listener" daemon prio=10 tid=0x0000000002955000 nid=0x16dd waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Service Thread" daemon prio=10 tid=0x00007feca00cc800 nid=0x154f runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C2 CompilerThread1" daemon prio=10 tid=0x00007feca00ca000 nid=0x154e waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C2 CompilerThread0" daemon prio=10 tid=0x00007feca00c7000 nid=0x154d waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Signal Dispatcher" daemon prio=10 tid=0x00007feca00c5000 nid=0x154c runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Finalizer" daemon prio=10 tid=0x00007feca007c800 nid=0x154b in Object.wait() [0x00007fec9fffe000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000000c2369e20> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
- locked <0x00000000c2369e20> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:177)
"Reference Handler" daemon prio=10 tid=0x00007feca007a000 nid=0x154a in Object.wait() [0x00007feca4157000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000000c23699e0> (a java.lang.ref.Reference$Lock)
at java.lang.Object.wait(Object.java:503)
at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
- locked <0x00000000c23699e0> (a java.lang.ref.Reference$Lock)
"main" prio=10 tid=0x00000000025d9000 nid=0x1546 runnable [0x00007fecaa8b6000]
java.lang.Thread.State: RUNNABLE
at net.sourceforge.htmlunit.corejs.javascript.ScriptableObject.getTopLevelScope(ScriptableObject.java:2007)
at com.gargoylesoftware.htmlunit.javascript.SimpleScriptable.getWindow(SimpleScriptable.java:303)
at com.gargoylesoftware.htmlunit.javascript.SimpleScriptable.getWindow(SimpleScriptable.java:293)
at com.gargoylesoftware.htmlunit.javascript.SimpleScriptable.getPrototype(SimpleScriptable.java:251)
at com.gargoylesoftware.htmlunit.javascript.host.html.HTMLCollection.<init>(HTMLCollection.java:99)
at com.gargoylesoftware.htmlunit.javascript.host.html.HTMLCollection.<init>(HTMLCollection.java:110)
at com.gargoylesoftware.htmlunit.javascript.host.HTMLCollectionFrames.<init>(Window.java:1751)
at com.gargoylesoftware.htmlunit.javascript.host.Window.getFrames(Window.java:759)
at com.gargoylesoftware.htmlunit.javascript.host.Window.jsxGet_length(Window.java:749)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at net.sourceforge.htmlunit.corejs.javascript.MemberBox.invoke(MemberBox.java:172)
at net.sourceforge.htmlunit.corejs.javascript.ScriptableObject$GetterSlot.getValue(ScriptableObject.java:342)
at net.sourceforge.htmlunit.corejs.javascript.ScriptableObject.getImpl(ScriptableObject.java:2523)
at net.sourceforge.htmlunit.corejs.javascript.ScriptableObject.get(ScriptableObject.java:438)
at com.gargoylesoftware.htmlunit.javascript.SimpleScriptable.get(SimpleScriptable.java:75)
at com.gargoylesoftware.htmlunit.javascript.host.Window.get(Window.java:1226)
at net.sourceforge.htmlunit.corejs.javascript.ScriptableObject.getProperty(ScriptableObject.java:2088)
at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.getObjectProp(ScriptRuntime.java:1527)
at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.getObjectProp(ScriptRuntime.java:1513)
at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpretLoop(Interpreter.java:1398)
at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpret(Interpreter.java:854)
at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call(InterpretedFunction.java:164)
at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.doTopCall(ContextFactory.java:429)
at com.gargoylesoftware.htmlunit.javascript.HtmlUnitContextFactory.doTopCall(HtmlUnitContextFactory.java:267)
at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.doTopCall(ScriptRuntime.java:3183)
at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call(InterpretedFunction.java:162)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$4.doRun(JavaScriptEngine.java:538)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:589)
- locked <0x00000000c274d308> (a com.gargoylesoftware.htmlunit.html.HtmlPage)
at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:537)
at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:538)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.callFunction(JavaScriptEngine.java:545)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.callFunction(JavaScriptEngine.java:520)
at com.gargoylesoftware.htmlunit.html.HtmlPage.executeJavaScriptFunctionIfPossible(HtmlPage.java:896)
at com.gargoylesoftware.htmlunit.javascript.host.EventListenersContainer.executeEventListeners(EventListenersContainer.java:162)
at com.gargoylesoftware.htmlunit.javascript.host.EventListenersContainer.executeBubblingListeners(EventListenersContainer.java:221)
at com.gargoylesoftware.htmlunit.javascript.host.Node.fireEvent(Node.java:735)
at com.gargoylesoftware.htmlunit.html.HtmlElement$2.run(HtmlElement.java:866)
at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:537)
at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:538)
at com.gargoylesoftware.htmlunit.html.HtmlElement.fireEvent(HtmlElement.java:871)
at com.gargoylesoftware.htmlunit.html.HtmlPage.executeEventHandlersIfNeeded(HtmlPage.java:1162)
at com.gargoylesoftware.htmlunit.html.HtmlPage.initialize(HtmlPage.java:202)
at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseInto(WebClient.java:440)
at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:311)
at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:389)
"VM Thread" prio=10 tid=0x00007feca0072800 nid=0x1549 runnable
"GC task thread#0 (ParallelGC)" prio=10 tid=0x00000000025e4000 nid=0x1547 runnable
"GC task thread#1 (ParallelGC)" prio=10 tid=0x00000000025e5800 nid=0x1548 runnable
"VM Periodic Task Thread" prio=10 tid=0x00007feca00d7800 nid=0x1550 waiting on condition
JNI global references: 317
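The BLOCKED/WAITING check can also be done from inside the JVM instead of with jstack. Below is a minimal sketch using only standard java.lang calls; the class name ThreadStateReport and the method countStates are made up for illustration and are not part of HtmlUnit.

```java
import java.util.Map;
import java.util.TreeMap;

public class ThreadStateReport {

    /** Counts live threads per Thread.State (RUNNABLE, BLOCKED, WAITING, ...). */
    static Map<Thread.State, Integer> countStates() {
        Map<Thread.State, Integer> counts = new TreeMap<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            counts.merge(t.getState(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Summary of thread states, comparable to scanning a jstack dump by eye.
        System.out.println(countStates());

        // For every RUNNABLE thread, print its top stack frame (if it has one).
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            StackTraceElement[] frames = e.getValue();
            if (e.getKey().getState() == Thread.State.RUNNABLE && frames.length > 0) {
                System.out.println(e.getKey().getName() + " at " + frames[0]);
            }
        }
    }
}
```

A thread that stays RUNNABLE with changing top frames, like "main" in the dump above, points at busy work rather than at a lock or a wait.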
I am not sure why the request gets stuck. It never returns from the method. Can anybody look into it?
Update: com.gargoylesoftware.htmlunit.html.HTMLParser.HtmlUnitDOMBuilder.parse(XMLInputSource)
    @Override
    public void parse(final XMLInputSource inputSource) throws XNIException, IOException {
        final HtmlUnitDOMBuilder oldBuilder = page_.getBuilder();
        page_.setBuilder(this);
        try {
            super.parse(inputSource);
        }
        finally {
            page_.setBuilder(oldBuilder);
        }
    }
I attached the HtmlUnit source code and debugged into HtmlUnit. The method above does not run to completion.
Also, I have set the timeout as follows:
webClient.setTimeout(120000);
So why does it not come out after 2 minutes and throw something like SomeThingTimeOutException?
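One possible explanation, offered as a guess: the dump shows "main" inside the JavaScript interpreter (an event handler fired from HtmlPage.initialize) rather than in socket I/O, so a connection-level timeout may simply never apply. A wall-clock limit around the whole call can be sketched with java.util.concurrent alone. The class WallClockGuard, the method callWithTimeout, and the stuck task are hypothetical stand-ins; in the real program the task would wrap webClient.getPage(url).

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class WallClockGuard {

    /** Runs task on a worker thread; returns fallback if it exceeds timeoutMillis. */
    static <T> T callWithTimeout(Callable<T> task, long timeoutMillis, T fallback)
            throws InterruptedException, ExecutionException {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "guarded-call");
            t.setDaemon(true); // a stuck worker must not keep the JVM alive
            return t;
        });
        Future<T> future = executor.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // cancel(true) only interrupts the worker; code that never
            // checks the interrupt flag will keep running in the background.
            future.cancel(true);
            return fallback;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for a call that never returns on its own.
        Callable<String> stuck = () -> {
            Thread.sleep(Long.MAX_VALUE);
            return "page source";
        };
        System.out.println(callWithTimeout(stuck, 200, "timed out"));
        System.out.println(callWithTimeout(() -> "page source", 200, "timed out"));
    }
}
```

The caller regains control after the limit, but this does not stop a busy loop that ignores interruption, which is why the worker is a daemon thread here.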