Storm Crawler is running in a Kubernetes cluster, and we are hitting many StackOverflowError exceptions in JSoupParserBolt:
java.lang.StackOverflowError
    at org.apache.xerces.dom.ParentNode.internalInsertBefore(Unknown Source)
    at org.apache.xerces.dom.ParentNode.insertBefore(Unknown Source)
    at org.apache.xerces.dom.NodeImpl.appendChild(Unknown Source)
    at com.digitalpebble.stormcrawler.parse.JSoupDOMBuilder.createDOM(JSoupDOMBuilder.java:111)
    at com.digitalpebble.stormcrawler.parse.JSoupDOMBuilder.createDOM(JSoupDOMBuilder.java:136)
    at com.digitalpebble.stormcrawler.parse.JSoupDOMBuilder.createDOM(JSoupDOMBuilder.java:136)
    at com.digitalpebble.stormcrawler.parse.JSoupDOMBuilder.createDOM(JSoupDOMBuilder.java:136)
    at com.digitalpebble.stormcrawler.parse.JSoupDOMBuilder.createDOM(JSoupDOMBuilder.java:136)
    at com.digitalpebble.stormcrawler.parse.JSoupDOMBuilder.createDOM(JSoupDOMBuilder.java:136)
    at com.digitalpebble.stormcrawler.parse.JSoupDOMBuilder.createDOM(JSoupDOMBuilder.java:136)
    at com.digitalpebble.stormcrawler.parse.JSoupDOMBuilder.createDOM(JSoupDOMBuilder.java:136)
    at com.digitalpebble.stormcrawler.parse.JSoupDOMBuilder.createDOM(JSoupDOMBuilder.java:136)
The crawler topology is configured with:
worker.heap.memory.mb: 8062
topology.worker.max.heap.size.mb: 8062
http.content.limit: -1
Could http.content.limit: -1 be causing this problem?
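For context, the repeated createDOM frames suggest the overflow comes from recursing through a very deep DOM tree, and with http.content.limit: -1 there is no cap on how large (and therefore how deeply nested) a fetched document can be. A sketch of the crawler config changes I am considering as workarounds (the -Xss and limit values below are guesses, not tested recommendations):

```yaml
# crawler-conf.yaml (sketch, assumed workaround values)

# Cap fetched content at ~1 MB instead of unlimited (-1),
# so pathologically large/deep pages are truncated before parsing.
http.content.limit: 1048576

# Alternatively (or additionally), enlarge the per-thread stack of the
# worker JVMs so deep createDOM recursion has more headroom.
# topology.worker.childopts is a standard Storm setting; the 8m value
# is an assumption to be tuned.
topology.worker.childopts: "-Xss8m"
```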