我正在使用 nutch 2.1 并爬取一个站点。问题是爬虫一直显示获取 url spinwaiting/active 并且由于获取需要很长时间,因此与 mysql 的连接超时。我怎样才能减少一次提取的次数,使mysql不会超时?nutch 中是否有一个设置,我可以说只获取 100 或 500 个 url,然后解析并存储到 mysql,然后再次获取下一个 100 或 500 个 url?
错误信息:
Unexpected error for http://www.example.com
java.io.IOException: java.sql.BatchUpdateException: The last packet successfully received from the server was 36,928,172 milliseconds ago. The last packet sent successfully to the server was 36,928,172 milliseconds ago. is longer than the server configured value of 'wait_timeout'. You should consider either expiring and/or testing connection validity before use in your application, increasing the server configured values for client timeouts, or using the Connector/J connection property 'autoReconnect=true' to avoid this problem.
at org.apache.gora.sql.store.SqlStore.flush(SqlStore.java:340)
at org.apache.gora.mapreduce.GoraRecordWriter.write(GoraRecordWriter.java:65)
at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:587)
at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
at org.apache.nutch.fetcher.FetcherReducer$FetcherThread.output(FetcherReducer.java:663)
at org.apache.nutch.fetcher.FetcherReducer$FetcherThread.run(FetcherReducer.java:534)
Caused by: java.sql.BatchUpdateException: The last packet successfully received from the server was 36,928,172 milliseconds ago. The last packet sent successfully to the server was 36,928,172 milliseconds ago. is longer than the server configured value of 'wait_timeout'. You should consider either expiring and/or testing connection validity before use in your application, increasing the server configured values for client timeouts, or using the Connector/J connection property 'autoReconnect=true' to avoid this problem.
at com.mysql.jdbc.PreparedStatement.executeBatchSerially(PreparedStatement.java:2028)
at com.mysql.jdbc.PreparedStatement.executeBatch(PreparedStatement.java:1451)
at org.apache.gora.sql.store.SqlStore.flush(SqlStore.java:328)
... 5 more
Caused by: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: The last packet successfully received from the server was 36,928,172 milliseconds ago. The last packet sent successfully to the server was 36,928,172 milliseconds ago. is longer than the server configured value of 'wait_timeout'. You should consider either expiring and/or testing connection validity before use in your application, increasing the server configured values for client timeouts, or using the Connector/J connection property 'autoReconnect=true' to avoid this problem.
at sun.reflect.GeneratedConstructorAccessor49.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1116)
at com.mysql.jdbc.MysqlIO.send(MysqlIO.java:3364)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1983)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2163)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2624)
at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2127)
at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2427)
at com.mysql.jdbc.PreparedStatement.executeBatchSerially(PreparedStatement.java:1980)
... 7 more
Caused by: java.net.SocketException: Broken pipe
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
at com.mysql.jdbc.MysqlIO.send(MysqlIO.java:3345)
... 13 more