
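Sorry, this is related to another post. Once my CouchDB has a large number of documents in it, ES starts throwing errors in the log and stops indexing newer documents: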

[2013-08-19 17:55:08,379][WARN ][river.couchdb            ] [Morning Star] [couchdb][portal_production] failed to read from _changes, throttling....
java.io.IOException: Bogus chunk size
at sun.net.www.http.ChunkedInputStream.processRaw(ChunkedInputStream.java:319)
at sun.net.www.http.ChunkedInputStream.readAheadBlocking(ChunkedInputStream.java:572)
at sun.net.www.http.ChunkedInputStream.readAhead(ChunkedInputStream.java:609)
at sun.net.www.http.ChunkedInputStream.read(ChunkedInputStream.java:696)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:3052)
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:283)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:325)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:177)
at java.io.InputStreamReader.read(InputStreamReader.java:184)
at java.io.BufferedReader.fill(BufferedReader.java:154)
at java.io.BufferedReader.readLine(BufferedReader.java:317)
at java.io.BufferedReader.readLine(BufferedReader.java:382)
at org.elasticsearch.river.couchdb.CouchdbRiver$Slurper.run(CouchdbRiver.java:477)
at java.lang.Thread.run(Thread.java:724)
[2013-08-19 17:55:13,392][WARN ][river.couchdb            ] [Morning Star] [couchdb][portal_production] failed to read from _changes, throttling....

What's the deal?

EDIT - River status

$ curl http://localhost:9200/_river/portal_production/_status?pretty=true 
{
  "_index" : "_river",
  "_type" : "portal_production",
  "_id" : "_status",
  "_version" : 2,
  "exists" : true, "_source" : {"ok":true,"node":{"id":"EVxlLNZ9SrSXYOLS0YBw7w","name":"Shadow Slasher","transport_address":"inet[/192.168.1.106:9300]"}}
}

EDIT - River sequence data

That seems really low!

curl -X GET http://localhost:9200/_river/portal_production/_seq?pretty=true
{
  "_index" : "_river",
  "_type" : "portal_production",
  "_id" : "_seq",
  "_version" : 1,
  "exists" : true, "_source" : {"couchdb":{"last_seq":"4"}}
}

By the way, my _changes feed goes much higher:

curl -X GET http://localhost:5984/portal_production/_changes?limit=5
    {"results":[
    {"seq":4,"id":"Ifilter-1","changes":[{"rev":"4-d9c8e771bc345d1182fbe7c2d63f5d00"}]},
    {"seq":7,"id":"Document-2","changes":[{"rev":"1-42f52115c4a5321328be07c490932b61"}]},
    {"seq":10,"id":"Document-4","changes":[{"rev":"1-42f52115c4a5321328be07c490932b61"}]},
    {"seq":13,"id":"Document-6","changes":[{"rev":"1-42f52115c4a5321328be07c490932b61"}]},
    {"seq":16,"id":"Document-8","changes":[{"rev":"1-42f52115c4a5321328be07c490932b61"}]},
    ...
    {"seq":208657,"id":"Document-11295","changes":[{"rev":"8-37cb48660d28bef854b2c31132bc9635"}]},
    {"seq":208661,"id":"Document-11297","changes":[{"rev":"6-daf5c5d557d0fa30b2b08be26582a33c"}]},
    {"seq":208665,"id":"Document-11299","changes":[{"rev":"6-22e57345c2ee5c7aee8b7d664606b874"}]},
    {"seq":208669,"id":"Document-11301","changes":[{"rev":"6-06deee0c3c6705238a8b07e400b2414b"}]},
    {"seq":208673,"id":"Document-11303","changes":[{"rev":"6-86fc60dd8c1d415d42a25a23eb975121"}]},
    {"seq":208677,"id":"Document-11305","changes":[{"rev":"6-6d51a577fdc9013abf64ec4ffbf9eeee"}]},
    {"seq":208683,"id":"Document-11307","changes":[{"rev":"6-726a7835ce390094b9b9e0a91aeb11f0"}]},
    {"seq":208684,"id":"Document-11286","changes":[{"rev":"9-747e63e0304a974cc7db7ff84ae80697"}]}
    ],
    "last_seq":208684}
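To put a number on that gap, here is a minimal sketch of a hypothetical helper (not part of the river plugin) that reads the body of the river's `_seq` document and the body of CouchDB's `_changes` response and reports how far behind the river is. It assumes CouchDB 1.x integer sequence numbers, as in the output above; newer CouchDB releases use opaque string sequences that cannot be subtracted like this.

```python
import json


def river_lag(river_seq_body: str, changes_body: str) -> int:
    """Return how many sequence numbers the river trails CouchDB by.

    river_seq_body: JSON from GET /_river/<river>/_seq
    changes_body:   JSON from GET /<db>/_changes
    Assumes CouchDB 1.x integer sequences (an assumption, see above).
    """
    river = json.loads(river_seq_body)
    couch = json.loads(changes_body)
    # The river stores last_seq as a string ("4"); CouchDB 1.x returns an int.
    river_seq = int(river["_source"]["couchdb"]["last_seq"])
    couch_seq = int(couch["last_seq"])
    return couch_seq - river_seq


# Values copied from the outputs above.
river_body = '{"_id": "_seq", "exists": true, "_source": {"couchdb": {"last_seq": "4"}}}'
changes_body = '{"results": [], "last_seq": 208684}'
print(river_lag(river_body, changes_body))  # 208680
```

If the lag never shrinks across repeated checks while CouchDB keeps growing, the river is stuck rather than just slow.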

EDIT - CouchDB log

This looks bad:

[Thu, 22 Aug 2013 02:49:37 GMT] [info] [<0.340.0>] 127.0.0.1 - - 'GET' /portal_production/_changes?feed=continuous&include_docs=true&heartbeat=10000&since=4 500

[Thu, 22 Aug 2013 02:49:42 GMT] [info] [<0.348.0>] 127.0.0.1 - - 'GET' /portal_production/_changes?feed=continuous&include_docs=true&heartbeat=10000&since=4 200

[Thu, 22 Aug 2013 02:49:42 GMT] [error] [<0.348.0>] Uncaught error in HTTP request: {exit,{ucs,{bad_utf8_character_code}}}

[Thu, 22 Aug 2013 02:49:42 GMT] [info] [<0.348.0>] Stacktrace: [{xmerl_ucs,from_utf8,1},
         {mochijson2,json_encode_string,2},
         {mochijson2,'-json_encode_proplist/2-fun-0-',3},
         {lists,foldl,3},
         {mochijson2,json_encode_proplist,2},
         {mochijson2,'-json_encode_proplist/2-fun-0-',3},
         {lists,foldl,3},
         {mochijson2,json_encode_proplist,2}]
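The log suggests why the river never gets past seq 4: CouchDB answers the `since=4` continuous feed with a 200, then hits `bad_utf8_character_code` while JSON-encoding a document for `include_docs=true`. My reading (not confirmed) is that the server then aborts the chunked response mid-stream, which the river's Java HTTP client reports as `Bogus chunk size`. A minimal sketch for narrowing down the culprit: walk the feed line by line and record the last sequence number that parsed cleanly; the next change after it is the prime suspect. `last_good_seq` is a hypothetical helper, and it takes already-read lines so it can be tried without a server.

```python
import http.client
import json
from typing import Iterable, Optional, Tuple


def last_good_seq(lines: Iterable[bytes]) -> Tuple[Optional[int], int]:
    """Walk a continuous _changes feed and find where it stops parsing.

    Returns (last seq parsed cleanly, number of change rows parsed).
    """
    last_seq: Optional[int] = None
    rows = 0
    it = iter(lines)
    while True:
        try:
            raw = next(it)
        except StopIteration:
            break  # feed ended normally
        except (http.client.HTTPException, OSError):
            break  # connection dropped mid-stream (e.g. a broken chunk)
        line = raw.strip()
        if not line:
            continue  # blank lines are heartbeats (heartbeat=10000)
        try:
            row = json.loads(line.decode("utf-8"))
        except (UnicodeDecodeError, json.JSONDecodeError):
            break  # truncated or undecodable row: the stream is damaged here
        if "seq" in row:  # skip a closing {"last_seq": ...} row
            last_seq = row["seq"]
            rows += 1
    return last_seq, rows
```

Against a live server you could pass the response from `urllib.request.urlopen(...)` (iterating it yields lines) for a `_changes?feed=continuous&include_docs=true&since=4` URL, adding CouchDB's `timeout` parameter so the feed closes on its own. Then query the same range with `include_docs=false` to get the ids of the next few documents to inspect.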

2 Answers


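So I kept deleting documents one by one and retrying the _changes query with include_docs=true, but I never got to the bottom of it. From reading some other related questions, text imported from Microsoft documents can contain some funky characters. We're doing something similar, so we dumped the database and filtered out the non-UTF-8 characters. A bit of a pain, but we had too many documents to track down the one causing the problem. So far there are no errors on the Elasticsearch side (well, some timeouts, but that's probably a separate thread).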
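The "filter out non-UTF-8 characters" step can be sketched in a few lines of Python. This is an illustration, not the script used above: `find_invalid_utf8` reports where a raw dump fails to decode (handy for locating the offending documents before throwing anything away), and `scrub_utf8` drops the undecodable bytes. The sample input assumes Windows-1252 curly quotes (bytes `0x93`/`0x94`), a common leftover from text pasted out of Microsoft Word; that is an assumption about the data, not something confirmed in this thread.

```python
from typing import List, Tuple


def find_invalid_utf8(data: bytes) -> List[Tuple[int, bytes]]:
    """Return (offset, bad_bytes) for every undecodable UTF-8 sequence."""
    problems = []
    start = 0
    while start < len(data):
        try:
            data[start:].decode("utf-8")
            break  # the remainder is clean
        except UnicodeDecodeError as err:
            # err.start/err.end are relative to the slice, so shift them.
            bad_start = start + err.start
            bad_end = start + err.end
            problems.append((bad_start, data[bad_start:bad_end]))
            start = bad_end
    return problems


def scrub_utf8(data: bytes) -> str:
    """Decode as UTF-8, silently dropping any bytes that are not valid UTF-8."""
    return data.decode("utf-8", errors="ignore")


sample = b'{"title":"caf\xc3\xa9 \x93quoted\x94"}'  # valid "é", stray cp1252 quotes
print(find_invalid_utf8(sample))
print(scrub_utf8(sample))
```

`errors="ignore"` deletes characters without a trace; `errors="replace"` inserts U+FFFD instead so the damage stays visible. If the bad bytes really are Windows-1252, decoding just those documents with `"cp1252"` might recover the intended characters.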

answered 2013-08-26T02:22:48.550

Are you indexing Office documents? You could use the attachment plugin.

I have a branch for indexing CouchDB attachments that hasn't been merged yet. If you want to test it, I'd be happy to get feedback!
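For reference, the attachment plugin of that era (elasticsearch-mapper-attachments) adds an `attachment` field type: you map a field as `attachment`, send the file's bytes base64-encoded, and the plugin extracts the text with Apache Tika, so the extracted text never has to pass through CouchDB's JSON encoder. The sketch below only builds the mapping and document bodies as Python dicts; the type name `portal_document` and field name `file` are hypothetical, and how the CouchDB river would feed attachments into such a field is not covered here.

```python
import base64
import json


def attachment_mapping(doc_type: str, field: str) -> dict:
    """Mapping body that declares `field` as an attachment.

    The "attachment" type is provided by the mapper-attachments plugin.
    """
    return {doc_type: {"properties": {field: {"type": "attachment"}}}}


def attachment_doc(field: str, raw: bytes) -> dict:
    """Document body with the file content base64-encoded, as the plugin expects."""
    return {field: base64.b64encode(raw).decode("ascii")}


# Hypothetical type/field names; the bytes would normally come from a
# CouchDB attachment or a file on disk.
print(json.dumps(attachment_mapping("portal_document", "file")))
print(json.dumps(attachment_doc("file", b"hello")))
```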

answered 2013-09-07T06:04:29.080