我想将二进制文件(PDF、WORD、TEXT)索引到 elasticsearch 中,为此我使用了 fscrawler,并且在运行 fscrawler 时出现以下错误。
我已关注此链接:https ://fscrawler.readthedocs.io/en/latest/user/getting_started.html
配置文件 - YAML
---
name: "hello"
fs:
url: "/home/gowtham/Documents"
update_rate: "15m"
excludes:
- "*/~*"
json_support: false
filename_as_id: false
add_filesize: true
remove_deleted: true
add_as_inner_object: false
store_source: false
index_content: true
attributes_support: false
raw_metadata: false
xml_support: false
index_folders: true
lang_detect: false
continue_on_error: false
ocr:
language: "eng"
enabled: true
pdf_strategy: "ocr_and_text"
elasticsearch:
nodes:
- url: "http://10.0.2.2:9200"
bulk_size: 100
flush_interval: "5s"
byte_size: "10mb"
index : "hello"
这个位置/home/gowtham/Documents有一个 pdf 文件
我收到以下错误
12:46:22,477 WARN [f.p.e.c.f.c.v.ElasticsearchClientV6] failed to create index [hello], disabling crawler...
12:46:22,478 FATAL [f.p.e.c.f.c.FsCrawlerCli] Fatal error received while running the crawler: [Elasticsearch exception [type=illegal_argument_exception, reason=request [/hello] contains unrecognized parameter: [include_type_name]]]
12:46:22,478 DEBUG [f.p.e.c.f.c.FsCrawlerCli] error caught
org.elasticsearch.ElasticsearchStatusException: Elasticsearch exception [type=illegal_argument_exception, reason=request [/hello] contains unrecognized parameter: [include_type_name]]
at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:177) ~[elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:2053) ~[elasticsearch-rest-high-level-client-6.7.1.jar:6.7.1]
at org.elasticsearch.client.RestHighLevelClient.parseResponseException(RestHighLevelClient.java:2030) ~[elasticsearch-rest-high-level-client-6.7.1.jar:6.7.1]
at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1777) ~[elasticsearch-rest-high-level-client-6.7.1.jar:6.7.1]
at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1734) ~[elasticsearch-rest-high-level-client-6.7.1.jar:6.7.1]
at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1696) ~[elasticsearch-rest-high-level-client-6.7.1.jar:6.7.1]
at org.elasticsearch.client.IndicesClient.create(IndicesClient.java:191) ~[elasticsearch-rest-high-level-client-6.7.1.jar:6.7.1]
at fr.pilato.elasticsearch.crawler.fs.client.v6.ElasticsearchClientV6.createIndex(ElasticsearchClientV6.java:240) ~[fscrawler-elasticsearch-client-v6-2.7-SNAPSHOT.jar:?]
at fr.pilato.elasticsearch.crawler.fs.client.v6.ElasticsearchClientV6.createIndex(ElasticsearchClientV6.java:603) ~[fscrawler-elasticsearch-client-v6-2.7-SNAPSHOT.jar:?]
at fr.pilato.elasticsearch.crawler.fs.client.v6.ElasticsearchClientV6.createIndices(ElasticsearchClientV6.java:436) ~[fscrawler-elasticsearch-client-v6-2.7-SNAPSHOT.jar:?]
at fr.pilato.elasticsearch.crawler.fs.FsCrawlerImpl.start(FsCrawlerImpl.java:161) ~[fscrawler-core-2.7-SNAPSHOT.jar:?]
at fr.pilato.elasticsearch.crawler.fs.cli.FsCrawlerCli.main(FsCrawlerCli.java:270) [fscrawler-cli-2.7-SNAPSHOT.jar:?]
Suppressed: org.elasticsearch.client.ResponseException: method [PUT], host [http://10.0.2.2:9200], URI [/hello?master_timeout=30s&include_type_name=true&timeout=30s], status line [HTTP/1.1 400 Bad Request]
{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"request [/hello] contains unrecognized parameter: [include_type_name]"}],"type":"illegal_argument_exception","reason":"request [/hello] contains unrecognized parameter: [include_type_name]"},"status":400}
at org.elasticsearch.client.RestClient$SyncResponseListener.get(RestClient.java:936) ~[elasticsearch-rest-client-6.7.1.jar:6.7.1]
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:233) ~[elasticsearch-rest-client-6.7.1.jar:6.7.1]
at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1764) ~[elasticsearch-rest-high-level-client-6.7.1.jar:6.7.1]
at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1734) ~[elasticsearch-rest-high-level-client-6.7.1.jar:6.7.1]
at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1696) ~[elasticsearch-rest-high-level-client-6.7.1.jar:6.7.1]
at org.elasticsearch.client.IndicesClient.create(IndicesClient.java:191) ~[elasticsearch-rest-high-level-client-6.7.1.jar:6.7.1]
at fr.pilato.elasticsearch.crawler.fs.client.v6.ElasticsearchClientV6.createIndex(ElasticsearchClientV6.java:240) ~[fscrawler-elasticsearch-client-v6-2.7-SNAPSHOT.jar:?]
at fr.pilato.elasticsearch.crawler.fs.client.v6.ElasticsearchClientV6.createIndex(ElasticsearchClientV6.java:603) ~[fscrawler-elasticsearch-client-v6-2.7-SNAPSHOT.jar:?]
at fr.pilato.elasticsearch.crawler.fs.client.v6.ElasticsearchClientV6.createIndices(ElasticsearchClientV6.java:436) ~[fscrawler-elasticsearch-client-v6-2.7-SNAPSHOT.jar:?]
at fr.pilato.elasticsearch.crawler.fs.FsCrawlerImpl.start(FsCrawlerImpl.java:161) ~[fscrawler-core-2.7-SNAPSHOT.jar:?]
at fr.pilato.elasticsearch.crawler.fs.cli.FsCrawlerCli.main(FsCrawlerCli.java:270) [fscrawler-cli-2.7-SNAPSHOT.jar:?]
Caused by: org.elasticsearch.client.ResponseException: method [PUT], host [http://10.0.2.2:9200], URI [/hello?master_timeout=30s&include_type_name=true&timeout=30s], status line [HTTP/1.1 400 Bad Request]
{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"request [/hello] contains unrecognized parameter: [include_type_name]"}],"type":"illegal_argument_exception","reason":"request [/hello] contains unrecognized parameter: [include_type_name]"},"status":400}
at org.elasticsearch.client.RestClient$1.completed(RestClient.java:552) ~[elasticsearch-rest-client-6.7.1.jar:6.7.1]
at org.elasticsearch.client.RestClient$1.completed(RestClient.java:537) ~[elasticsearch-rest-client-6.7.1.jar:6.7.1]
at org.apache.http.concurrent.BasicFuture.completed(BasicFuture.java:119) ~[httpcore-4.4.5.jar:4.4.5]
at org.apache.http.impl.nio.client.DefaultClientExchangeHandlerImpl.responseCompleted(DefaultClientExchangeHandlerImpl.java:177) ~[httpasyncclient-4.1.2.jar:4.1.2]
at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.processResponse(HttpAsyncRequestExecutor.java:436) ~[httpcore-nio-4.4.5.jar:4.4.5]
at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.inputReady(HttpAsyncRequestExecutor.java:326) ~[httpcore-nio-4.4.5.jar:4.4.5]
at org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:265) ~[httpcore-nio-4.4.5.jar:4.4.5]
at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:81) ~[httpasyncclient-4.1.2.jar:4.1.2]
at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:39) ~[httpasyncclient-4.1.2.jar:4.1.2]
at org.apache.http.impl.nio.reactor.AbstractIODispatch.inputReady(AbstractIODispatch.java:114) ~[httpcore-nio-4.4.5.jar:4.4.5]
at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:162) ~[httpcore-nio-4.4.5.jar:4.4.5]
at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:337) ~[httpcore-nio-4.4.5.jar:4.4.5]
at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:315) ~[httpcore-nio-4.4.5.jar:4.4.5]
at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:276) ~[httpcore-nio-4.4.5.jar:4.4.5]
at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104) ~[httpcore-nio-4.4.5.jar:4.4.5]
at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:588) ~[httpcore-nio-4.4.5.jar:4.4.5]
at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_201]
12:46:22,484 DEBUG [f.p.e.c.f.FsCrawlerImpl] Closing FS crawler [hello]
12:46:22,485 DEBUG [f.p.e.c.f.c.v.ElasticsearchClientV6] Closing Elasticsearch client manager
12:46:22,486 DEBUG [f.p.e.c.f.FsCrawlerImpl] ES Client Manager stopped
12:46:22,487 INFO [f.p.e.c.f.FsCrawlerImpl] FS crawler [hello] stopped
请帮我解决这个问题。
提前致谢。