
I'm new to Elasticsearch and I'm trying to run a basic query from R. Because I need an API key, I can't use any of the available Elasticsearch libraries for R.

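I can retrieve all documents in the Elasticsearch index, but I can't seem to run a custom query. I think this must be because my GET request is malformed. Here is what I have so far: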

json_query <- jsonlite::toJSON('{
    "query": {
        "match" : {
            "LastName": "Baggins"
        }
    }
}
')

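I tried adding my_query as the body= argument, but the query simply isn't run (10,000 documents are retrieved instead). I eventually tried pasting it onto the url argument: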

get_scroll_id <-  httr::GET(url =paste("'https://Myserver:9200/indexOfInterest/_search?scroll=1m&size=10000'",my_query),
                            encoding='json',
                            add_headers(.headers = c("Authorization" = "ApiKey ****", "Content-Type" = "application/json")),
                            config=httr::config(ssl_verifypeer = FALSE,ssl_verifyhost = FALSE))

scroll_data <- fromJSON(content(get_scroll_id, as="text"))

This gives me the error:

Error in curl::curl_fetch_memory(url, handle = handle) : 
  Protocol "" not supported or disabled in libcurl

I also tried adding the query as a query parameter, like this:

get_scroll_id <-  httr::GET(url ='https://Myserver:9200/indexOfInterest/_search?scroll=1m&size=10000',
                            query= json_query,
                            encoding='json',
                            add_headers(.headers = c("Authorization" = "ApiKey *****==", "Content-Type" = "application/json")),
                            verbose(),
                            config=httr::config(ssl_verifypeer = FALSE,ssl_verifyhost = FALSE))

This gives me the output:

GET https://Myserver:9200/indexOfInterest/_search?{
    "query": {
        "match" : {
            "LastName" : "Baggins"
        }
    }
}

Options:
* ssl_verifypeer: FALSE
* ssl_verifyhost: FALSE
* debugfunction: function (type, msg) 
{
    switch(type + 1, text = if (info) prefix_message("*  ", msg), headerIn = prefix_message("<- ", msg), headerOut = prefix_message("-> ", msg), dataIn = if (data_in) prefix_message("<<  ", msg, TRUE), dataOut = if (data_out) prefix_message(">> ", msg, TRUE), sslDataIn = if (ssl && data_in) prefix_message("*< ", msg, TRUE), sslDataOut = if (ssl && data_out) prefix_message("*> ", msg, TRUE))
}
* verbose: TRUE
Headers:
* Authorization: ApiKey *****==
* Content-Type: application/json

Looking at the Elasticsearch documentation, the curl command is as follows:

 curl 'localhost:9200/get-together/event/_search?pretty&scroll=1m' -d ' {
 "query": {
"match" : {
 "LastName" : "Baggins"
 }
 }
}'

How do I create the correct command for Elasticsearch?


3 Answers


I think the problem here is that httr's GET() simply doesn't support a body argument, since sending a body with a GET request is uncommon (see this SO answer about HTTP GET with request body).

However, you can also use a POST request here, which works for me. Try the following and see if it helps:

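library(httr)
library(rjson)

# The query is already valid JSON, so it can be sent as a plain string
# without a toJSON()/fromJSON() round trip.
my_query <- '{
  "query": {
    "match": {
      "LastName": "Baggins"
    }
  }
}'

# Elasticsearch accepts the search body via POST as well as GET.
response <- httr::POST(
  url = "https://Myserver:9200/indexOfInterest/_search",
  httr::add_headers(
    .headers = c(
      "Authorization" = "ApiKey ****",
      "Content-Type" = "application/json"
    )
  ),
  body = my_query
)

# Parse the JSON response into an R list.
data <- fromJSON(content(response, as = "text"))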
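
The original request also used scroll=1m to page through more than 10,000 documents. The following is only a sketch of how the POST approach might be extended to scrolling, assuming the standard Elasticsearch scroll endpoint (_search/scroll). The names scroll_body and fetch_all_hits are hypothetical helpers, not part of httr or Elasticsearch, and the SSL options from the question are left out.

```r
library(httr)
library(jsonlite)

# Hypothetical helper: JSON body for a follow-up scroll request (no network access).
scroll_body <- function(scroll_id, keep_alive = "1m") {
  as.character(
    jsonlite::toJSON(list(scroll = keep_alive, scroll_id = scroll_id), auto_unbox = TRUE)
  )
}

# Hypothetical helper: collect every hit for `query_json` (a raw JSON string).
fetch_all_hits <- function(base_url, index, api_key, query_json,
                           keep_alive = "1m", size = 10000) {
  headers <- httr::add_headers(
    Authorization = paste("ApiKey", api_key),
    `Content-Type` = "application/json"
  )

  # Initial search: opens the scroll context and returns the first page.
  resp <- httr::POST(
    url = paste0(base_url, "/", index, "/_search"),
    config = headers,
    query = list(scroll = keep_alive, size = as.integer(size)),
    body = query_json
  )
  httr::stop_for_status(resp)
  page <- jsonlite::fromJSON(
    httr::content(resp, as = "text", encoding = "UTF-8"),
    simplifyVector = FALSE
  )
  hits <- page$hits$hits

  # Request further pages until one comes back empty.
  while (length(page$hits$hits) > 0) {
    resp <- httr::POST(
      url = paste0(base_url, "/_search/scroll"),
      config = headers,
      body = scroll_body(page$`_scroll_id`, keep_alive)
    )
    httr::stop_for_status(resp)
    page <- jsonlite::fromJSON(
      httr::content(resp, as = "text", encoding = "UTF-8"),
      simplifyVector = FALSE
    )
    hits <- c(hits, page$hits$hits)
  }
  hits
}
```

It could be called as fetch_all_hits("https://Myserver:9200", "indexOfInterest", "****", query_json), where query_json is the raw JSON query string. The sketch does not clear the scroll context explicitly, so the context stays open until the keep-alive period expires.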

Edit:

If you really insist on making a GET request, try using curl. I couldn't test the authorization part, but the rest works:

library(curl)
library(jsonlite)

# The query is already valid JSON, so keep it as a plain string.
my_query <- '{
  "query": {
    "match": {
      "LastName": "Baggins"
    }
  }
}'

h <- new_handle(verbose = TRUE)
handle_setheaders(h,
  "Authorization" = "ApiKey ****",
  "Content-Type" = "application/json"
)
# postfields attaches the body; customrequest overrides the POST method it implies.
handle_setopt(handle = h, postfields = my_query, customrequest = "GET")

res <- curl_fetch_memory(url = "https://Myserver:9200/indexOfInterest/_search", handle = h)

prettify(rawToChar(res$content))

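The trick here is to pass the body through the postfields option. On its own, that makes the curl library issue a POST request, so setting customrequest="GET" explicitly tells it to use GET instead.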

Answered 2021-05-19T09:45:25.923

You can also try the elastic package.

conn <- elastic::connect(host = "Myserver", 
                        path = "", 
                        user = "<username>",
                        pwd = "<password>",
                        port = 9200, 
                        transport_schema  = "https")
# conn$ping()

body <-'{
    "query": {
        "match" : {
            "LastName": "Baggins"
        }
    }
}
'
out <- elastic::Search(conn, index="indexOfInterest", body = body, size = 10000)

Then, if you want to scroll to get more than 10,000 entries (the maximum Elasticsearch allows for a single query):

# Scrolling (uses the `conn` connection created above)
res <- elastic::Search(conn, index = "indexOfInterest", time_scroll = "5m", body = body, size = 10000)
out <- res$hits$hits
hits <- 1
# Keep fetching pages until a page comes back empty.
while (hits != 0) {
  res <- elastic::scroll(conn, res$`_scroll_id`, time_scroll = "5m")
  hits <- length(res$hits$hits)
  if (hits > 0)
    out <- c(out, res$hits$hits)
}
# Release the scroll context on the server.
elastic::scroll_clear(conn, res$`_scroll_id`)

Note that Elastic does not recommend using scroll, and I got slightly different results when using it.

Answered 2022-01-21T18:19:17.710

The output of jsonlite::toJSON() may be getting ignored because it wraps your JSON in []s. What happens if you use rjson::toJSON() instead?

my_query <- rjson::toJSON(
'{
    "query": {
        "match" : {
            "LastName": "Baggins"
        }
    }
}'
)

httr::GET(
  url = "https://Myserver:9200/indexOfInterest/_search",
  query = list(scroll = "1m", size = "10000"), 
  encoding = 'json', 
  httr::add_headers(
    .headers = c(
      "Authorization" = "ApiKey ****", 
      "Content-Type" = "application/json"
      )
  ), 
  body = my_query
)
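
To make the [] wrapping concrete, here is a small sketch that needs no server. It shows what jsonlite::toJSON() returns when given a string that is already JSON, next to one possible alternative: building the query from a nested R list. The variable names are illustrative only.

```r
library(jsonlite)

raw_query <- '{"query": {"match": {"LastName": "Baggins"}}}'

# Passing an existing JSON string to toJSON() encodes it a second time:
# the result is a JSON array holding one escaped string, not a query object.
double_encoded <- jsonlite::toJSON(raw_query)
print(double_encoded)

# Building the query from an R list yields a proper JSON object.
# auto_unbox = TRUE keeps length-one vectors as scalars instead of arrays.
query_list <- list(query = list(match = list(LastName = "Baggins")))
body_json <- jsonlite::toJSON(query_list, auto_unbox = TRUE)
print(body_json)
```

Since raw_query is already valid JSON, it can presumably also be sent as the request body unchanged, with no encoder involved.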
Answered 2021-05-17T19:26:33.640