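I'm trying to convert this script to use the new official Elasticsearch client instead of the older (now deprecated) ElasticSearch.pm, but I can't get the scrolled search to work. Here's what I have: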

#! /usr/bin/perl

use strict;
use warnings;
use 5.010;

use Elasticsearch ();
use Elasticsearch::Scroll ();

my $es = Elasticsearch->new(
  nodes => 'http://api.metacpan.org:80',
  cxn   => 'NetCurl',
  cxn_pool => 'Static::NoPing',
  #log_to   => 'Stderr',
  #trace_to => 'Stderr',
);

say 'Getting all results at once works:';
my $results = $es->search(
  index => 'v0',
  type  => 'release',
  body  => {
    filter => { range => { date => { gte => '2013-11-28T00:00:00.000Z' } } },
    fields => [qw(author archive date)],
  },
);

foreach my $hit (@{ $results->{hits}{hits} }) {
  my $field = $hit->{fields};
  say "@$field{qw(date author archive)}";
}

say "\nUsing a scrolled search does not work:";
my $scroller = Elasticsearch::Scroll->new(
  es          => $es,
  index       => 'v0',
  search_type => 'scan',
  size        => 100,
  type        => 'release',
  body => {
    filter => { range => { date => { gte => '2013-11-28T00:00:00.000Z' } } },
    fields => [qw(author archive date)],
  },
);

while (my $hit = $scroller->next) {
  my $field = $hit->{fields};
  say "@$field{qw(date author archive)}";
} # end while $hit

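The first search, where I just get all the results in one chunk, works fine. But the second search, where I try to scroll through the results, produces: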

Using a scrolled search does not work:
[Request] ** [http://api.metacpan.org:80]-[500]
ActionRequestValidationException[Validation Failed: 1: scrollId is missing;],
called from sub Elasticsearch::Transport::try {...}
at .../Try/Tiny.pm line 83. With vars: {'body' =>
'ActionRequestValidationException[Validation Failed: 1: scrollId is missing;]',
'request' => {'path' => '/_search/scroll','serialize' => 'std',
'body' => 'c2Nhbjs1OzE3MjU0NjM2MjowakFELUU3VFFibTJIZW1ibUo0SUdROzE3MjU0NjM2NDowakFELUU3VFFibTJIZW1ibUo0SUdROzE3MjU0NjM2MTowakFELUU3VFFibTJIZW1ibUo0SUdROzE3MjU0NjM2MDowakFELUU3VFFibTJIZW1ibUo0SUdROzE3MjU0NjM2MzowakFELUU3VFFibTJIZW1ibUo0SUdROzE7dG90YWxfaGl0czoxNDQ7',
'method' => 'GET','qs' => {'scroll' => '1m'},'ignore' => [],
'mime_type' => 'application/json'},'status_code' => 500}

What am I doing wrong? I'm using Elasticsearch 0.75 and Elasticsearch-Cxn-NetCurl 0.02 with Perl 5.18.1.

2 Answers

I finally got it working with the newer official Search::Elasticsearch client. Here's the short version:

#! /usr/bin/perl

use strict;
use warnings;
use 5.010;

use Search::Elasticsearch ();

my $es = Search::Elasticsearch->new(
  cxn_pool => 'Static::NoPing',
  nodes    => 'api.metacpan.org:80',
);

my $scroller = $es->scroll_helper(
  index       => 'v0',
  type        => 'release',
  search_type => 'scan',
  scroll      => '2m',
  size        => 100,
  body        => {
    fields => [qw(author archive date)],
    query  => { range => { date => { gte => '2015-02-01T00:00:00.000Z' } } },
  },
);

while (my $hit = $scroller->next) {
  my $field = $hit->{fields};
  say "@$field{qw(date author archive)}";
} # end while $hit

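Note that when you do a scrolled search (with search_type => 'scan', as here), the records are not sorted. I wound up dumping the records into a temporary database and sorting them locally. The updated script is on GitHub.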
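
For a small result set, the local sort does not need a database. The following is only a minimal in-memory sketch, not the approach used in the updated script: the hits are made-up sample data shaped like the `fields` hashes printed above, and the author and archive names are purely illustrative.

```perl
use strict;
use warnings;
use 5.010;

# Made-up hits in the same shape the scroller returns (illustrative data only).
my @hits = (
  { fields => { date => '2015-02-03T10:00:00.000Z', author => 'BOB',   archive => 'Bar-0.02.tar.gz' } },
  { fields => { date => '2015-02-01T08:30:00.000Z', author => 'ALICE', archive => 'Foo-1.00.tar.gz' } },
  { fields => { date => '2015-02-02T23:59:59.000Z', author => 'CAROL', archive => 'Baz-0.10.tar.gz' } },
);

# Timestamps in this fixed ISO 8601 UTC format sort chronologically
# when compared as plain strings, so cmp is enough.
my @sorted = sort { $a->{fields}{date} cmp $b->{fields}{date} } @hits;

for my $hit (@sorted) {
  my $field = $hit->{fields};
  say "@$field{qw(date author archive)}";
}
```

With a real scroller, the hits would be collected in the `while` loop first and sorted once the scroll is exhausted; for very large result sets a temporary database remains the safer choice.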

answered 2015-02-23T18:49:15.350

I don't have a direct answer, but I may have a way to track down the problem:

I followed your link to Elasticsearch::Client and found a scroll() method:

https://metacpan.org/pod/Elasticsearch::Client::Direct#scroll

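This method takes scroll and scroll_id as parameters. scroll is how long you can keep calling the scroll method before the search expires (for example, '1m' for one minute). scroll_id is a marker for where the last call to scroll() left off.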

$results = $e->scroll(
    scroll      => '1m',
    scroll_id   => $id
);

Elasticsearch::Scroll is an object-oriented wrapper around scroll() that hides scroll and scroll_id.
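
To make the wrapping concrete, here is a rough sketch of draining a scrolled search by hand. It is not the module's actual implementation: `scroll_all` and `FakeClient` are hypothetical names, and the fake client only imitates the response shape (`_scroll_id` plus `hits.hits`) so the loop can run without a server. A scan-type search returns no hits from the initial search() call, only a scroll ID, and each scroll() call then returns the next batch until one comes back empty.

```perl
use strict;
use warnings;
use 5.010;

# Hypothetical helper: collect every hit from a scrolled search.
# $client is assumed to offer search() and scroll() methods that return
# hash refs shaped like Elasticsearch responses.
sub scroll_all {
  my ($client, %search_args) = @_;
  $search_args{scroll} //= '1m';    # keep-alive between calls
  my $scroll = $search_args{scroll};

  my $results   = $client->search(%search_args);
  my @hits      = @{ $results->{hits}{hits} };  # empty for search_type => 'scan'
  my $scroll_id = $results->{_scroll_id};

  while (defined $scroll_id) {
    $results = $client->scroll(scroll => $scroll, scroll_id => $scroll_id);
    my @batch = @{ $results->{hits}{hits} };
    last unless @batch;                   # an empty batch means we are done
    push @hits, @batch;
    $scroll_id = $results->{_scroll_id};  # always use the most recent ID
  }
  return @hits;
}

# Stand-in client (an assumption for illustration) that serves
# in-memory documents two at a time.
{
  package FakeClient;
  sub new { my ($class, @docs) = @_; return bless { docs => [@docs] }, $class }
  sub search { return { _scroll_id => 'fake-id', hits => { hits => [] } } }
  sub scroll {
    my ($self) = @_;
    my @batch = splice @{ $self->{docs} }, 0, 2;
    return { _scroll_id => 'fake-id', hits => { hits => \@batch } };
  }
}

my $fake = FakeClient->new(map { +{ fields => { n => $_ } } } 1 .. 5);
my @all  = scroll_all($fake, search_type => 'scan');
say scalar @all;    # prints 5
```

Stepping through `$scroller->next` in the debugger should reveal a similar search-then-scroll sequence, which makes it easier to see which request the server is rejecting.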

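I'd run your script under perl -d, then step into $scroller->next as deeply as you can and follow it. Something in there is trying to run the search that should populate scroll_id (scrollId), and that search is failing.

Admittedly, my description here is very rough... While googling I came across an accurate description of what the scroll ID is and does, but I can't seem to find it again.

answered 2013-11-30T03:23:19.147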