elasticsearch - How should I use sql_last_value in logstash?

Question

I'm not clear on what sql_last_value does when I write my statement like this:

statement => "SELECT * from mytable where id > :sql_last_value"

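I have a rough idea of why it is used: instead of scanning the whole database table to update fields, it only picks up the newly added records. Please correct me if I'm wrong.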

So what I'm trying to do is create the index using logstash with the following:

input {
    jdbc {
        jdbc_connection_string => "jdbc:mysql://hostmachine:3306/db" 
        jdbc_user => "root"
        jdbc_password => "root"
        jdbc_validate_connection => true
        jdbc_driver_library => "/path/mysql_jar/mysql-connector-java-5.1.39-bin.jar"
        jdbc_driver_class => "com.mysql.jdbc.Driver"
        schedule => "* * * * *"
        statement => "SELECT * from mytable where id > :sql_last_value"
        use_column_value => true
        tracking_column => id
        jdbc_paging_enabled => "true"
        jdbc_page_size => "50000"
    }
}

output {
    elasticsearch {
        #protocol => http
        index => "myindex"
        document_type => "message_logs"
        document_id => "%{id}"
        action => index
        hosts => ["http://myhostmachine:9402"]
    }
}

Once I do this, the documents are not uploaded to the index at all. Where am I going wrong?

Any help would be appreciated.

score 7 · Accepted Answer

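If your table has a timestamp column (e.g. last_updated), you should preferably use it instead of the ID column. That way, when a record is updated you also modify that timestamp, and the jdbc input plugin will pick the record up. The ID column, by contrast, does not change its value, so an updated record would not be picked up.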

input {
    jdbc {
        jdbc_connection_string => "jdbc:mysql://hostmachine:3306/db" 
        jdbc_user => "root"
        jdbc_password => "root"
        jdbc_validate_connection => true
        jdbc_driver_library => "/path/mysql_jar/mysql-connector-java-5.1.39-bin.jar"
        jdbc_driver_class => "com.mysql.jdbc.Driver"
        jdbc_paging_enabled => "true"
        jdbc_page_size => "50000"
        schedule => "* * * * *"
        statement => "SELECT * from mytable where last_updated > :sql_last_value"
    }
}
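
As a variation on the config above, here is a minimal sketch that tracks the value of the last_updated column itself rather than the time of the last run, which may be more robust if the clocks of the Logstash host and the database differ. use_column_value, tracking_column and tracking_column_type are options of the jdbc input plugin; the last_updated column is an assumption about your schema, and the ORDER BY is added so that the last row read carries the highest timestamp.

```
input {
    jdbc {
        # ... same connection settings as above ...
        schedule => "* * * * *"
        # Assumes mytable has a last_updated timestamp column
        statement => "SELECT * FROM mytable WHERE last_updated > :sql_last_value ORDER BY last_updated"
        # Store the column value of the last row instead of the run time
        use_column_value => true
        tracking_column => "last_updated"
        # The default tracking type is numeric, so state the type explicitly
        tracking_column_type => "timestamp"
    }
}
```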

If you nevertheless decide to stay with the ID column, you should delete the $HOME/.logstash_jdbc_last_run file and try again.
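
If deleting the file by hand is inconvenient, the jdbc input plugin also has a clean_run option that discards the stored state when the pipeline starts. The following is only a sketch of keeping the ID column with a one-off reset; the table and column names are taken from the question, and clean_run should be removed again afterwards, otherwise every restart begins from the initial value.

```
input {
    jdbc {
        # ... same connection settings as above ...
        schedule => "* * * * *"
        statement => "SELECT * FROM mytable WHERE id > :sql_last_value ORDER BY id"
        use_column_value => true
        tracking_column => "id"
        # One-off reset: ignore any previously stored sql_last_value.
        # Remove this line once the index has been rebuilt.
        clean_run => true
    }
}
```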

score 2 · Accepted Answer

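In short, sql_last_value lets you persist data from your last SQL run, as its name suggests.

This value is especially useful when you schedule your query. Why? Because you can build your SQL statement's condition on the value stored in sql_last_value and avoid retrieving rows that were already ingested by your logstash input, fetching only rows added or updated since the last pipeline execution.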

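Things to keep in mind when using sql_last_value

By default, this variable stores the timestamp of the last run. This is useful when you need to ingest data based on columns such as creation_date, last_update, etc.
You can define the value of sql_last_value by tracking it against a column of a specific table. This is useful when you need to ingest data based on an auto-increment column. To do so, specify use_column_value => true and tracking_column => "column_name_to_track".

The following example stores the id of the last mytable row in :sql_last_value, so that the next execution ingests only the rows that were not ingested before, i.e. the rows whose id is greater than the last ingested id.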

input {
    jdbc {
        # ...
        schedule => "* * * * *"
        statement => "SELECT * from mytable where id > :sql_last_value"
        use_column_value => true
        tracking_column => id
    }
}

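Extremely important!!!

When you use multiple inputs in your pipeline, each input block will overwrite the sql_last_value of the previous one. To avoid this, use the last_run_metadata_path => "/path/to/sql_last_value/of_your_pipeline.yml" option, so that each pipeline stores its own value in a separate file.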
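
A minimal sketch of that setup, assuming two jdbc inputs in one pipeline: the second table (othertable) and both file paths are hypothetical placeholders, and any location writable by Logstash should work.

```
input {
    jdbc {
        # ... connection settings ...
        schedule => "* * * * *"
        statement => "SELECT * FROM mytable WHERE id > :sql_last_value ORDER BY id"
        use_column_value => true
        tracking_column => "id"
        # Hypothetical path: state file used only by this input
        last_run_metadata_path => "/path/to/last_run/mytable.yml"
    }
    jdbc {
        # ... connection settings ...
        schedule => "* * * * *"
        # othertable is a hypothetical second table
        statement => "SELECT * FROM othertable WHERE id > :sql_last_value ORDER BY id"
        use_column_value => true
        tracking_column => "id"
        # A different file, so the two inputs do not overwrite each other
        last_run_metadata_path => "/path/to/last_run/othertable.yml"
    }
}
```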

score 2 · Accepted Answer

A few points to note:

If you previously ran Logstash without a schedule, delete this file before running Logstash with a schedule:
```
$HOME/.logstash_jdbc_last_run
```
On Windows, this file is located at:
```
C:\Users\<Username>\.logstash_jdbc_last_run
```
The "statement =>" in the Logstash config should have an "order by" on the tracking_column.
The tracking_column must be specified correctly.

Here is an example Logstash config file:

input {
    jdbc {
        # MySQL DB jdbc connection string to our database, softwaredevelopercentral
        jdbc_connection_string => "jdbc:mysql://localhost:3306/softwaredevelopercentral?autoReconnect=true&useSSL=false"
        # The user we wish to execute our statement as
        jdbc_user => "root"
        # The user password
        jdbc_password => ""
        # The path to our downloaded jdbc driver
        jdbc_driver_library => "D:\Programs\MySQLJava\mysql-connector-java-6.0.6.jar"
        # The name of the driver class for MySQL DB
        jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
        # our query
        schedule => "* * * * *"
        statement => "SELECT * FROM student WHERE studentid > :sql_last_value order by studentid"
        use_column_value => true
        tracking_column => "studentid"
    }
}

output {
    stdout { codec => json_lines }
    elasticsearch {
        hosts => ["localhost:9200"]
        index => "students"
        document_type => "student"
        document_id => "%{studentid}"
    }
}

To see a working example of the same, you can check out my blog post: http://softwaredevelopercentral.blogspot.com/2017/10/elasticsearch-logstash-kibana-tutorial.html
