amazon-web-services - S3 select - How to query by comparing non-standard timestamps

Question

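I'm working with an S3 bucket where the data is organized into files by ID and year/month, meaning one file per ID per month. In each (csv.gz) file, every record has a timestamp in the format YYYY-MM-dd HH:mm:ss (note the missing T).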
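Because the data is split into one object per ID and month, a time-range query can first narrow down which objects need to be read at all, and only then filter by second inside each file. The sketch below assumes a hypothetical key layout {id}/{year}/{month:02d}.csv.gz (the real key names are not given) and parses the YYYY-MM-dd HH:mm:ss format with Python's datetime.strptime, whose equivalent pattern is '%Y-%m-%d %H:%M:%S'.

```python
from datetime import datetime
from typing import List

# Format of the timestamps inside each file: a space, not "T", between date and time.
TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_ts(value: str) -> datetime:
    """Parse a 'YYYY-MM-dd HH:mm:ss' string into a naive datetime."""
    return datetime.strptime(value, TS_FORMAT)


def month_keys(record_id: str, start: datetime, end: datetime) -> List[str]:
    """List the per-month object keys that can hold records in [start, end].

    The key layout "{id}/{year}/{month:02d}.csv.gz" is a hypothetical
    placeholder; substitute the bucket's real naming scheme.
    """
    keys = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        keys.append(f"{record_id}/{year}/{month:02d}.csv.gz")
        # Advance one calendar month, rolling over into the next year.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return keys


start = parse_ts("2020-11-15 08:00:00")
end = parse_ts("2021-01-02 17:30:59")
print(month_keys("42", start, end))
# ['42/2020/11.csv.gz', '42/2020/12.csv.gz', '42/2021/01.csv.gz']
```

Each key found this way would still need its own S3 Select request, with the per-second filter applied inside the SQL expression.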

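Now, when querying the data, I want to support datetime granularity down to the second, so naturally I'd like to filter the data in S3 before handling it in Python.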

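However, I can't find any way to do this. The TO_TIMESTAMP function doesn't support a user-supplied format (it requires the T date/time separator), and combining SUBSTRING and CAST (CAST(SUBSTRING(my_timestamp_column, 1, 10) AS TIMESTAMP)) produces the error "The query cannot be evaluated".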

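Is there a way around this? The documentation states that TO_TIMESTAMP is "the inverse operation of TO_STRING", which isn't entirely true, since the latter supports a time_format_pattern.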

score 0 · Accepted Answer

I had the same issue, so I went one step further and changed my CSV file so that the date field uses the format required by the TIMESTAMP data type in S3 Select. The required format is described here: S3 data types.
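A minimal Python sketch of that kind of rewrite, assuming the column is named Date (the header mentioned further down in this answer), the original values look like YYYY-MM-dd HH:mm:ss, and they are in UTC. A trailing Z offset is appended because timestamp literals with a time component are normally written with an offset in S3 Select's Ion-based timestamp format; drop or change it if your data differs. The function name rewrite_dates is hypothetical.

```python
import csv
import gzip
import io


def rewrite_dates(src_gz: bytes, column: str = "Date") -> bytes:
    """Return a new gzipped CSV whose `column` holds 'YYYY-MM-ddTHH:mm:ssZ'.

    Assumes the original values are 'YYYY-MM-dd HH:mm:ss' and are UTC.
    Reads the whole file into memory, which is fine for a sketch.
    """
    text = gzip.decompress(src_gz).decode("utf-8")
    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # '2020-01-01 12:30:00' -> '2020-01-01T12:30:00Z'
        row[column] = row[column].replace(" ", "T", 1) + "Z"
        writer.writerow(row)
    return gzip.compress(out.getvalue().encode("utf-8"))


original = gzip.compress(b"id,Date,value\n7,2020-01-01 12:30:00,3.5\n")
# Prints the header line, then: 7,2020-01-01T12:30:00Z,3.5
print(gzip.decompress(rewrite_dates(original)).decode("utf-8"))
```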

So first, to answer the question: based on the S3 Select documentation, I don't think it's possible to work with a date that lacks the T. Once you fix that, you can use the CAST function. Here is what I do:

select * from s3object as s where CAST('2020-01-01T' AS TIMESTAMP) < CAST('2021-01-01T' AS TIMESTAMP)

That works fine. However, as you can see, I'm not passing s."Date" (the field header in my CSV file) because of the following error:

Attempt to convert from one data type to another failed at line 1, column 39: cast from STRING to TIMESTAMP.

I hope this helps a little, and I hope someone can help with this error.
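One way to sidestep the cast entirely: YYYY-MM-dd HH:mm:ss is fixed-width, zero-padded, and ordered from the most to the least significant field, so comparing two such strings character by character gives the same order as comparing the instants they represent. Assuming S3 Select compares string operands lexicographically (worth checking on a small test file first), a range filter can compare the raw CSV column to string literals with no TIMESTAMP conversion at all. The Python below checks the ordering property locally and builds such a WHERE clause; the helper name between_clause is hypothetical, and the column name Date is taken from this answer.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"


def between_clause(column: str, start: datetime, end: datetime) -> str:
    """Build a half-open [start, end) filter using plain string comparisons.

    Only valid because 'YYYY-MM-dd HH:mm:ss' is fixed-width and zero-padded,
    so lexicographic order equals chronological order.
    """
    lo, hi = start.strftime(FMT), end.strftime(FMT)
    return f"s.\"{column}\" >= '{lo}' AND s.\"{column}\" < '{hi}'"


# Sanity check: sorting the formatted strings gives chronological order,
# even across a year boundary.
base = datetime(2020, 12, 31, 23, 59, 58)
instants = [base + timedelta(seconds=n) for n in (5, 0, 3, 1)]
as_strings = [t.strftime(FMT) for t in instants]
assert sorted(as_strings) == [t.strftime(FMT) for t in sorted(instants)]

where = between_clause("Date", datetime(2020, 1, 1), datetime(2020, 1, 1, 0, 0, 30))
print(f"SELECT * FROM s3object s WHERE {where}")
# SELECT * FROM s3object s WHERE s."Date" >= '2020-01-01 00:00:00' AND s."Date" < '2020-01-01 00:00:30'
```

The property breaks if any row deviates from the fixed-width format (for example a single-digit hour), so it only holds for consistently formatted files.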

score 0 · Accepted Answer

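I think I had to solve the same or a similar problem. As you said, the documentation (https://docs.aws.amazon.com/AmazonS3/latest/dev/s3-glacier-select-sql-reference-date.html#s3-glacier-select-sql-reference-to-timestamp) states that TO_TIMESTAMP is the inverse of TO_STRING. To me, though, the documentation is misleading: it doesn't say that TO_TIMESTAMP accepts a time_format_pattern as a second argument. It shows the function taking a single argument, but in practice it can take two.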

I was able to convert the non-standard timestamp 20190101T050000.000Z from a string to a timestamp like this:

aws s3api select-object-content --bucket foo_bucket --key foo.json.gz --expression "SELECT * FROM s3object s WHERE TO_TIMESTAMP(s.\"timestamp\", 'yMMdd''T''Hmmss.SSS''Z''') < TO_TIMESTAMP('20190101T050000.000Z', 'yMMdd''T''Hmmss.SSS''Z''')" --expression-type 'SQL' --input-serialization '{ "CompressionType": "GZIP","JSON": {"Type": "DOCUMENT"}}' --output-serialization '{"JSON": {"RecordDelimiter": "\n"}}' /dev/shm/foo.json
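The same idea, adapted to the question's YYYY-MM-dd HH:mm:ss CSV data and expressed as parameters for boto3's select_object_content, might look like the sketch below. It assumes, as this answer reports, that TO_TIMESTAMP accepts an undocumented second pattern argument, and that the pattern y-MM-dd HH:mm:ss (written in the same style as the pattern above) matches the question's format; neither is confirmed by the documentation, so test it against a real object. It also assumes the CSV files have a header row containing my_timestamp_column. The bucket and key names are hypothetical, and boto3 itself is not imported so the sketch runs without AWS access.

```python
from typing import Any, Dict

# Assumed pattern for 'YYYY-MM-dd HH:mm:ss' (undocumented second argument).
PATTERN = "y-MM-dd HH:mm:ss"


def sql_ts(operand: str) -> str:
    """Wrap a SQL operand in TO_TIMESTAMP(..., PATTERN)."""
    return f"TO_TIMESTAMP({operand}, '{PATTERN}')"


def build_select_params(bucket: str, key: str, column: str,
                        start: str, end: str) -> Dict[str, Any]:
    """Keyword arguments for an S3 client's select_object_content()."""
    col = sql_ts(f's."{column}"')
    lo = sql_ts(f"'{start}'")
    hi = sql_ts(f"'{end}'")
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": f"SELECT * FROM s3object s WHERE {col} >= {lo} AND {col} < {hi}",
        # Assumes a header row, so columns can be referenced by name.
        "InputSerialization": {"CSV": {"FileHeaderInfo": "USE"}, "CompressionType": "GZIP"},
        "OutputSerialization": {"CSV": {}},
    }


def read_records(client: Any, params: Dict[str, Any]) -> str:
    """Run the query and join the 'Records' event payloads into one string."""
    response = client.select_object_content(**params)
    chunks = [event["Records"]["Payload"]
              for event in response["Payload"] if "Records" in event]
    # Decode once at the end so multi-byte characters split across chunks survive.
    return b"".join(chunks).decode("utf-8")


params = build_select_params("my-bucket", "42/2020/01.csv.gz", "my_timestamp_column",
                             "2020-01-01 00:00:00", "2020-01-01 00:00:30")
print(params["Expression"])
# With boto3 installed and credentials configured:
#   read_records(boto3.client("s3"), params)
```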

Hope this helps someone.
