amazon-web-services - Selecting data from two locations

Question

I need to query data on S3 using Redshift Spectrum. However, the data is split across two different folders (2018 / 2019). How can I include both in the "location" clause?

Right now I have:

create external table test_spectrum.full_events_test2
(
    timestamp bigint,
    device struct<locale:struct<country:varchar, language:varchar>, platform:struct<name:varchar>>
)
row format serde 'org.openx.data.jsonserde.JsonSerDe'
with serdeproperties('ignore.malformed.json'='true', 'paths'='event_type', 'serialization.format'='1')
stored as
inputformat 'org.apache.hadoop.mapred.TextInputFormat'
outputformat 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
location 's3://myfolder/2019/'  -- But I want also 's3://myfolder/2018/'

But I also want 's3://myfolder/2018/'.

How can I do this?

score 1 · Accepted Answer

If you want Amazon Redshift Spectrum to scan multiple folders, they must share a common prefix.

It is not possible to specify multiple separate folders as the location.

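Therefore, you should move these folders under a single common parent folder that contains no other files, and point the location at that parent folder.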
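A minimal sketch of the common-prefix approach, assuming the layout from the question (s3://myfolder/2018/ and s3://myfolder/2019/) and assuming s3://myfolder/ holds nothing else that should be read. It is the question's statement with the trailing comma after the last column removed and the location pointed at the parent prefix; Redshift Spectrum reads the files in the specified folder and its subfolders, so both years should be covered. Recreating the table under the same name assumes the existing definition has been dropped first. The final query uses the "$path" pseudocolumn as a quick check that files from both folders are being read.

```sql
-- Sketch only: assumes s3://myfolder/ contains just the 2018/ and 2019/ subfolders.
create external table test_spectrum.full_events_test2
(
    timestamp bigint,
    device struct<locale:struct<country:varchar, language:varchar>, platform:struct<name:varchar>>
)
row format serde 'org.openx.data.jsonserde.JsonSerDe'
with serdeproperties('ignore.malformed.json'='true', 'paths'='event_type', 'serialization.format'='1')
stored as
inputformat 'org.apache.hadoop.mapred.TextInputFormat'
outputformat 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
-- Common parent prefix: files under both 2018/ and 2019/ are scanned.
location 's3://myfolder/';

-- Check which S3 files the rows come from; paths under both years should appear.
select "$path", count(*)
from test_spectrum.full_events_test2
group by "$path";
```

If other data ever lands directly under s3://myfolder/, it would be picked up too, which is why the answer recommends a parent folder containing nothing else.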
