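When you run a Data Definition Language (DDL) statement, Athena still writes output to S3, just as it does when you run DML.
For example, consider the following, run with the AWS CLI: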
Create a bucket for this example
$ aws s3 mb s3://athena-covid/
Get some data (source: Covid Tracking Project)
$ wget -O daily.csv https://api.covidtracking.com/v1/states/tx/daily.csv
Upload this data to S3
$ aws s3 cp daily.csv s3://athena-covid/src/daily.csv
Now run some DDL (ddl.sql holds the CREATE EXTERNAL TABLE statement listed at the end of this answer)
$ aws athena start-query-execution --result-configuration OutputLocation=s3://athena-covid/out/ --query-string "$(cat ddl.sql)"
Returns
{ "QueryExecutionId": "427fd5d0-02cf-49e6-82eb-0c25aae46e80" }
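If you are scripting this rather than reading the JSON by eye, a minimal bash sketch like the following could capture the ID and wait for the query to finish. It assumes a configured AWS CLI; `--query` and `--output text` are the CLI's standard output-filtering options, and `get-query-execution` reports the state under `QueryExecution.Status.State`. The function name `run_athena_query` and the 2-second poll interval are my own illustrative choices, not anything Athena requires.

```shell
#!/usr/bin/env bash
# Hypothetical helper: start an Athena query, wait for it to finish,
# and print its QueryExecutionId on success.
# Usage: run_athena_query "<sql>" "s3://bucket/prefix/"
run_athena_query() {
  local sql="$1" output_location="$2" qid state

  # --query/--output text extract just the ID instead of the JSON object.
  qid=$(aws athena start-query-execution \
    --result-configuration "OutputLocation=${output_location}" \
    --query-string "$sql" \
    --query QueryExecutionId --output text) || return 1

  # Poll until the query leaves the QUEUED/RUNNING states.
  while :; do
    state=$(aws athena get-query-execution \
      --query-execution-id "$qid" \
      --query QueryExecution.Status.State --output text) || return 1
    case "$state" in
      QUEUED|RUNNING) sleep 2 ;;
      SUCCEEDED) echo "$qid"; return 0 ;;
      *) echo "query $qid ended in state $state" >&2; return 1 ;;
    esac
  done
}
```

For example: `qid=$(run_athena_query "$(cat ddl.sql)" s3://athena-covid/out/)`.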
Even though the query in ddl.sql returns no result set, it still produced an empty text file in the output location specified by --result-configuration above.
$ aws s3 ls s3://athena-covid/out/
which returns
2021-03-06 12:52:25 0 427fd5d0-02cf-49e6-82eb-0c25aae46e80.txt
Note the 0: that is the object's size in S3 (in bytes).
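To spot these zero-byte DDL results in a longer listing, you could filter on the size column. A minimal sketch, assuming the default `aws s3 ls` line layout shown above (date, time, size in bytes, key) and keys without spaces, which holds for Athena's ID-based file names; the function name `list_empty_objects` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `aws s3 ls` output on stdin and print the keys
# of zero-byte objects (for example, the .txt files Athena writes for DDL).
# Expected line layout: DATE TIME SIZE KEY (keys assumed to contain no spaces).
list_empty_objects() {
  # "PRE" lines (common prefixes) have only two fields, so NF >= 4 skips them.
  awk 'NF >= 4 && $3 == 0 { print $4 }'
}
```

For example: `aws s3 ls s3://athena-covid/out/ | list_empty_objects`.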
Of course, if we run ordinary DML, we get an actual result set.
$ aws athena start-query-execution --result-configuration OutputLocation=s3://athena-covid/out/ --query-string "SELECT data_date, state, positive, negative FROM default.tx_covid LIMIT 10"
Returns:
{ "QueryExecutionId": "77b548ee-4724-4716-9b3a-95acbb8bb275" }
And a CSV containing some data:
$ aws s3 ls s3://athena-covid/out/77b548ee-4724-4716-9b3a-95acbb8bb275.csv
Returns
2021-03-06 12:57:00 312 77b548ee-4724-4716-9b3a-95acbb8bb275.csv
2021-03-06 12:57:00 213 77b548ee-4724-4716-9b3a-95acbb8bb275.csv.metadata
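To look at the rows themselves without saving a local file, `aws s3 cp` accepts `-` as the destination and writes the object to stdout. A minimal sketch, assuming the `<QueryExecutionId>.csv` naming seen in the listing above; `show_result_csv` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: stream a SELECT query's result CSV to stdout.
# Assumes the result file is named <QueryExecutionId>.csv, as in the listing
# above; `aws s3 cp <src> -` writes the object to stdout.
# Usage: show_result_csv <output-location> <query-id>
show_result_csv() {
  local location="${1%/}" qid="$2"   # strip a trailing slash, if any
  aws s3 cp "${location}/${qid}.csv" -
}
```

For example: `show_result_csv s3://athena-covid/out/ 77b548ee-4724-4716-9b3a-95acbb8bb275 | head`.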
I hope the above helps illustrate how Athena works: every query, DDL or DML, has an OutputLocation.
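If you only have a QueryExecutionId, you can ask Athena where that query's output went instead of reconstructing the path yourself. A minimal sketch, assuming a configured AWS CLI; `get-query-execution` reports it under `QueryExecution.ResultConfiguration.OutputLocation`, which, as I understand it, is the full S3 URI of the result object. The function name `result_location_of` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the S3 URI where a query wrote its output,
# as reported by Athena under QueryExecution.ResultConfiguration.OutputLocation.
# Usage: result_location_of <query-id>
result_location_of() {
  aws athena get-query-execution \
    --query-execution-id "$1" \
    --query QueryExecution.ResultConfiguration.OutputLocation \
    --output text
}
```

For example: `result_location_of 427fd5d0-02cf-49e6-82eb-0c25aae46e80`.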
For reference, here is the DDL (the contents of ddl.sql):
CREATE EXTERNAL TABLE default.tx_covid (
data_date STRING,
state STRING,
positive INTEGER,
probableCases INTEGER,
negative INTEGER,
pending INTEGER,
totalTestResultsSource STRING,
totalTestResults INTEGER,
hospitalizedCurrently INTEGER,
hospitalizedCumulative INTEGER,
inIcuCurrently INTEGER,
inIcuCumulative INTEGER,
onVentilatorCurrently INTEGER,
onVentilatorCumulative INTEGER,
recovered INTEGER,
lastUpdateEt STRING,
dateModified STRING,
checkTimeEt STRING,
death INTEGER,
hospitalized INTEGER,
hospitalizedDischarged INTEGER,
dateChecked STRING,
totalTestsViral INTEGER,
positiveTestsViral INTEGER,
negativeTestsViral INTEGER,
positiveCasesViral INTEGER,
deathConfirmed INTEGER,
deathProbable INTEGER,
totalTestEncountersViral INTEGER,
totalTestsPeopleViral INTEGER,
totalTestsAntibody INTEGER,
positiveTestsAntibody INTEGER,
negativeTestsAntibody INTEGER,
totalTestsPeopleAntibody INTEGER,
positiveTestsPeopleAntibody INTEGER,
negativeTestsPeopleAntibody INTEGER,
totalTestsPeopleAntigen INTEGER,
positiveTestsPeopleAntigen INTEGER,
totalTestsAntigen INTEGER,
positiveTestsAntigen INTEGER,
fips STRING,
positiveIncrease INTEGER,
negativeIncrease INTEGER,
total INTEGER,
totalTestResultsIncrease INTEGER,
posNeg INTEGER,
dataQualityGrade INTEGER,
deathIncrease INTEGER,
hospitalizedIncrease INTEGER,
hash STRING,
commercialScore INTEGER,
negativeRegularScore INTEGER,
negativeScore INTEGER,
positiveScore INTEGER,
score INTEGER,
grade INTEGER
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
ESCAPED BY '\\'
LINES TERMINATED BY '\n'
LOCATION 's3://athena-covid/src/'
TBLPROPERTIES ('skip.header.line.count'='1')
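Because this table maps comma-delimited fields to columns by position and skips one header line, it may be worth checking that the CSV header has as many fields as the DDL declares (56 columns). A minimal sketch; `check_header_columns` is a hypothetical name, and the plain comma split assumes the header line contains no quoted commas.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare the number of comma-separated fields in the
# first line of a CSV with the column count expected by the table DDL.
# Usage: check_header_columns daily.csv 56
check_header_columns() {
  local file="$1" expected="$2" actual
  actual=$(head -n 1 "$file" | awk -F',' '{ print NF }')
  actual=${actual:-0}   # an empty file yields no awk output
  if [ "$actual" -eq "$expected" ]; then
    echo "ok: $actual columns"
  else
    echo "mismatch: header has $actual columns, DDL expects $expected" >&2
    return 1
  fi
}
```

For example: `check_header_columns daily.csv 56` before uploading the file.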