
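I'm using S3 Select to read a CSV file from an S3 bucket and output it as CSV. The output contains only the data rows, not the header. How can I get output that includes the header?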

import boto3

s3 = boto3.client('s3')

r = s3.select_object_content(
        Bucket='demo_bucket',
        Key='demo.csv',
        ExpressionType='SQL',
        Expression="select * from s3object s",
        InputSerialization={'CSV': {"FileHeaderInfo": "Use"}},
        OutputSerialization={'CSV': {}},
)

for event in r['Payload']:
    if 'Records' in event:
        records = event['Records']['Payload'].decode('utf-8')
        print(records)

Input CSV:

Name, Age, Status
Rob, 25, Single
Sam, 26, Married

Output from S3 Select:

Rob, 25, Single
Sam, 26, Married

3 Answers


Change

InputSerialization={'CSV': {"FileHeaderInfo": "Use"}},

to

InputSerialization={'CSV': {"FileHeaderInfo": "NONE"}},

Then it will print the entire content, including the header.
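A minimal sketch of the changed call, assuming the question's bucket and key. `build_select_kwargs` and `collect_records` are hypothetical helper names, not part of boto3. The only real change is `"FileHeaderInfo": "NONE"`; joining the `Records` bytes before decoding is a defensive choice so that a multi-byte UTF-8 character split across two events would still decode correctly.

```python
def build_select_kwargs(bucket, key, expression="select * from s3object s"):
    """Keyword arguments for s3.select_object_content() that keep the header."""
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": expression,
        # NONE: the first line is not treated as a header, so it is returned
        # like any other record. Columns must then be referenced by position
        # (s._1, s._2, ...) in the expression.
        "InputSerialization": {"CSV": {"FileHeaderInfo": "NONE"}},
        "OutputSerialization": {"CSV": {}},
    }


def collect_records(payload):
    """Join the bytes of every 'Records' event, then decode once."""
    chunks = [event["Records"]["Payload"] for event in payload if "Records" in event]
    return b"".join(chunks).decode("utf-8")


# Usage (needs boto3 and AWS credentials):
# import boto3
# s3 = boto3.client("s3")
# r = s3.select_object_content(**build_select_kwargs("demo_bucket", "demo.csv"))
# print(collect_records(r["Payload"]), end="")
```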

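Explanation

FileHeaderInfo accepts one of NONE | USE | IGNORE.

Use the NONE option instead of USE and the header is printed as well: NONE tells S3 Select that the first line is not a header, so it is processed and returned like any other record. USE and IGNORE both treat the first line as a header and leave it out of the output.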
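To make the three modes concrete, here is a rough local illustration in plain Python. It is not S3 Select and does not evaluate SQL; it only mimics what `select * from s3object s` returns for the first line under each `FileHeaderInfo` value, plus how a column can be referenced in the expression. `emulate_header_mode` and `column_reference` are hypothetical names.

```python
import csv
import io


def emulate_header_mode(csv_text, mode):
    """Rows that `select *` would emit, judged only by first-line handling."""
    rows = list(csv.reader(io.StringIO(csv_text), skipinitialspace=True))
    mode = mode.upper()
    if mode == "NONE":
        return rows          # first line is ordinary data, so the header is output
    if mode in ("USE", "IGNORE"):
        return rows[1:]      # first line is a header and is never output
    raise ValueError(f"unknown FileHeaderInfo: {mode}")


def column_reference(mode, name, position):
    """One valid way to refer to a column in the SQL expression."""
    if mode.upper() == "USE":
        return f's."{name}"'  # header names are usable only with USE
    return f"s._{position}"   # NONE / IGNORE: positional _1, _2, ...


sample = "Name, Age, Status\nRob, 25, Single\nSam, 26, Married\n"
print(emulate_header_mode(sample, "NONE")[0])
```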

Here is the reference: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.select_object_content

I hope this helps.

answered 2019-04-26T06:05:13.513

Amazon S3 Select does not output headers.

In your code, you can simply add a print statement that outputs the header before looping over the results.
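A minimal sketch of that approach, assuming the header is known in advance (here hard-coded from the question's sample file). `HEADER` and `with_header` are hypothetical names; the loop over events mirrors the question's code.

```python
HEADER = "Name, Age, Status"  # hypothetical: header of demo.csv, known in advance


def with_header(header, payload):
    """Yield the header line first, then each decoded 'Records' chunk."""
    yield header + "\n"
    for event in payload:
        if "Records" in event:
            yield event["Records"]["Payload"].decode("utf-8")


# Usage with the question's response object `r`:
# for chunk in with_header(HEADER, r["Payload"]):
#     print(chunk, end="")
```

This keeps `FileHeaderInfo` set to USE, so column names stay available in the query; the trade-off is that the header must match the columns the query actually selects.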

answered 2018-06-14T02:49:33.727

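Red Boy's solution doesn't let you use column names in the query; you have to use column indexes instead. That didn't work for me, so my solution was to run a second query that fetches only the header and concatenate it with the actual query results. This is in JavaScript, but the same approach works in Python: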

      const params = {
        Bucket: bucket,
        Key: "file.csv",
        ExpressionType: 'SQL',
        Expression: `select * from s3object s where s."date" >= '${fromDate}'`,
        InputSerialization: {'CSV': {"FileHeaderInfo": "USE"}},
        OutputSerialization: {'CSV': {}},
      };

      //s3 select doesn't return the headers, so need to run another query to only get the headers (see '{"FileHeaderInfo": "NONE"}')
      const headerParams = {
        Bucket: bucket,
        Key: "file.csv",
        ExpressionType: 'SQL',
        Expression: "select * from s3object s limit 1", //this will only get the first record of the csv, and since we are not parsing headers, they will be included
        InputSerialization: {'CSV': {"FileHeaderInfo": "NONE"}},
        OutputSerialization: {'CSV': {}},
      };

      //concatenate header + data -- getObject is a method that handles the request
      return await this.getObject(s3, headerParams) + await this.getObject(s3, params);
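The same two-query idea in Python, as a minimal sketch. `select_csv` and `select_with_header` are hypothetical helpers (the counterpart of the `getObject` method above). The client is passed in so the functions can run without AWS; in real use it would be `boto3.client("s3")`.

```python
def select_csv(client, **params):
    """Run one S3 Select query and return its CSV output as a str."""
    response = client.select_object_content(**params)
    data = b"".join(
        event["Records"]["Payload"]
        for event in response["Payload"]
        if "Records" in event
    )
    return data.decode("utf-8")


def select_with_header(client, bucket, key, expression):
    """Header from a NONE + limit 1 query, data from the USE query."""
    base = {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "OutputSerialization": {"CSV": {}},
    }
    # NONE: the first line is returned as a record, so limit 1 yields the header.
    header = select_csv(
        client, **base,
        Expression="select * from s3object s limit 1",
        InputSerialization={"CSV": {"FileHeaderInfo": "NONE"}},
    )
    # USE: column names such as s."date" can be used in the real query.
    data = select_csv(
        client, **base,
        Expression=expression,
        InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
    )
    return header + data


# Usage (needs boto3 and AWS credentials):
# import boto3
# csv_text = select_with_header(boto3.client("s3"), "my-bucket", "file.csv",
#                               "select * from s3object s")
```

This costs two requests, and the concatenated header only lines up with the data when the main query selects every column (`select *`) in file order.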
answered 2019-09-03T11:36:58.197