python-3.x - Unloading multiple files from Redshift to S3

Question

Hi, I am trying to unload multiple tables from Redshift to a particular S3 bucket, but I get the following error:

 psycopg2.InternalError: Specified unload destination on S3 is not empty. Consider using a different bucket / prefix, manually removing the target files in S3, or using the ALLOWOVERWRITE option.

If I add the "allowoverwrite" option in the unload function, each table overwrites the one unloaded before it, so only the last table ends up in S3.

This is the code I am using:

import psycopg2

def unload_data(r_conn, aws_iam_role, datastoring_path, region, table_name):
     unload = '''unload ('select * from {}')
                    to '{}'
                    credentials 'aws_iam_role={}'
                    manifest
                    gzip
                    delimiter ',' addquotes escape parallel off '''.format(table_name, datastoring_path, aws_iam_role)

     print ("Exporting table to datastoring_path")
     cur = r_conn.cursor()
     cur.execute(unload)
     r_conn.commit()

def main():
     host_rs = 'dataingestion.*********.us******2.redshift.amazonaws.com'
     port_rs = '5439'
     database_rs = '******'
     user_rs = '******'
     password_rs = '********'
     rs_tables = [ 'Employee', 'Employe_details' ]

     iam_role = 'arn:aws:iam::************:role/RedshiftCopyUnload'
     s3_datastoring_path = 's3://mysamplebuck/'
     s3_region = 'us_*****_2'
     print ("Exporting from source")
     src_conn = psycopg2.connect(host = host_rs,
                                 port = port_rs,
                                 database = database_rs,
                                 user = user_rs,
                                 password = password_rs)
     print ("Connected to RS")

     for i, tabe in enumerate(rs_tables):
          if tabe[0] == tabe[-1]:
              print("No files to read!")
          unload_data(src_conn, aws_iam_role = iam_role, datastoring_path = s3_datastoring_path, region = s3_region, table_name = rs_tables[i])
          print (rs_tables[i])


if __name__=="__main__":
     main()

score 4 · Accepted Answer

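It is complaining that you are saving the data to the same destination.

This is like copying all the files on your computer into the same directory: some files will be overwritten.

You should make datastoring_path different for each table, for example: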

.format(table_name, datastoring_path + '/' + table_name, aws_iam_role)
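
As a rough illustration of the same idea, the sketch below builds one UNLOAD statement per table, each with its own S3 prefix. The helper names build_unload_sql and unload_tables are hypothetical and not part of the question's code. Stripping the trailing slash from the base path (to avoid a double slash, since the question's path already ends in "/") and ending each prefix with "/" are assumptions made here for readability. The table names are interpolated directly into the SQL, so this is only reasonable for a trusted, hard-coded list like the one in the question.

```python
def build_unload_sql(table_name, base_path, iam_role):
    """Return an UNLOAD statement that writes table_name under its own S3 prefix."""
    # Assumption: drop any trailing slash on the base path so the joined
    # destination contains a single separator, then give each table its
    # own "folder"-style prefix.
    destination = "{}/{}/".format(base_path.rstrip("/"), table_name)
    return (
        "unload ('select * from {table}') "
        "to '{dest}' "
        "credentials 'aws_iam_role={role}' "
        "manifest gzip "
        "delimiter ',' addquotes escape parallel off"
    ).format(table=table_name, dest=destination, role=iam_role)


def unload_tables(conn, tables, base_path, iam_role):
    """Run one UNLOAD per table on an already-open connection (e.g. psycopg2)."""
    for table in tables:
        cur = conn.cursor()
        cur.execute(build_unload_sql(table, base_path, iam_role))
        conn.commit()
        print("Unloaded", table)
```

With the question's values, calling unload_tables(src_conn, rs_tables, s3_datastoring_path, iam_role) would send 'Employee' and 'Employe_details' to separate prefixes, so neither unload finds the other's files at its destination.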
