python - AWS DMS: DatabaseMigrationService.Client.describe_table_statistics drops records with large result sets

Question

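I am using describe_table_statistics to retrieve the list of tables in a given DMS task, and I use response['Marker'] to call describe_table_statistics again in a conditional loop.

When I use no filters, I get the correct number of records, 13k+. When I use a filter or combination of filters whose result set is smaller than MaxRecords, I also get the correct number of records.

However, when I pass a filter whose result set is larger than MaxRecords, I get far fewer records than I should.

Here is my function for retrieving the set of tables: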

def get_dms_task_tables(account, region, task_name, schema_name=None, table_state=None):
   tables=[]
   max_records=500

   filters=[]
   if schema_name:
      filters.append({'Name':'schema-name', 'Values':[schema_name]})
   if table_state:
      filters.append({'Name':'table-state', 'Values':[table_state]})

   task_arn = get_dms_task_arn(account, region, task_name)

   session = boto3.Session(profile_name=account, region_name=region)
   client = session.client('dms')

   response = client.describe_table_statistics(
      ReplicationTaskArn=task_arn
      ,Filters=filters
      ,MaxRecords=max_records)

   tables += response['TableStatistics']

   while len(response['TableStatistics']) == max_records:
      response = client.describe_table_statistics(
         ReplicationTaskArn=task_arn
         ,Filters=filters
         ,MaxRecords=max_records
         ,Marker=response['Marker'])

      tables += response['TableStatistics']

   return tables

For troubleshooting, I loop over the returned tables and print one line per table:

        print(', '.join((
            t['SchemaName']
            ,t['TableName']
            ,t['TableState'])))

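When I pass no filters and grep the output for the table state "Table completed", I get 12k+ records, which matches the count shown in the console.

So, at least on the surface, the response loop works.

When I pass both a schema name and a table state filter, I get the correct count, as confirmed by the console, but that count is smaller than MaxRecords.

When I pass only the table state filter for "Table completed", I get just 949 records, so I am missing about 11k records.

I tried omitting the Filters parameter from the describe_table_statistics call inside the loop, but I get the same results in every case.

I suspect something is wrong with my call to describe_table_statistics inside the loop, but I cannot find an example in Amazon's documentation to confirm this.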

score 0 · Accepted Answer

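When filters are applied, describe_table_statistics does not adhere to the MaxRecords limit.

In fact, what it appears to do is retrieve (2 x MaxRecords) records, apply the filter, and return that set. Or it may retrieve MaxRecords records, apply the filter, and continue until the result set is larger than MaxRecords. Either way, a page can hold a number of records other than MaxRecords even though more pages remain, so my while condition was the problem.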
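To make the failure mode concrete, here is a toy simulation. It is only a sketch under a stated assumption: fake_describe is a hypothetical stand-in that scans max_records rows per call and filters afterwards, which is a simplification of the guess above and not confirmed AWS behavior. It shows why a length-based loop condition stops early whenever a filtered page comes back short, while a marker-based condition keeps going.

```python
def fake_describe(rows, wanted_state, max_records, marker=0):
    """Hypothetical stand-in for describe_table_statistics (not the real API)."""
    chunk = rows[marker:marker + max_records]        # scan one page of raw rows
    page = [r for r in chunk if r == wanted_state]   # apply the filter afterwards
    response = {'TableStatistics': page}
    next_marker = marker + max_records
    if next_marker < len(rows):                      # Marker only while rows remain
        response['Marker'] = next_marker
    return response


def collect(rows, wanted_state, max_records, keep_going):
    """Page through fake_describe while keep_going(response, max_records) is true."""
    response = fake_describe(rows, wanted_state, max_records)
    found = list(response['TableStatistics'])
    while keep_going(response, max_records):
        response = fake_describe(rows, wanted_state, max_records, response['Marker'])
        found += response['TableStatistics']
    return found


# 20 rows, of which 10 match the filter.
rows = ['Table completed', 'Table error'] * 10

# Loop condition from the question: stops at the first short page.
by_length = collect(rows, 'Table completed', 4,
                    lambda r, n: len(r['TableStatistics']) == n)

# Loop condition based on the presence of Marker: reads every page.
by_marker = collect(rows, 'Table completed', 4,
                    lambda r, n: 'Marker' in r)

print(len(by_length), len(by_marker))  # prints: 2 10
```

The exact page sizes in the real service may differ from this model; the point is only that page length is not a reliable end-of-results signal once a filter is involved.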

I replaced

while len(response['TableStatistics']) == max_records:

with

while 'Marker' in response:

and now the function returns the correct number of records.

Incidentally, my first attempt was

while len(response['TableStatistics']) >= 1:

but on the last iteration of the loop it threw this error, because the final response carries no Marker key:

KeyError: 'Marker'

The finished, working function now looks like this:

import boto3


def get_dms_task_tables(account, region, task_name, schema_name=None, table_state=None):
    tables = []
    max_records = 500

    # Build the optional server-side filters.
    filters = []
    if schema_name:
        filters.append({'Name': 'schema-name', 'Values': [schema_name]})
    if table_state:
        filters.append({'Name': 'table-state', 'Values': [table_state]})

    # get_dms_task_arn is my own helper, defined elsewhere.
    task_arn = get_dms_task_arn(account, region, task_name)

    session = boto3.Session(profile_name=account, region_name=region)
    client = session.client('dms')

    response = client.describe_table_statistics(
        ReplicationTaskArn=task_arn,
        Filters=filters,
        MaxRecords=max_records)

    tables += response['TableStatistics']

    # Keep paging while the service returns a Marker; the page length
    # is not a reliable end-of-results signal when filters are applied.
    while 'Marker' in response:
        response = client.describe_table_statistics(
            ReplicationTaskArn=task_arn,
            Filters=filters,
            MaxRecords=max_records,
            Marker=response['Marker'])

        tables += response['TableStatistics']

    return tables
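As an alternative to the hand-written Marker loop, boto3 paginators handle the marker bookkeeping themselves. The sketch below assumes the DMS client exposes a paginator for describe_table_statistics (you can check with client.can_paginate('describe_table_statistics')); the helper name iter_table_statistics and its parameters are made up for illustration and are not part of boto3.

```python
def iter_table_statistics(client, task_arn, filters=None, page_size=500):
    """Yield every TableStatistics entry for a task, one page at a time.

    Hypothetical helper: `client` is expected to be a boto3 DMS client,
    e.g. boto3.Session(...).client('dms').
    """
    kwargs = {'ReplicationTaskArn': task_arn}
    if filters:
        kwargs['Filters'] = filters

    # The paginator follows the service's Marker until it is no longer
    # returned, so no loop condition on the page length is needed.
    paginator = client.get_paginator('describe_table_statistics')
    pages = paginator.paginate(PaginationConfig={'PageSize': page_size}, **kwargs)

    for page in pages:
        yield from page['TableStatistics']
```

With a real client, usage would look like tables = list(iter_table_statistics(client, task_arn, filters)), where filters is the same list built in the function above.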
