amazon-web-services - Exporting data from DynamoDB

Question

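Is it possible to export data from a DynamoDB table in some format?

The concrete use case is that I want to export data from my production DynamoDB database and import it into my local DynamoDB instance, so that my application can work with a local copy of the data instead of the production data.

I use this (link) as a local instance of DynamoDB.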

score 48 · Accepted Answer

This will export all items as JSON documents:

aws dynamodb scan --table-name TABLE_NAME > export.json
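
The file written by the scan keeps DynamoDB's typed attribute encoding ({"S": ...}, {"N": ...} and so on). If plain values are more convenient, a minimal Python sketch along these lines could flatten it. The helper names plain and plain_items are made up for illustration, and only the common type tags are handled.

```python
import json
from decimal import Decimal


def plain(attr):
    """Convert one typed attribute value, e.g. {"S": "x"}, to a plain Python value."""
    (tag, value), = attr.items()
    if tag in ("S", "BOOL", "B"):   # "B" stays as the base64 text the CLI prints
        return value
    if tag == "N":
        return Decimal(value)       # numbers are transported as strings
    if tag == "NULL":
        return None
    if tag == "L":
        return [plain(v) for v in value]
    if tag == "M":
        return {k: plain(v) for k, v in value.items()}
    if tag == "SS":
        return set(value)
    if tag == "NS":
        return {Decimal(v) for v in value}
    raise ValueError(f"unhandled type tag: {tag}")


def plain_items(scan_output):
    """Flatten every item of a parsed `aws dynamodb scan` result."""
    return [{k: plain(v) for k, v in item.items()} for item in scan_output["Items"]]


# Usage (assumes export.json was produced by the command above):
#   with open("export.json") as f:
#       items = plain_items(json.load(f))
```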

This script reads from a remote DynamoDB table and imports the full table into the local instance.

TABLE=YOURTABLE
maxItems=25
index=0

# First page: scan the remote table and write the items to the local instance
DATA=$(aws dynamodb scan --table-name "$TABLE" --max-items "$maxItems")
((index+=1))
echo "$DATA" | jq ".Items | {\"$TABLE\": [{\"PutRequest\": { \"Item\": .[]}}]}" > inserts.jsons
aws dynamodb batch-write-item --request-items file://inserts.jsons --endpoint-url http://localhost:8000

# -r strips the JSON quotes; "// empty" gives "" instead of "null" on the last page
nextToken=$(echo "$DATA" | jq -r '.NextToken // empty')
while [[ -n "$nextToken" ]]
do
  DATA=$(aws dynamodb scan --table-name "$TABLE" --max-items "$maxItems" --starting-token "$nextToken")
  ((index+=1))
  echo "$DATA" | jq ".Items | {\"$TABLE\": [{\"PutRequest\": { \"Item\": .[]}}]}" > inserts.jsons
  aws dynamodb batch-write-item --request-items file://inserts.jsons --endpoint-url http://localhost:8000
  nextToken=$(echo "$DATA" | jq -r '.NextToken // empty')
done
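
The pagination in the script (keep scanning while the response carries a NextToken) can be sketched in Python as well. This is only an illustration: scan_pages and cli_fetcher are hypothetical names, and the fetcher simply shells out to the same aws dynamodb scan command with the same --max-items and --starting-token flags.

```python
import json
import subprocess
from typing import Callable, Iterator, Optional


def scan_pages(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[list]:
    """Yield the Items of every page, following NextToken until it is absent."""
    token = None
    while True:
        page = fetch_page(token)
        yield page.get("Items", [])
        token = page.get("NextToken")
        if not token:
            break


def cli_fetcher(table: str, max_items: int = 25) -> Callable[[Optional[str]], dict]:
    """Build a fetch_page function that calls the AWS CLI (assumed to be installed)."""
    def fetch(token: Optional[str]) -> dict:
        cmd = ["aws", "dynamodb", "scan", "--table-name", table,
               "--max-items", str(max_items)]
        if token:
            cmd += ["--starting-token", token]
        done = subprocess.run(cmd, check=True, capture_output=True, text=True)
        return json.loads(done.stdout)
    return fetch


# Usage: for items in scan_pages(cli_fetcher("YOURTABLE")): ...
```

Passing the page fetcher in as a function keeps the loop itself free of any AWS dependency, so it can be exercised with canned pages.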

Here is a version of the script that uses files to keep the exported data on disk.

TABLE=YOURTABLE
maxItems=25
index=0
# Save every scanned page as TABLE-1.json, TABLE-2.json, ...
DATA=$(aws dynamodb scan --table-name "$TABLE" --max-items "$maxItems")
((index+=1))
echo "$DATA" > "$TABLE-$index.json"

# -r strips the JSON quotes; "// empty" gives "" instead of "null" on the last page
nextToken=$(echo "$DATA" | jq -r '.NextToken // empty')
while [[ -n "$nextToken" ]]
do
  DATA=$(aws dynamodb scan --table-name "$TABLE" --max-items "$maxItems" --starting-token "$nextToken")
  ((index+=1))
  echo "$DATA" > "$TABLE-$index.json"
  nextToken=$(echo "$DATA" | jq -r '.NextToken // empty')
done

# Import every saved page into the local instance
for x in "$TABLE"-*.json; do
  jq ".Items | {\"$TABLE\": [{\"PutRequest\": { \"Item\": .[]}}]}" "$x" > inserts.jsons
  aws dynamodb batch-write-item --request-items file://inserts.jsons --endpoint-url http://localhost:8000
done

score 20 · Accepted Answer

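There is a tool named DynamoDBtoCSV

that can be used to export all the data to a CSV file. For the reverse direction (import), however, you will have to build your own tool. My suggestion is to add this functionality to the tool and contribute it to the Git repository.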
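
A home-made importer for the reverse direction might start from something like the sketch below. It rests on assumptions not stated in the answer: the CSV has a header row of attribute names, and because CSV carries no type information every value is stored as a DynamoDB string ("S"). The function name csv_to_put_requests is invented for this example.

```python
import csv
import io


def csv_to_put_requests(csv_text: str) -> list:
    """Turn CSV rows into PutRequest entries, storing every value as a string."""
    requests = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Skip empty cells so that missing attributes are simply left out
        item = {name: {"S": value} for name, value in row.items() if value}
        requests.append({"PutRequest": {"Item": item}})
    return requests
```

The resulting list can be placed under a table name to form the body of a batch-write-item request.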

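Another way is to use AWS Data Pipeline for this task (you will save all the costs of reading the data from outside the AWS infrastructure). The approach is similar:

1. Build the pipeline for output.
2. Download the file.
3. Parse it with a custom reader.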

score 14 · Accepted Answer

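Here is a way to export some data from a table using the aws cli and jq (often we only want a sample of our prod data locally). Let's assume we have a prod table unsurprisingly called my-prod-table and a local table called my-local-table.

To export the data, run the following: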

aws dynamodb scan --table-name my-prod-table \
| jq '{"my-local-table": [.Items[] | {PutRequest: {Item: .}}]}' > data.json

Basically, we scan our prod table, transform the output of the scan into the format expected by batchWriteItem, and dump the result into a file.
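
For readers less familiar with jq, the same reshaping step can be written as a few lines of Python. This is a sketch of the transformation only (the function name to_request_items is made up); it takes the parsed scan output and wraps every item in a PutRequest under the target table name.

```python
def to_request_items(scan_output: dict, table: str) -> dict:
    """Reshape a parsed scan result into the --request-items structure."""
    return {table: [{"PutRequest": {"Item": item}} for item in scan_output["Items"]]}
```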

To import the data into your local table, run:

aws dynamodb batch-write-item \
--request-items file://data.json \
--endpoint-url http://localhost:8000

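Note: there are some limits on batch-write-item requests. The BatchWriteItem operation can contain up to 25 individual PutItem and DeleteItem requests and can write up to 16 MB of data. (The maximum size of an individual item is 400 KB.)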
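
Because of the 25-request limit, a file with more items has to be split before it is sent. A minimal sketch of that splitting, assuming the request-items structure shown earlier (the name split_request_items is hypothetical, and the 16 MB limit is not checked here):

```python
def split_request_items(request_items: dict, size: int = 25) -> list:
    """Split {"table": [requests...]} into dicts holding at most `size` requests each."""
    batches = []
    for table, requests in request_items.items():
        for start in range(0, len(requests), size):
            batches.append({table: requests[start:start + size]})
    return batches
```

Each element of the returned list can be written to its own file and passed to batch-write-item in turn.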

score 12 · Accepted Answer

Export it from the DynamoDB interface to S3.

Then convert it to JSON using sed:

sed -e 's/$/}/' -e $'s/\x02/,"/g' -e $'s/\x03/":/g' -e 's/^/{"/' <exported_table> > <exported_table>.json

Source
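
Read literally, the sed command treats the byte 0x02 as the separator between attributes and 0x03 as the separator between an attribute name and its value. Assuming that is indeed the layout of each exported line, the same conversion could be sketched in Python as follows (line_to_json is a made-up name):

```python
def line_to_json(line: str) -> str:
    """Mirror the sed command: 0x02 separates attributes, 0x03 separates name and value."""
    body = line.rstrip("\n").replace("\x02", ',"').replace("\x03", '":')
    return '{"' + body + "}"
```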

score 4 · Accepted Answer

I extended Valy Dia's solution to allow the whole export process using only aws-cli | jq

# Export a sample of items in batch-write-item format
aws dynamodb scan --max-items 3 --table-name <TABLE_NAME> \
| jq '{"<TABLE_NAME>": [.Items[] | {PutRequest: {Item: .}}]}' > data.json

# Save the remote description, then reduce it to a minimal table definition
# (the description must be piped into jq; redirecting it to a file first leaves jq with no input)
aws dynamodb describe-table --table-name <TABLE_NAME> | tee describe.json \
| jq '.Table | {"TableName": .TableName, "KeySchema": .KeySchema, "AttributeDefinitions": .AttributeDefinitions, "ProvisionedThroughput": {
      "ReadCapacityUnits": 5,
      "WriteCapacityUnits": 5
}}' > table-definition.json

# Create the table locally
aws dynamodb create-table --cli-input-json file://table-definition.json --endpoint-url http://localhost:8000 --region us-east-1

# Load the exported items
aws dynamodb batch-write-item --request-items file://data.json --endpoint-url http://localhost:8000

# Verify the result
aws dynamodb scan --table-name <TABLE_NAME> --endpoint-url http://localhost:8000

score 3 · Accepted Answer

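I think my answer is closest to Ivailo Bardarov's, for the case where you plan to run this from a Linux instance.

1. Log in to your AWS account and go to IAM to create a user with a limited policy for a role (for security purposes, of course). It should be restricted to reading only the DynamoDB table you want to back up.

2. Copy the access key and secret and update the command below to run it on Linux (but make sure your table is not huge, since the export could cause disk-space problems on the machine you run it on)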

AWS_ACCESS_KEY_ID='put_your_key' AWS_SECRET_ACCESS_KEY='put_your_secret' aws --region='put_your_region' dynamodb scan --table-name 'your_table_name'>> export_$(date "+%F-%T").json

Note: a similar command could be run on Windows/PowerShell, but I have not tested it, so I am not adding it here.
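
The limited policy from step 1 could look roughly like the document built below. This is a sketch under assumptions: the answer does not list the exact permissions, so only dynamodb:Scan (the one call the command makes) is allowed, and the table ARN is a placeholder you would replace with your own.

```python
import json


def read_only_policy(table_arn: str) -> dict:
    """Build an IAM policy document that allows scanning a single table."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["dynamodb:Scan"],
                "Resource": table_arn,
            }
        ],
    }


# Usage (placeholder account id and table name):
#   arn = "arn:aws:dynamodb:us-east-1:123456789012:table/your_table_name"
#   print(json.dumps(read_only_policy(arn), indent=2))
```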

score 2 · Accepted Answer

Try my simple node.js script dynamo-archive. It exports and imports in JSON format.

answered 2013-09-21T18:34:52.853

score 2 · Accepted Answer

The best tool I have found so far for simple import/export (including round-tripping through DynamoDB Local) is this Python script:

https://github.com/bchew/dynamodump

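The script supports schema export/import as well as data import/export. It also uses the batch APIs for efficient operations.

I have used it successfully to copy data from a DynamoDB table to DynamoDB Local for development purposes, and it worked well for my needs.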

score 2 · Accepted Answer

Extending @Ivailo Bardarov's answer, I wrote the following script to copy tables from a remote DynamoDB to a local one:

#!/bin/bash
declare -a arr=("table1" "table2" "table3" "table4")
for i in "${arr[@]}"
do
    TABLE=$i
    maxItems=25
    index=0
    echo "Getting table description of $TABLE from remote database..."
    aws dynamodb describe-table --table-name $TABLE > table-description.json
    echo
    echo "Creating table $TABLE in the local database..."
    ATTRIBUTE_DEFINITIONS=$(jq .Table.AttributeDefinitions table-description.json)
    KEY_SCHEMA=$(jq .Table.KeySchema table-description.json)
    BILLING_MODE=$(jq .Table.BillingModeSummary.BillingMode table-description.json)
    READ_CAPACITY_UNITS=$(jq .Table.ProvisionedThroughput.ReadCapacityUnits table-description.json)
    WRITE_CAPACITY_UNITS=$(jq .Table.ProvisionedThroughput.WriteCapacityUnits table-description.json)
    TABLE_DEFINITION=""

    # Compare numerically; ">" inside [[ ]] would compare the values as strings
    if [[ "$READ_CAPACITY_UNITS" -gt 0 && "$WRITE_CAPACITY_UNITS" -gt 0 ]]
    then
        TABLE_DEFINITION="{\"AttributeDefinitions\":$ATTRIBUTE_DEFINITIONS,\"TableName\":\"$TABLE\",\"KeySchema\":$KEY_SCHEMA,\"ProvisionedThroughput\":{\"ReadCapacityUnits\":$READ_CAPACITY_UNITS,\"WriteCapacityUnits\":$WRITE_CAPACITY_UNITS}}"
    else
        TABLE_DEFINITION="{\"AttributeDefinitions\":$ATTRIBUTE_DEFINITIONS,\"TableName\":\"$TABLE\",\"KeySchema\":$KEY_SCHEMA,\"BillingMode\":$BILLING_MODE}"
    fi

    echo $TABLE_DEFINITION > create-table.json
    aws dynamodb create-table --cli-input-json file://create-table.json --endpoint-url http://localhost:8000
    echo "Querying table $TABLE from remote..."
    DATA=$(aws dynamodb scan --table-name "$TABLE" --max-items "$maxItems")
    ((index+=1))
    echo "Saving remote table [$TABLE] contents to inserts.json file..."
    echo "$DATA" | jq ".Items | {\"$TABLE\": [{\"PutRequest\": { \"Item\": .[]}}]}" > inserts.json
    echo "Inserting rows to $TABLE in local database..."
    aws dynamodb batch-write-item --request-items file://inserts.json --endpoint-url http://localhost:8000

    # -r strips the JSON quotes; "// empty" gives "" instead of "null" on the last page
    nextToken=$(echo "$DATA" | jq -r '.NextToken // empty')
    while [[ -n "$nextToken" ]]
    do
      echo "Querying table $TABLE from remote..."
      DATA=$(aws dynamodb scan --table-name "$TABLE" --max-items "$maxItems" --starting-token "$nextToken")
      ((index+=1))
      echo "Saving remote table [$TABLE] contents to inserts.json file..."
      echo "$DATA" | jq ".Items | {\"$TABLE\": [{\"PutRequest\": { \"Item\": .[]}}]}" > inserts.json
      echo "Inserting rows to $TABLE in local database..."
      aws dynamodb batch-write-item --request-items file://inserts.json --endpoint-url http://localhost:8000
      nextToken=$(echo "$DATA" | jq -r '.NextToken // empty')
    done
done

echo "Deleting temporary files..."
rm -f table-description.json
rm -f create-table.json
rm -f inserts.json

echo "Database sync complete!"

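The script loops over an array of table names. For each one it first fetches the table description, builds a create-table JSON file with the minimum required parameters, and creates the table. It then uses the rest of @Ivailo Bardarov's logic to generate the inserts and push them to the newly created table. Finally, it cleans up the generated JSON files.

Keep in mind that my goal was only to create a rough copy of the tables for development purposes (hence the minimum required parameters).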
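
The step that reduces a table description to a minimal create-table definition can also be sketched in Python. The function name table_definition is invented here, and two choices go beyond the script: attribute definitions are filtered down to the key attributes (on the assumption that secondary indexes are not copied), and PAY_PER_REQUEST is assumed when no billing mode is reported.

```python
def table_definition(description: dict) -> dict:
    """Reduce parsed `describe-table` output to a minimal create-table definition."""
    table = description["Table"]
    key_names = {key["AttributeName"] for key in table["KeySchema"]}
    definition = {
        "TableName": table["TableName"],
        "KeySchema": table["KeySchema"],
        # Keep only attributes used by the key schema (indexes are not copied)
        "AttributeDefinitions": [
            attr for attr in table["AttributeDefinitions"]
            if attr["AttributeName"] in key_names
        ],
    }
    throughput = table.get("ProvisionedThroughput", {})
    rcu = throughput.get("ReadCapacityUnits", 0)
    wcu = throughput.get("WriteCapacityUnits", 0)
    if rcu > 0 and wcu > 0:
        definition["ProvisionedThroughput"] = {
            "ReadCapacityUnits": rcu,
            "WriteCapacityUnits": wcu,
        }
    else:
        # Assumed default when the description carries no billing summary
        summary = table.get("BillingModeSummary", {})
        definition["BillingMode"] = summary.get("BillingMode", "PAY_PER_REQUEST")
    return definition
```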

score 1 · Accepted Answer

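I created a utility class to help developers with the export. You can use it if you do not want to use AWS's Data Pipeline feature. The link to the GitHub repository is here.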

score 1 · Accepted Answer

For those who would rather do this in Java, there is DynamodbToCSV4j.

JSONObject config = new JSONObject();
config.put("accessKeyId","REPLACE");
config.put("secretAccessKey","REPLACE");
config.put("region","eu-west-1");
config.put("tableName","testtable");
d2csv d = new d2csv(config);

score 0 · Accepted Answer

You can try this code locally. First, though, run the following command: npm init -y && npm install aws-sdk

const AWS = require('aws-sdk');
AWS.config.update({region:'eu-central-1'}); 
const fs = require('fs');
const TABLE_NAME = "YOURTABLENAME"

const docClient = new AWS.DynamoDB.DocumentClient({
    "sslEnabled": false,
    "paramValidation": false,
    "convertResponseTypes": false,
    "convertEmptyValues": true
});

async function exportDB(){
    let params = {
        TableName: TABLE_NAME
    };
    let result = [];
    let items;
    // Scan page by page until no LastEvaluatedKey is returned
    do  {
        items =  await docClient.scan(params).promise();
        items.Items.forEach((item) => result.push(item));
        params.ExclusiveStartKey  = items.LastEvaluatedKey;
    }   while(typeof items.LastEvaluatedKey != "undefined");

    // writeFileSync is synchronous, so no await is needed
    fs.writeFileSync("exported_data.json", JSON.stringify(result,null, 4));
    console.info("Available count size:", result.length);
}
// Report failures instead of leaving an unhandled promise rejection
exportDB().catch(console.error);

Then run node index.js

I hope this works for you.

score 0 · Accepted Answer

DynamoDB now has a native export to S3 feature (JSON and Amazon Ion formats): https://aws.amazon.com/blogs/aws/new-export-amazon-dynamodb-table-data-to-data-lake-amazon-s3/

score -1 · Accepted Answer

DynamoDB now provides a way to export and import data to/from S3: http://aws.amazon.com/about-aws/whats-new/2014/03/06/announcing-dynamodb-cross-region-export-import/

score -1 · Accepted Answer

I used the awesome CyberChef website... https://gchq.github.io/CyberChef

with the csv to json tool.

score -1 · Accepted Answer

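For really big datasets, running continuous (and parallel) scans can be a time-consuming and fragile process (imagine it dying midway). Fortunately, AWS recently added the ability to export your DynamoDB table data straight to S3. This is probably the easiest way to achieve what you want, because it does not require you to write any code or run any task/script: it is fully managed.

Once it is done, you can download the export from S3 and import it into your local DynamoDB instance, either with logic like foreach record in file: documentClient.putItem or with some other tool.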
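
The "foreach record in file" step might be sketched as below. It assumes the export was made in the DynamoDB JSON format, where each downloaded data file is gzip-compressed and holds one {"Item": {...}} object per line. The function names are hypothetical, and sending the requests to the local instance is left out.

```python
import gzip
import json


def export_lines_to_requests(lines) -> list:
    """Turn export lines of the form {"Item": {...}} into PutRequest entries."""
    requests = []
    for line in lines:
        line = line.strip()
        if line:  # tolerate blank lines
            requests.append({"PutRequest": {"Item": json.loads(line)["Item"]}})
    return requests


def read_export_file(path: str) -> list:
    """Read one downloaded, gzip-compressed data file from the export."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return export_lines_to_requests(handle)
```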

score -1 · Accepted Answer

If needed, you can use https://2json.net/dynamo to convert Dynamo data to JSON.

answered 2017-05-08T14:46:35.740

score -1 · Accepted Answer

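In a similar use case, I used DynamoDB Streams to trigger an AWS Lambda that basically wrote to my DW instance. You could probably write your Lambda to write each table change to a table in your non-production account. That way, your Devo table would stay quite close to Prod as well.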
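
The core of such a Lambda is mapping stream records to write requests. A minimal sketch of that mapping follows, assuming the stream is configured to include new images; the function name is made up, and the actual write to the non-production table is omitted.

```python
def stream_records_to_requests(event: dict) -> list:
    """Map a DynamoDB Streams event to put/delete requests for a mirror table."""
    requests = []
    for record in event.get("Records", []):
        change = record["dynamodb"]
        if record["eventName"] == "REMOVE":
            requests.append({"DeleteRequest": {"Key": change["Keys"]}})
        else:
            # INSERT and MODIFY both carry the full new version of the item
            requests.append({"PutRequest": {"Item": change["NewImage"]}})
    return requests
```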

score -4 · Accepted Answer

In the DynamoDB web console, select your table, then Actions -> Export/Import

answered 2016-05-25T04:26:59.447
