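After exploring this case based on the comments on another answer, I decided to share my Bash scripts and my thoughts.
Exporting multiple tables
To export many tables from a specific schema, I use the following script.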
#!/bin/bash
source ./pgconfig
export PGPASSWORD=$password
# To export only some tables, list their names in FILTER below and remove the leading # to uncomment the line.
# FILTER=("table_name_a" "table_name_b" "table_name_c")
#
# Set the output directory
OUTPUT_DATA="/tmp/pgsql2shp/$database"
#
#
# Remove the Shapefiles after ZIP
RM_SHP="yes"
# Define where pgsql2shp is and format the base command
PG_BIN="/usr/bin"
PG_CON="-d $database -U $user -h $host -p $port"
# creating output directory to put files
mkdir -p "$OUTPUT_DATA"
SQL_TABLES="select table_name from information_schema.tables where table_schema = '$schema'"
SQL_TABLES="$SQL_TABLES and table_type = 'BASE TABLE' and table_name != 'spatial_ref_sys';"
TABLES=($($PG_BIN/psql $PG_CON -t -c "$SQL_TABLES"))
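export_shp(){
    SQL="$1"
    TB="$2"
    # run the query and write the result to $OUTPUT_DATA/$TB.shp (plus its sidecar files)
    "$PG_BIN/pgsql2shp" -f "$OUTPUT_DATA/$TB" -h "$host" -p "$port" -u "$user" "$database" "$SQL"
    # pack the shapefile parts into one archive; -j stores them without the directory path
    zip -j "$OUTPUT_DATA/$TB.zip" "$OUTPUT_DATA/$TB.shp" "$OUTPUT_DATA/$TB.shx" "$OUTPUT_DATA/$TB.prj" "$OUTPUT_DATA/$TB.dbf" "$OUTPUT_DATA/$TB.cpg"
}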
for TABLE in "${TABLES[@]}"
do
    DATA_QUERY="SELECT * FROM $schema.$TABLE"
    SHP_NAME="$TABLE"
    if [[ ${#FILTER[@]} -gt 0 ]]; then
        echo "Has filter by table name"
        # export only the tables listed in FILTER
        if [[ " ${FILTER[*]} " =~ " ${TABLE} " ]]; then
            export_shp "$DATA_QUERY" "$SHP_NAME"
        fi
    else
        export_shp "$DATA_QUERY" "$SHP_NAME"
    fi
    # remove intermediate files
    if [[ "$RM_SHP" = "yes" ]]; then
        rm -f "$OUTPUT_DATA/$SHP_NAME".{shp,shx,prj,dbf,cpg}
    fi
done
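The filter test in the loop joins FILTER into a single string and looks for the table name inside it. A minimal sketch of an alternative that compares the names one by one, in case that reads more clearly to you. The helper name in_filter and the table names are hypothetical and not part of the script above.

```shell
#!/bin/bash
# in_filter NAME [ALLOWED...]
# Returns 0 when NAME equals one of the ALLOWED names, or when no
# ALLOWED names are given (an empty filter lets every table through).
in_filter() {
    local name="$1"
    shift
    # no filter entries: accept everything
    [[ $# -eq 0 ]] && return 0
    local allowed
    for allowed in "$@"; do
        [[ "$allowed" == "$name" ]] && return 0
    done
    return 1
}

# Example with hypothetical table names
FILTER=("roads" "rivers")
for TABLE in roads buildings rivers; do
    if in_filter "$TABLE" "${FILTER[@]}"; then
        echo "export $TABLE"
    else
        echo "skip $TABLE"
    fi
done
```

With this helper, the two export_shp branches in the loop could collapse into a single `if in_filter "$TABLE" "${FILTER[@]}"` test.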
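Splitting the data into multiple files
With large tables, pgsql2shp can fail to write the data to the shapefile. To avoid that, we can split the data with a pagination strategy. In Postgres, pagination is done with LIMIT, OFFSET and ORDER BY.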
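As a rough illustration of the LIMIT/OFFSET paging described above, the sketch below only prints the queries that would be run, one per page, without touching a database. The function name page_queries, the id column and the row counts are assumptions made for the example.

```shell
#!/bin/bash
# page_queries BASE_QUERY ORDER_COLUMN TOTAL_ROWS PAGE_SIZE
# Prints one paginated query per line until every row is covered.
page_queries() {
    local base="$1" order_col="$2" total="$3" size="$4"
    local offset=0
    while (( offset < total )); do
        echo "$base ORDER BY $order_col LIMIT $size OFFSET $offset"
        offset=$(( offset + size ))
    done
}

# Hypothetical table with 25000 rows, exported 10000 rows at a time
page_queries "SELECT * FROM my_schema.my_table" "id" 25000 10000
```

The ORDER BY matters here: without a stable sort order, Postgres may return rows in a different order for each page, so rows could be repeated or skipped across files.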
Applying this method, the script below assumes that each table has a primary key, which it uses to sort the data.
#!/bin/bash
source ./pgconfig
export PGPASSWORD=$password
# To export only some tables, list their names in FILTER below and remove the leading # to uncomment the line.
# FILTER=("table_name_a" "table_name_b" "table_name_c")
#
# Set the output directory
OUTPUT_DATA="/tmp/pgsql2shp/$database"
#
#
# Remove the Shapefiles after ZIP
RM_SHP="yes"
# Define where pgsql2shp is and format the base command
PG_BIN="/usr/bin"
PG_CON="-d $database -U $user -h $host -p $port"
# creating output directory to put files
mkdir -p "$OUTPUT_DATA"
SQL_TABLES="select table_name from information_schema.tables where table_schema = '$schema'"
SQL_TABLES="$SQL_TABLES and table_type = 'BASE TABLE' and table_name != 'spatial_ref_sys';"
TABLES=($($PG_BIN/psql $PG_CON -t -c "$SQL_TABLES"))
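export_shp(){
    SQL="$1"
    TB="$2"
    # run the query and write the result to $OUTPUT_DATA/$TB.shp (plus its sidecar files)
    "$PG_BIN/pgsql2shp" -f "$OUTPUT_DATA/$TB" -h "$host" -p "$port" -u "$user" "$database" "$SQL"
    # pack the shapefile parts into one archive; -j stores them without the directory path
    zip -j "$OUTPUT_DATA/$TB.zip" "$OUTPUT_DATA/$TB.shp" "$OUTPUT_DATA/$TB.shx" "$OUTPUT_DATA/$TB.prj" "$OUTPUT_DATA/$TB.dbf" "$OUTPUT_DATA/$TB.cpg"
}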
for TABLE in "${TABLES[@]}"
do
    # look up the primary key column(s) of the current table
    GET_PK="SELECT a.attname "
    GET_PK="${GET_PK}FROM pg_index i "
    GET_PK="${GET_PK}JOIN pg_attribute a ON a.attrelid = i.indrelid AND a.attnum = ANY(i.indkey) "
    GET_PK="${GET_PK}WHERE i.indrelid = '$schema.$TABLE'::regclass AND i.indisprimary"
    PK=($($PG_BIN/psql $PG_CON -t -c "$GET_PK"))
    MAX_ROWS=($($PG_BIN/psql $PG_CON -t -c "SELECT COUNT(*) FROM $schema.$TABLE"))
    LIMIT=10000
    OFFSET=0
    # base query
    DATA_QUERY="SELECT * FROM $schema.$TABLE"
    # continue until all rows are fetched; -lt avoids an empty extra page
    # when the row count is an exact multiple of LIMIT
    while [ "$OFFSET" -lt "$MAX_ROWS" ]
    do
        DATA_QUERY_P="$DATA_QUERY ORDER BY $PK OFFSET $OFFSET LIMIT $LIMIT"
        OFFSET=$(( OFFSET+LIMIT ))
        # each file is named after the offset at which its page ends
        SHP_NAME="${TABLE}_${OFFSET}"
        if [[ ${#FILTER[@]} -gt 0 ]]; then
            echo "Has filter by table name"
            # export only the tables listed in FILTER
            if [[ " ${FILTER[*]} " =~ " ${TABLE} " ]]; then
                export_shp "$DATA_QUERY_P" "$SHP_NAME"
            fi
        else
            export_shp "$DATA_QUERY_P" "$SHP_NAME"
        fi
        # remove intermediate files
        if [[ "$RM_SHP" = "yes" ]]; then
            rm -f "$OUTPUT_DATA/$SHP_NAME".{shp,shx,prj,dbf,cpg}
        fi
    done
done
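One detail to be aware of: the script stores the primary key columns in the PK array, but $PK expands to the first element only, so a table with a composite key is sorted by its first key column alone. A minimal sketch of joining all the columns for the ORDER BY clause; the helper name join_columns and the column names are hypothetical.

```shell
#!/bin/bash
# join_columns COL [COL...]
# Joins column names with ", " so they can follow ORDER BY.
join_columns() {
    local joined="" col
    for col in "$@"; do
        # ${joined:+, } adds the separator only after the first column
        joined+="${joined:+, }$col"
    done
    echo "$joined"
}

# Hypothetical composite primary key
PK=("region_id" "parcel_id")
echo "SELECT * FROM my_schema.my_table ORDER BY $(join_columns "${PK[@]}") LIMIT 10000 OFFSET 0"
```

In the script, this would mean replacing `ORDER BY $PK` with `ORDER BY $(join_columns "${PK[@]}")`.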
Common configuration file
The PostgreSQL connection configuration file (pgconfig) used in both examples:
user="postgres"
host="my_ip_or_hostname"
port="5432"
database="my_database"
schema="my_schema"
password="secret"
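Both scripts read these variables right after `source ./pgconfig`, and a missing or empty value only shows up later as a confusing psql or pgsql2shp error. A minimal sketch of an early check, assuming you want the script to report the problem before connecting; the helper name require_vars is hypothetical, and the assignments below stand in for sourcing the real file.

```shell
#!/bin/bash
# require_vars NAME [NAME...]
# Returns 1 and reports on stderr if any named variable is unset or empty.
require_vars() {
    local name missing=0
    for name in "$@"; do
        # ${!name} is indirect expansion: the value of the variable called $name
        if [[ -z "${!name}" ]]; then
            echo "pgconfig: '$name' is not set" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example values standing in for 'source ./pgconfig'
user="postgres"; host="my_ip_or_hostname"; port="5432"
database="my_database"; schema="my_schema"; password="secret"

if require_vars user host port database schema password; then
    echo "pgconfig looks complete"
fi
```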
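Another strategy is to choose GeoPackage as the output format. It supports larger file sizes than the shapefile format, stays portable across operating systems, and is well supported in GIS software.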
ogr2ogr -f GPKG output_file.gpkg PG:"host=my_ip_or_hostname user=postgres dbname=my_database password=secret" -sql "SELECT * FROM my_schema.my_table"
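ogr2ogr also accepts a list of layer names after the source datasource, so several tables can go into one GeoPackage in a single call. The sketch below only builds and prints such a command for review. It assumes the PostgreSQL driver exposes tables outside the public schema as schema.table layers; the function name build_gpkg_command, the table names and the connection values are placeholders.

```shell
#!/bin/bash
# build_gpkg_command OUTPUT SCHEMA TABLE [TABLE...]
# Fills the global array OGR_CMD with an ogr2ogr call that copies the
# listed tables into one GeoPackage, one layer per table.
build_gpkg_command() {
    local output="$1" schema="$2"
    shift 2
    # connection values are placeholders, as in pgconfig
    OGR_CMD=(ogr2ogr -f GPKG "$output"
             "PG:host=my_ip_or_hostname user=postgres dbname=my_database password=secret")
    local table
    for table in "$@"; do
        # assumption: the PG driver names these layers schema.table
        OGR_CMD+=("$schema.$table")
    done
}

build_gpkg_command output_file.gpkg my_schema table_name_a table_name_b
# print the command for review; to run it, execute: "${OGR_CMD[@]}"
printf '%s\n' "${OGR_CMD[*]}"
```

Keeping the command in an array preserves the connection string as a single argument even though it contains spaces.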