python - Writing a GeoDataFrame to a SQL database

Question

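I hope this question isn't absurd; surprisingly, as far as I can tell, it has not yet been asked on the popular sites.

I have several CSV files containing more than 1 million observations in total. Among other things, each observation contains a postal address. I plan to read all the files into a single GeoDataFrame, geocode the addresses, perform a spatial join against a given shapefile, and store some information from the matching polygon for each row. Fairly standard, I think. This is part of a one-time data-cleaning process.

My goal is to set up a database with this final dataset, because that lets me share and search the data easily and plot some observations on a website. It also makes it easy to select observations by some criteria and then run analyses.

My problem is that inserting a GeoDataFrame into a database does not seem to be implemented yet, apparently because GeoPandas is meant to be a substitute for a database ("GeoPandas enables you to easily do operations in python that would otherwise require a spatial database such as PostGIS").

Of course I could iterate over the rows and insert each data point "manually", but I am looking for the best solution here. With any workaround, I would also worry that the data types might conflict with those of the database. Is there a "best way" to do this?

Thanks for your help.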

score 13 · Accepted Answer

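As mentioned, @Kartik's answer only works for a single call; when appending data it raises a DataError, because the geom column then expects geometries to have an SRID. You can use GeoAlchemy to handle all cases: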

# Imports
from geoalchemy2 import Geometry, WKTElement
from sqlalchemy import create_engine

# Use GeoAlchemy's WKTElement to create a geom with SRID
# (replace <your_SRID> with the SRID of your data)
def create_wkt_element(geom):
    return WKTElement(geom.wkt, srid=<your_SRID>)

geodataframe['geom'] = geodataframe['geom'].apply(create_wkt_element)

# Replace username, password, host, port and database with your own values
db_url = 'postgresql://username:password@host:port/database'
engine = create_engine(db_url, echo=False)

# Use 'dtype' to specify column's type
# For the geom column, we will use GeoAlchemy's type 'Geometry'
geodataframe.to_sql(table_name, engine, if_exists='append', index=False,
                    dtype={'geom': Geometry('POINT', srid=<your_SRID>)})

score 7 · Accepted Answer

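So, I just implemented this for a PostGIS database, and I can paste my approach here. For MySQL, you will have to adapt the code.

The first step is to convert the geocoded column into WKB hex strings, because I use SQLAlchemy with an engine based on psycopg, and neither package understands geographic types natively. The next step is to write the data to the SQL database as usual (note that all geometry columns should be converted to text columns holding the WKB hex string), and finally to change the column type to Geometry by executing a query. See the following pseudocode: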

# Imports
import sqlalchemy as sal
import geopandas as gpd

# Function to generate WKB hex
def wkb_hexer(line):
    return line.wkb_hex

# Convert `'geom'` column in GeoDataFrame `gdf` to hex.
# Note that following this step, the GeoDataFrame is just a regular DataFrame
# because it does not have a geometry column anymore. Also note that
# it is assumed the `'geom'` column is correctly datatyped.
gdf['geom'] = gdf['geom'].apply(wkb_hexer)

# Create SQL connection engine (replace the URL parts with your own values)
engine = sal.create_engine('postgresql://username:password@host:port/database')

# Connect to database using a context manager
with engine.connect() as conn, conn.begin():
    # Note use of regular Pandas `to_sql()` method.
    gdf.to_sql(table_name, con=conn, schema=schema_name,
               if_exists='append', index=False)
    # Convert the `'geom'` column back to Geometry datatype, from text
    sql = """ALTER TABLE schema_name.table_name
               ALTER COLUMN geom TYPE Geometry(LINESTRING, <SRID>)
                 USING ST_SetSRID(geom::Geometry, <SRID>)"""
    # Wrap raw SQL in text(); SQLAlchemy 2.x no longer executes plain strings
    conn.execute(sal.text(sql))
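
The hex step can be tried without a database. This is a minimal sketch using shapely alone; the sample line is an assumption for illustration. It shows that `wkb_hex` yields plain text that decodes back to the same geometry, which is why the column can be stored as text first and cast to Geometry afterwards.

```python
from shapely import wkb
from shapely.geometry import LineString

# Assumed sample geometry, standing in for one value of the 'geom' column.
line = LineString([(0, 0), (1, 1), (2, 0)])

# Same conversion as wkb_hexer(): geometry -> WKB hex string.
hex_text = line.wkb_hex

# Decode the hex string again to confirm nothing was lost.
restored = wkb.loads(hex_text, hex=True)

print(type(hex_text).__name__)
print(restored.equals(line))
```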

score 2 · Accepted Answer

A version of Hamri Said's answer, but using a lambda, which in my opinion is a bit nicer because the function is so short:

# Imports
from geoalchemy2 import Geometry, WKTElement
from sqlalchemy import create_engine

# Replace <your_SRID> with the SRID of your data
geodataframe['geom'] = geodataframe['geom'].apply(lambda geom: WKTElement(geom.wkt, srid=<your_SRID>))

# Replace username, password, host, port and database with your own values
db_url = 'postgresql://username:password@host:port/database'
engine = create_engine(db_url, echo=False)

# Use 'dtype' to specify column's type
# For the geom column, we will use GeoAlchemy's type 'Geometry'
geodataframe.to_sql(table_name, engine, if_exists='append', index=False,
                    dtype={'geom': Geometry('POINT', srid=<your_SRID>)})

score 0 · Accepted Answer

I'm coming back to this to give a better answer. A geopandas.GeoDataFrame object has a .to_postgis() method that takes care of much of the hassle of handling geometry types.
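
A minimal sketch of that approach, assuming a reachable PostGIS database and that GeoAlchemy2 and a PostgreSQL driver are installed. The sample points, the CRS, the function names, the connection URL, and the table name are all placeholders for illustration, not part of the original answer.

```python
import geopandas as gpd
from shapely.geometry import Point
from sqlalchemy import create_engine


def build_sample_gdf():
    """Build a tiny GeoDataFrame (assumed sample data) with a CRS set."""
    return gpd.GeoDataFrame(
        {"name": ["a", "b"]},
        geometry=[Point(13.4, 52.5), Point(2.35, 48.86)],
        crs="EPSG:4326",
    )


def write_gdf(gdf, db_url, table_name):
    """Append the GeoDataFrame to a PostGIS table."""
    engine = create_engine(db_url)
    # to_postgis derives the geometry type and SRID from the GeoDataFrame.
    gdf.to_postgis(table_name, engine, if_exists="append", index=False)


gdf = build_sample_gdf()
# Placeholder URL and table name; uncomment once a database is available.
# write_gdf(gdf, "postgresql://username:password@host:port/database", "observations")
```

Because the SRID comes from the GeoDataFrame's CRS, it helps to set the CRS before writing.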
