python - Speeding up pandas iterrows (XY to lat/long coordinates with pyproj)

Question

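I have been using iterrows with the pyproj module to convert XY coordinates to latitude and longitude. I know that iterrows is slow in pandas, but I am struggling to find another way to write this.

I have a DataFrame with well names and the X and Y coordinates of each well. It also has a column holding the EPSG coordinate system, which pyproj can read. The EPSG coordinate system is not the same for every well. Here is a sample DataFrame.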

data = pd.DataFrame({"WellName": ("well1","well2","well3","well4","well5"),"EPSG": ('epsg:21898','epsg:21898','epsg:21897','epsg:21897','epsg:21897'),'X':(900011,900011,900011,900011,900011),'Y':(800011,800011,800011,800011,800011)})
data

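I iterate over every row of this DataFrame, look up its EPSG coordinate system, and convert x, y to lat, long. This works, but it is extremely slow. Is there a simpler, more elegant solution that would speed it up?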

import pandas as pd
import numpy as np
from pyproj import Proj, transform


for index, row in data.iterrows():
    # source coordinate system (from the EPSG column)
    inProj = Proj(init=row['EPSG'])
    # EPSG coordinate system for lat/long
    outProj = Proj(init='epsg:4326')
    # X and Y coordinates (from the X and Y columns)
    x1, y1 = row['X'], row['Y']
    # with the init= syntax, transform returns (longitude, latitude)
    x2, y2 = transform(inProj, outProj, x1, y1)
    # create and fill in the longitude and latitude columns
    data.loc[index, 'longitude'] = x2
    data.loc[index, 'latitude'] = y2

I tried to vectorize it, but I did not know what I was doing and it crashed my Python. I would not recommend using it... :/

data['latitude'],data['longitude'] = transform(Proj(init=(data['EPSG'])), Proj(init='epsg:4326'), data['X'], data['Y'])

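Partial solution:

After some more experimenting I have partly solved my problem. It is now orders of magnitude faster, using apply.

It creates a new column of tuples holding the lat and long. I then have to apply a workaround to split the tuples into two separate columns (one for lat, one for long).

# one tuple per row; with the init= syntax each tuple is in (longitude, latitude) order
data['LatLong'] = data.apply(lambda row: transform(Proj(init=row['EPSG']), Proj(init='epsg:4326'), row['X'], row['Y']), axis=1)

# split the tuples into two columns (named 0 and 1) and attach them by index
LatLongIndex = pd.DataFrame(data['LatLong'].values.tolist(), index=data.index)
dfDevLatLong = pd.merge(data, LatLongIndex, right_index=True, left_index=True)
dfDevLatLong

It is workable now, but still somewhat slow, and I am sure there is a more elegant way to solve this.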

score 0 · Accepted Answer

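I have partly solved my problem. It is now orders of magnitude faster, using apply.

It creates a new column of tuples holding the lat and long. I then have to apply a workaround to split the tuples into two separate columns (one for lat, one for long).

# one tuple per row; with the init= syntax each tuple is in (longitude, latitude) order
data['LatLong'] = data.apply(lambda row: transform(Proj(init=row['EPSG']), Proj(init='epsg:4326'), row['X'], row['Y']), axis=1)

# split the tuples into two columns (named 0 and 1) and attach them by index
LatLongIndex = pd.DataFrame(data['LatLong'].values.tolist(), index=data.index)
dfDevLatLong = pd.merge(data, LatLongIndex, right_index=True, left_index=True)
dfDevLatLong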
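
The tuple column and the merge can probably be avoided by letting apply expand its result into columns directly. The following is only a minimal sketch, assuming pyproj 2.2 or later (for Transformer.from_crs with always_xy=True). The helper names pyproj_convert and add_lon_lat_columns are hypothetical, and the conversion function is passed in as a parameter so the pandas part can be tried without pyproj installed.

```python
import pandas as pd


def pyproj_convert(epsg, x, y):
    """Convert one (x, y) pair from `epsg` to EPSG:4326 (assumes pyproj >= 2.2)."""
    # Imported lazily so the pandas logic below does not require pyproj.
    from pyproj import Transformer
    # always_xy=True keeps the output in (longitude, latitude) order.
    transformer = Transformer.from_crs(epsg, "EPSG:4326", always_xy=True)
    return transformer.transform(x, y)


def add_lon_lat_columns(df, convert=pyproj_convert):
    """Return a copy of df with 'longitude' and 'latitude' columns added."""
    # result_type="expand" turns each returned tuple into two columns (0 and 1).
    coords = df.apply(
        lambda row: convert(row["EPSG"], row["X"], row["Y"]),
        axis=1,
        result_type="expand",
    )
    coords.columns = ["longitude", "latitude"]
    # join aligns on the index, so no separate merge step is needed.
    return df.join(coords)
```

Calling add_lon_lat_columns(data) would then replace the three steps above. It still builds a transformer for every row, so it tidies the code rather than making it faster.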

It is workable now, but still somewhat slow, and I am sure there is a more elegant way to solve this.
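
One direction that may help with the remaining slowness: since many wells share the same EPSG code, the rows can be grouped by that column so that a transformer is built once per coordinate system and applied to whole arrays of X and Y. This is a sketch under assumptions, not a benchmarked solution. It assumes pyproj 2.2 or later and a DataFrame with a unique index; the names default_transformer and add_lon_lat_grouped are hypothetical, and the transformer factory is a parameter so the grouping logic can be exercised without pyproj.

```python
import numpy as np
import pandas as pd


def default_transformer(src_crs, dst_crs="EPSG:4326"):
    """Build a pyproj transformer from src_crs to dst_crs (assumes pyproj >= 2.2)."""
    # Imported lazily so the grouping logic can run without pyproj.
    from pyproj import Transformer
    # always_xy=True keeps the output in (longitude, latitude) order.
    return Transformer.from_crs(src_crs, dst_crs, always_xy=True)


def add_lon_lat_grouped(df, make_transformer=default_transformer):
    """Return a copy of df with 'longitude' and 'latitude' columns added.

    One transformer is created per distinct value in the 'EPSG' column and
    applied to all rows of that group at once. Assumes df has a unique index.
    """
    out = df.copy()
    out["longitude"] = np.nan
    out["latitude"] = np.nan
    for epsg, group in out.groupby("EPSG"):
        transformer = make_transformer(epsg)
        # transform accepts arrays and returns one array per output axis.
        lon, lat = transformer.transform(
            group["X"].to_numpy(), group["Y"].to_numpy()
        )
        out.loc[group.index, "longitude"] = lon
        out.loc[group.index, "latitude"] = lat
    return out
```

With the sample DataFrame from the question, add_lon_lat_grouped(data) would create only two transformers (one for epsg:21898, one for epsg:21897) instead of one per row.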
