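I have a series of city names in a pandas DataFrame. I need to look up the address for each city and store it in a separate column of the same DataFrame. The city column also contains NaN values. I can get the address for a single location/city name on its own, but it does not work on the pandas DataFrame.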

data = [['madurai',10],['NaN',12],['hosur',13]]
df = pd.DataFrame(data,columns=['Name','Age'])
from geopy.geocoders import Nominatim
geolocator = Nominatim()
for i in df.Name:
    if i == "NaN":
       continue
    loc = geolocator.geocode(i)
address = loc.address
print(address)

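It runs on the DataFrame, but it returns only the last address instead of the addresses for all three rows. If we change the order as shown below,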

data = [['Nan',10],['Madurai',12],['hosur',13]]
df = pd.DataFrame(data,columns=['Name','Age'])

I get the error: GeocoderTimedOut: Service timed out

Questions: 1. I want the results (addresses) saved in a column. 2. How do I handle the NaN values?

3 Answers

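You only get the last value because loc is overwritten on every iteration of the loop, and address is read only once after the loop has finished. The GeocoderTimedOut: Service timed out error occurs because you are sending many requests to the server in quick succession. You should add a sleep between requests. If you still get this error, take a look at this: link - avoiding timeouts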

Try:

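import time

import pandas as pd
from geopy.geocoders import Nominatim

data = [['madurai', 10], ['NaN', 12], ['hosur', 13]]
df = pd.DataFrame(data, columns=['Name', 'Age'])
geolocator = Nominatim(user_agent='test')
address = []
for i in df.Name:
    # Keep the placeholder row without calling the service
    if i == "NaN":
        address.append('NaN')
        continue
    time.sleep(3)  # pause between requests to avoid timeouts
    loc = geolocator.geocode(i)
    # geocode() returns None when nothing is found
    address.append(loc.address if loc is not None else None)

df['address'] = address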
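
If a fixed sleep is not enough and the timeout still shows up now and then, one option is to retry the failed request a few times. The sketch below is only an illustration and not part of geopy: geocode_with_retry and its parameters are hypothetical names, and the geocode argument is assumed to be any callable such as geolocator.geocode. With geopy you would probably pass retry_on=(GeocoderTimedOut,) (imported from geopy.exc) so that only timeouts are retried.

```python
import time


def geocode_with_retry(geocode, query, retries=3, wait_seconds=2.0,
                       retry_on=(Exception,)):
    """Call geocode(query), making up to `retries` attempts.

    Hypothetical helper: `geocode` is any callable, for example
    geolocator.geocode from geopy.
    """
    for attempt in range(1, retries + 1):
        try:
            return geocode(query)
        except retry_on:
            if attempt == retries:
                raise  # out of attempts, let the caller see the error
            time.sleep(wait_seconds)  # wait before the next attempt
```

In the loop above, geolocator.geocode(i) could then be replaced by geocode_with_retry(geolocator.geocode, i).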
answered 2019-03-17T12:21:51.780

I introduced a time delay between requests, as shown below, plus a few lines to display a progress bar:

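from geopy.extra.rate_limiter import RateLimiter
from geopy.geocoders import Nominatim
from tqdm import tqdm

# `final` is the DataFrame that holds the 'city' column
geolocator = Nominatim(user_agent='test')  # newer geopy versions require a user_agent
# Wait at least 1 second between consecutive requests
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)
# Register progress_apply on pandas objects, then geocode once with a progress bar
# (calling both apply and progress_apply would send every request twice)
tqdm.pandas()
final['Geolocation'] = final['city'].progress_apply(geocode)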

It works now.
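
For the NaN part of the question, the function passed to apply can be wrapped so that missing values never reach the service and only the address text is stored. This is a rough sketch under assumptions: is_missing and safe_geocode are hypothetical helper names, and the wrapped geocode callable is assumed to return either None or an object with an address attribute, as geopy's geocode does.

```python
import math


def is_missing(value):
    """True for None, float NaN, blank strings and the text 'NaN' in any case."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return str(value).strip().lower() in ("", "nan")


def safe_geocode(geocode):
    """Wrap `geocode` so missing inputs are skipped and an address string is returned."""
    def wrapper(value):
        if is_missing(value):
            return None  # no request is sent for missing city names
        location = geocode(value)
        # the geocoder may find nothing, in which case it returns None
        return location.address if location is not None else None
    return wrapper
```

With the rate-limited geocode from this answer, something like final['Address'] = final['city'].progress_apply(safe_geocode(geocode)) should then fill the column with address strings and leave missing rows empty.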

answered 2019-03-18T08:19:05.147

You can add a column containing the addresses like this:

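import pandas as pd
from geopy.geocoders import Nominatim

data = [['madurai', 10], ['NaN', 12], ['hosur', 13]]
df = pd.DataFrame(data, columns=['Name', 'Age'])
geolocator = Nominatim(user_agent='test')  # newer geopy versions require a user_agent
for i in df.Name:
    # Rows holding the placeholder string keep an empty Address
    if i == "NaN":
        continue
    loc = geolocator.geocode(i)
    # Store the address text for every row with this name (None if nothing was found)
    df.loc[df.Name == i, 'Address'] = loc.address if loc is not None else None

print(df)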
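
When the same city appears in many rows, the loop above sends one request per row. A possible variation is to geocode each distinct name once and map the results back onto the column. The helper below is a sketch with a hypothetical name, build_address_lookup, and it assumes geocode is a callable like geolocator.geocode that returns None or an object with an address attribute.

```python
def build_address_lookup(names, geocode):
    """Geocode each distinct name once and return {name: address or None}."""
    lookup = {}
    for name in names:
        # Skip real NaN values (not strings) and the 'NaN' placeholder text
        if not isinstance(name, str) or name == "NaN":
            continue
        if name in lookup:
            continue  # already geocoded, no second request
        location = geocode(name)
        lookup[name] = location.address if location is not None else None
    return lookup
```

The column could then be filled with df['Address'] = df['Name'].map(build_address_lookup(df['Name'], geolocator.geocode)); names that are not in the lookup, such as 'NaN', end up as NaN in the new column.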
answered 2019-03-17T10:02:24.537