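Most web applications have a location field where users can enter a location of their choice.

How would you sort users into different countries based on the location they entered?

For example, I took users.xml from the Stack Overflow dump and extracted each user's name, reputation, and location: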

['Jeff Atwood', '12853', 'El Cerrito, CA']
['Jarrod Dixon', '1114', 'Morganton, NC']
['Sneakers OToole', '200', 'Unknown']
['Greg Hurlman', '5327', 'Halfway between the boardwalk and Six Flags, NJ']
['Power-coder', '812', 'Burlington, Ontario, Canada']
['Chris Jester-Young', '16509', 'Durham, NC']
['Teifion', '7024', 'Wales']
['Grant', '3333', 'Georgia']
['TimM', '133', 'Alabama']
['Leon Bambrick', '2450', 'Australia']
['Coincoin', '3801', 'Montreal']
['Tom Grochowicz', '125', 'NJ']
['Rex M', '12822', 'US']
['Dillie-O', '7109', 'Prescott, AZ']
['Pete', '653', 'Reynoldsburg, OH']
['Nick Berardi', '9762', 'Phoenixville, PA']
['Kandis', '39', '']
['Shawn', '4248', 'philadelphia']
['Yaakov Ellis', '3651', 'Israel']
['redwards', '21', 'US']
['Dave Ward', '4831', 'Atlanta']
['Liron Yahdav', '527', 'San Rafael, CA']
['Geoff Dalgas', '648', 'Corvallis, OR']
['Kevin Dente', '1619', 'Oakland, CA']
['Tom', '3316', '']
['denny', '573', 'Winchester, VA']
['Karl Seguin', '4195', 'Ottawa']
['Bob', '4652', 'US']
['saniul', '2352', 'London, UK']
['saint_groceon', '1087', 'Houston, TX']
['Tim Boland', '192', 'Cincinnati Ohio']
['Darren Kopp', '5807', 'Woods Cross, UT']

using the following Python script:

from xml.etree import ElementTree

# Path to users.xml from the Stack Overflow data dump
root = ElementTree.parse('SO Export/so-export-2009-05/users.xml').getroot()
items = ['DisplayName', 'Reputation', 'Location']

def loop1():
    # Print the selected attributes for the first 32 user rows
    for count, i in enumerate(root):
        det = [i.get(x) for x in items]
        print(det)
        if count > 30:
            break

loop1()

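What is the simplest way to sort people into different countries? Is there a ready-made lookup table that can tell me that location X belongs to country Y?

The lookup table does not need to be perfectly accurate. Querying the location string on Google or Wolfram Alpha already gives reasonably accurate answers.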

2 Answers

Your best bet is to use a geocoding API, for example through geopy (some examples).

The Google Geocoding API, for example, returns the country in the CountryNameCode field of the response.

With just this one location field, the number of false matches will probably be relatively high, but it may be good enough.

If you had server logs, you could also try looking up each user's IP address with an IP geocoder (more information and pointers on Wikipedia).
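
As a rough sketch of the geocoding approach, the snippet below pulls a country code out of a geocoder's response. It is not the Google API mentioned above: it assumes geopy's Nominatim geocoder (OpenStreetMap), and that the response carries an address.country_code entry when addressdetails=True is requested. The helper names country_of and nominatim_geocoder and the user agent string are made up for illustration.

```python
def country_of(location, geocode):
    """Return an upper-case country code for a free-text location, or None.

    `geocode` is any callable that takes a string and returns an object
    with a `.raw` dict (as geopy's geocoders do), or None when nothing matches.
    """
    text = (location or "").strip()
    if not text or text.lower() == "unknown":
        return None  # nothing worth sending to the service
    result = geocode(text)
    if result is None:
        return None
    # Assumption: Nominatim-style payload requested with addressdetails=True.
    code = result.raw.get("address", {}).get("country_code")
    return code.upper() if code else None


def nominatim_geocoder(user_agent="so-location-demo"):
    """Build a geocode callable backed by geopy's Nominatim (needs network)."""
    from functools import partial
    from geopy.geocoders import Nominatim  # third-party: pip install geopy
    # user_agent is a hypothetical application name; Nominatim requires one.
    return partial(Nominatim(user_agent=user_agent).geocode, addressdetails=True)
```

With geopy installed, country_of('Burlington, Ontario, Canada', nominatim_geocoder()) would send one request to the service. Public geocoding services are rate-limited, so for a full dump it is probably worth caching results per distinct location string.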

answered 2009-08-14T21:33:06.717

Force users to specify a country, because otherwise you will have to deal with ambiguities. That would be the right way.

If that's not possible, at least make your best guess in conjunction with their IP address.

For example, ['Grant', '3333', 'Georgia']

Is this Georgia, USA? Or is this the Republic of Georgia?

If their IP address suggests somewhere in Central Asia or Eastern Europe, then chances are it's the Republic of Georgia. If it's North America, chances are pretty good they mean Georgia, USA.

Note that IP-address-to-country mappings aren't 100% accurate, and the database needs to be updated regularly. In my opinion, that is far too much trouble.
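
To make the Georgia example concrete, here is a minimal sketch of a best guess that uses the IP-derived country only as a tie-breaker. Both tables are hand-written and hypothetical, nowhere near complete, and the sketch assumes some separate IP lookup has already produced a two-letter country code. It also simplifies the "nearby region" idea to an exact match on the candidate countries.

```python
# Hypothetical table of names that can mean more than one country.
AMBIGUOUS = {
    "georgia": {"US", "GE"},  # US state vs. the country
}

# Hypothetical table of names treated as unambiguous.
KNOWN = {
    "israel": "IL",
    "australia": "AU",
    "wales": "GB",
    "us": "US",
}


def guess_country(location, ip_country=None):
    """Return a country code, or None when the location stays undecided."""
    key = (location or "").strip().lower()
    if key in KNOWN:
        return KNOWN[key]
    candidates = AMBIGUOUS.get(key, set())
    # Trust the IP hint only when it agrees with one of the candidates.
    if ip_country in candidates:
        return ip_country
    return None


print(guess_country("Georgia", ip_country="GE"))
print(guess_country("Georgia"))
```

Returning None for an unresolved name keeps the ambiguity visible instead of silently picking one side.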

answered 2009-08-14T21:24:03.570