python-3.x - geograpy3 library for extracting locations from text gives UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 276

Question

I am trying to use the geograpy3 library in Python to extract locations from text.

import geograpy
address = 'Jersey City New Jersey 07306'
places = geograpy.get_place_context(text = address)

I get the following UnicodeDecodeError:

 ~\Anaconda\lib\site-packages\geograpy\places.py in populate_db(self)
 28         with open(cur_dir + "/data/GeoLite2-City-Locations.csv") as info:
 29             reader = csv.reader(info)
---> 30             for row in reader:
 31                 print(row)
 32                 cur.execute("INSERT INTO cities VALUES(?, ?, ?, ?, ?, ?, ?, ?, ?, ?);", row)

~\Anaconda\lib\encodings\cp1252.py in decode(self, input, final)
 21 class IncrementalDecoder(codecs.IncrementalDecoder):
 22     def decode(self, input, final=False):
---> 23         return codecs.charmap_decode(input,self.errors,decoding_table)[0]
 24 
 25 class StreamWriter(Codec,codecs.StreamWriter):

UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 276: character maps to <undefined>

After some investigation, I tried editing places.py and adding encoding="utf-8" to the line marked by the arrow (line 30):

with open(cur_dir + "/data/GeoLite2-City-Locations.csv", encoding="utf-8") as info:

But it still gives me the same error. I also tried saving GeoLite2-City-Locations.csv to my desktop and reading it with the same code:

with open("GeoLite2-City-Locations.csv", encoding="utf-8") as info:
      reader = csv.reader(info)
      for row in reader:
          print(row)

That works fine and prints every row of GeoLite2-City-Locations.csv. I cannot understand what the problem is!

score 1 · Accepted Answer

As a committer of geograpy3, I added a test to the latest geograpy3 to reproduce your issue (https://github.com/somnathrakshit/geograpy3/blob/master/tests/test_extractor.py):

def testStackoverflow54077973(self):
    '''
    see https://stackoverflow.com/questions/54077973/geograpy3-library-for-extracting-the-locations-in-the-text-gives-unicodedecodee
    '''
    address = 'Jersey City New Jersey 07306'
    e = Extractor(text=address)
    e.find_entities()
    self.check(e.places, ['Jersey', 'City'])

Result:

['Jersey', 'City']

So you can simply switch to the latest version.

score 0 · Accepted Answer

After some investigation, this appears in some cases to be a Windows versus Linux issue. Even with

with open(cur_dir + "/data/GeoLite2-City-Locations.csv", encoding="utf-8") as info:

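I could not get rid of the error on my Windows machine. The exact same code, however, runs fine on the Linux machine I use. I looked at the City-Locations.csv file on Linux and found that LibreOffice had automatically encoded and/or parsed all the characters. When I view the same file in Excel, I still see all the funky characters that cause the error. For some reason Excel insists on keeping the odd characters.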
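One plausible reading of the platform difference, offered as an assumption rather than something confirmed above: when open() is called without an encoding argument, Python falls back to the locale's preferred encoding, which is commonly cp1252 on Windows and UTF-8 on Linux. The minimal sketch below uses a temporary one-row CSV (a made-up stand-in, not the real GeoLite2 data) to show the same bytes reading cleanly as UTF-8 and failing as cp1252.

```python
import csv
import locale
import os
import tempfile


def read_rows(path, encoding):
    """Read all CSV rows from path, decoding the file with the given encoding."""
    with open(path, encoding=encoding, newline="") as info:
        return list(csv.reader(info))


# Encoding that open() falls back to when no encoding argument is passed.
default_encoding = locale.getpreferredencoding(False)

# Hypothetical sample row. "\u010d" (c with caron) is the bytes C4 8D in UTF-8,
# and 0x8d has no mapping in Python's cp1252 codec.
sample_row = ["1", "\u010ca\u010dak"]

fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "w", encoding="utf-8", newline="") as out:
    csv.writer(out).writerow(sample_row)

try:
    rows_utf8 = read_rows(path, "utf-8")
    try:
        read_rows(path, "cp1252")
        cp1252_error = None
    except UnicodeDecodeError as exc:
        cp1252_error = exc
finally:
    os.remove(path)

print("default encoding on this machine:", default_encoding)
print("read as utf-8:", rows_utf8)
print("read as cp1252:", cp1252_error)
```

If the default encoding printed on the failing machine is cp1252, any open() call in the library that lacks an explicit encoding would be decoded that way, regardless of how the file looks in LibreOffice or Excel.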

score 0 · Accepted Answer

You should specify the encoding, encoding='utf-8', as you did, but in the correct_country_mispelling(self, s) method of places.py (line 49).
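To check whether some other open() call in a module is still missing the argument, a small scan with the standard ast module can help. This is only a sketch: the function name open_calls_without_encoding and the sample source are hypothetical, and the sample merely imitates the shape of a module with two file-reading functions. It flags calls to the bare name open that pass no encoding keyword, so it would also flag binary-mode calls and would miss an encoding passed positionally.

```python
import ast


def open_calls_without_encoding(source):
    """Return the line numbers of open(...) calls that pass no encoding keyword."""
    tree = ast.parse(source)
    lines = []
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "open"
            and not any(kw.arg == "encoding" for kw in node.keywords)
        ):
            lines.append(node.lineno)
    return sorted(lines)


# Hypothetical module text: one open() call has the encoding, the other does not.
sample_source = '''
def populate_db(cur_dir):
    with open(cur_dir + "/data/cities.csv", encoding="utf-8") as info:
        return info.read()

def correct_spelling(cur_dir):
    with open(cur_dir + "/data/errors.csv") as info:
        return info.read()
'''

missing = open_calls_without_encoding(sample_source)
print("open() calls without encoding on lines:", missing)
```

To apply it to the installed library, read the text of your local places.py and pass it in place of sample_source; each reported line is a candidate for adding encoding='utf-8'.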
