
I'm trying to run through the TextBlob tutorial in Windows (using Git Bash shell) with Python 3.3.

I've installed textblob and nltk as well as any dependencies.

The Python code is:

from text.blob import TextBlob

wiki = TextBlob("Python is a high-level, general-purpose programming language.")
tags = wiki.tags

I'm getting the following error:

Traceback (most recent call last):
File "textblob.py", line 4, in <module> 
  tags = wiki.tags
File "c:\Python33\lib\site-packages\text\decorators.py", line 18, in __get__ 
  value = obj.__dict__[self.func.__name__] = self.func(obj)
File "c:\Python33\lib\site-packages\text\blob.py", line 357, in pos_tags 
  for word, t in self.pos_tagger.tag(self.raw)
File "c:\Python33\lib\site-packages\text\taggers.py", line 40, in tag
  return pattern_tag(sentence, tokenize)
File "c:\Python33\lib\site-packages\text\en.py", line 115, in tag
  for sentence in parse(s, tokenize, True, False, False, False, encoding).split():
File "c:\Python33\lib\site-packages\text\en.py", line 99, in parse
  return parser.parse(unicode(s), *args, **kwargs)
File "c:\Python33\lib\site-packages\text\text.py", line 1213, in parse
  s[i] = self.find_tags(s[i], **kwargs)
File "c:\Python33\lib\site-packages\text\en.py", line 49, in find_tags
  return _Parser.find_tags(self, tokens, **kwargs)
File "c:\Python33\lib\site-packages\text\text.py", line 1161, in find_tags
  map = kwargs.get(     "map", None))
File "c:\Python33\lib\site-packages\text\text.py", line 967, in find_tags
  tagged.append([token, lexicon.get(token, i==0 and lexicon.get(token.lower()) or   None)])
File "c:\Python33\lib\site-packages\text\text.py", line 98, in get
  return self._lazy("get", *args)
File "c:\Python33\lib\site-packages\text\text.py", line 79, in _lazy
  self.load()
File "c:\Python33\lib\site-packages\text\text.py", line 367, in load
  dict.update(self, (x.split(" ")[:2] for x in _read(self._path) if x.strip()))
File "c:\Python33\lib\site-packages\text\text.py", line 367, in <genexpr>
  dict.update(self, (x.split(" ")[:2] for x in _read(self._path) if x.strip()))
File "c:\Python33\lib\site-packages\text\text.py", line 346, in _read
  for line in f:
File "c:\Python33\lib\encodings\cp1252.py", line 23, in decode
  return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 16: character maps to <undefined>

Any idea what is wrong here? Adding a 'u' before the string didn't help.


1 Answer


Release 0.7.1 fixes this issue, which means it's time to upgrade:

$ pip install -U textblob

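The problem was that en-lexicon.txt, the file used for part-of-speech tagging, was being opened with Windows' default platform encoding, cp1252. The file apparently contains characters that Python cannot decode from that encoding. This was fixed by explicitly opening the file in utf-8 mode.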
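
To illustrate the failure mode, here is a minimal sketch that does not use TextBlob's actual code. The sample line and the read_lines helper are hypothetical. It relies on the fact that the character U+201D is stored in UTF-8 as the bytes E2 80 9D, and that byte 0x9d is undefined in Python's cp1252 codec, which is the byte reported in the traceback above.

```python
import os
import tempfile

# Hypothetical stand-in for one lexicon line: a token, a space, a tag.
# U+201D (right double quotation mark) is E2 80 9D in UTF-8.
sample = "\u201d \"\n"

# Write the sample as raw UTF-8 bytes to a temporary file.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as f:
    f.write(sample.encode("utf-8"))
    path = f.name


def read_lines(path, encoding):
    """Read all lines from path using an explicit text encoding."""
    with open(path, encoding=encoding) as f:
        return [line.rstrip("\n") for line in f]


# Decoding as cp1252 (the Windows default in this question) fails on 0x9d.
try:
    read_lines(path, "cp1252")
    cp1252_failed = False
except UnicodeDecodeError as exc:
    cp1252_failed = True
    print("cp1252 failed:", exc)

# Passing encoding="utf-8" explicitly reads the same file without error.
utf8_lines = read_lines(path, "utf-8")
print("utf-8 succeeded, read", len(utf8_lines), "line(s)")

os.remove(path)
```

The general point is that open() without an encoding argument falls back to the platform default, so code that reads a UTF-8 data file can work on Linux or macOS and still fail on Windows.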

Answered 2013-09-30T20:30:27.657