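I'm trying to extract proper nouns, such as person names and organization names, from very short texts like SMS messages. The basic part-of-speech tagging available in NLTK (the approach from "Finding Proper Nouns using NLTK WordNet") can pick out the nouns. The problem is proper nouns that don't start with a capital letter: in texts like this, a name such as sumit is not recognized as a proper noun.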
```python
>>> from nltk import pos_tag
>>> sentence = "i spoke with sumit and rajesh and Samit about the gridlock situation last night @ around 8 pm last nite"
>>> tagged_sent = pos_tag(sentence.split())
>>> print(tagged_sent)
[('i', 'PRP'), ('spoke', 'VBP'), ('with', 'IN'), ('sumit', 'NN'), ('and', 'CC'), ('rajesh', 'JJ'), ('and', 'CC'), ('Samit', 'NNP'), ('about', 'IN'), ('the', 'DT'), ('gridlock', 'NN'), ('situation', 'NN'), ('last', 'JJ'), ('night', 'NN'), ('@', 'IN'), ('around', 'IN'), ('8', 'CD'), ('pm', 'NN'), ('last', 'JJ'), ('nite', 'NN')]
```

Only the capitalized `Samit` is tagged `NNP`; the lowercase `sumit` comes out as `NN` (and `rajesh` as `JJ`).
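To make the failure concrete, and to sketch one possible workaround, the snippet below works directly on the tagged output shown above instead of calling NLTK again. It first collects the tokens the tagger itself marked as proper nouns, then re-tags any token found (case-insensitively) in a name list. `KNOWN_NAMES`, `proper_nouns_from_tags` and `retag_with_gazetteer` are hypothetical names for illustration, and the gazetteer lookup is a swapped-in heuristic, not something NLTK's tagger does on its own.

```python
# Tagged output copied from the pos_tag session above.
tagged_sent = [('i', 'PRP'), ('spoke', 'VBP'), ('with', 'IN'), ('sumit', 'NN'),
               ('and', 'CC'), ('rajesh', 'JJ'), ('and', 'CC'), ('Samit', 'NNP'),
               ('about', 'IN'), ('the', 'DT'), ('gridlock', 'NN'),
               ('situation', 'NN'), ('last', 'JJ'), ('night', 'NN'), ('@', 'IN'),
               ('around', 'IN'), ('8', 'CD'), ('pm', 'NN'), ('last', 'JJ'),
               ('nite', 'NN')]

# Hypothetical gazetteer: in practice this would be loaded from a names list
# (lowercased so the lookup ignores capitalization).
KNOWN_NAMES = {"sumit", "rajesh", "samit"}


def proper_nouns_from_tags(tagged):
    """Return the tokens the tagger marked as proper nouns (NNP/NNPS)."""
    return [word for word, tag in tagged if tag in ("NNP", "NNPS")]


def retag_with_gazetteer(tagged, names):
    """Re-tag any token whose lowercase form is in `names` as NNP."""
    return [(word, "NNP" if word.lower() in names else tag)
            for word, tag in tagged]


print(proper_nouns_from_tags(tagged_sent))
# ['Samit']
print(proper_nouns_from_tags(retag_with_gazetteer(tagged_sent, KNOWN_NAMES)))
# ['sumit', 'rajesh', 'Samit']
```

A lookup like this only recovers names that are already in the list, so it works best as a supplement to the tagger rather than a replacement for it.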