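I need to find the opinion expressed in some reviews on a website. I am using SentiWordNet for this. I first pass the file containing all the reviews to a POS tagger.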

import nltk

tokens = nltk.word_tokenize(line)  # tokenize one line of the file
tagged = nltk.pos_tag(tokens)      # POS-tag the tokens
print(tagged)

Is there a more accurate way of tagging that treats "not good" as one word instead of as two separate words?
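One possible approach is to tag first and then join each negation word with the token that follows it. The snippet below is only a minimal sketch of that idea: `merge_negations` and the `NEGATIONS` set are hypothetical names, not part of NLTK, and the tagged input is written by hand in the shape that `nltk.pos_tag` returns.

```python
# Hypothetical helper, not an NLTK function.
NEGATIONS = {"not", "no", "never", "n't"}

def merge_negations(tagged):
    """Join a negation word with the alphabetic token that follows it.

    `tagged` is a list of (word, tag) pairs, as returned by nltk.pos_tag.
    The merged token keeps the tag of the negated word.
    """
    merged = []
    i = 0
    while i < len(tagged):
        word, tag = tagged[i]
        has_next = i + 1 < len(tagged)
        if word.lower() in NEGATIONS and has_next and tagged[i + 1][0].isalpha():
            next_word, next_tag = tagged[i + 1]
            merged.append(("not_" + next_word, next_tag))
            i += 2  # skip the token that was merged
        else:
            merged.append((word, tag))
            i += 1
    return merged

# Hand-written sample in the format produced by nltk.pos_tag.
tagged = [("The", "DT"), ("food", "NN"), ("is", "VBZ"), ("not", "RB"), ("good", "JJ")]
print(merge_negations(tagged))
# [('The', 'DT'), ('food', 'NN'), ('is', 'VBZ'), ('not_good', 'JJ')]
```

This only looks one token ahead, so "not very good" becomes "not_very" followed by "good". NLTK also ships `nltk.sentiment.util.mark_negation`, which takes a different route and appends `_NEG` to the tokens after a negation.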

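Now I have to assign positive and negative scores to the tokenized words and then compute the total score. Is there a function in SentiWordNet for this? Please help.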
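For what it's worth, NLTK exposes SentiWordNet through `nltk.corpus.sentiwordnet`, where `senti_synsets(word, pos)` yields entries that have `pos_score()` and `neg_score()` methods. The sketch below sums those scores over POS-tagged tokens. It assumes the `sentiwordnet` and `wordnet` corpora have been installed with `nltk.download`, and it simply takes the first synset of each word, which is a simplification rather than real word-sense disambiguation. The helper names (`penn_to_wn`, `swn_lookup`, `score_tagged`) are made up for this example.

```python
def penn_to_wn(tag):
    """Map a Penn Treebank tag to a WordNet POS letter, or None if unsupported."""
    if tag.startswith("JJ"):
        return "a"  # adjective
    if tag.startswith("RB"):
        return "r"  # adverb
    if tag.startswith("NN"):
        return "n"  # noun
    if tag.startswith("VB"):
        return "v"  # verb
    return None

def swn_lookup(word, wn_pos):
    """Return (pos_score, neg_score) of the first SentiWordNet synset, or None."""
    # Imported here so the rest of the sketch works without the corpora installed.
    from nltk.corpus import sentiwordnet as swn
    synsets = list(swn.senti_synsets(word, wn_pos))
    if not synsets:
        return None
    first = synsets[0]  # assumption: the first sense is good enough
    return first.pos_score(), first.neg_score()

def score_tagged(tagged, lookup=swn_lookup):
    """Sum the positive and negative scores over (word, tag) pairs.

    Returns (positive_total, negative_total, positive_total - negative_total).
    """
    pos_total = 0.0
    neg_total = 0.0
    for word, tag in tagged:
        wn_pos = penn_to_wn(tag)
        if wn_pos is None:
            continue  # tag has no WordNet equivalent
        scores = lookup(word.lower(), wn_pos)
        if scores is None:
            continue  # word not found in the lexicon
        pos_total += scores[0]
        neg_total += scores[1]
    return pos_total, neg_total, pos_total - neg_total
```

With the corpora installed, `score_tagged(tagged)` can be called directly on the output of `nltk.pos_tag`. The `lookup` parameter is there only so that a different lexicon can be swapped in.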


1 Answer


First, extract the adverbs and adjectives from the review, for example:

import csv
import nltk
from nltk.tokenize import word_tokenize

# Requires the NLTK tokenizer and POS-tagger models (install them with nltk.download).
para = "What can I say about this place. The staff of the restaurant is nice and the eggplant is not bad. Apart from that, very uninspired food, lack of atmosphere and too expensive. I am a staunch vegetarian and was sorely dissapointed with the veggie options on the menu. Will be the last time I visit, I recommend others to avoid"

tokens = word_tokenize(para)

# Keep only adjectives (JJ*) and adverbs (RB*).
word_features = []
for word, tag in nltk.pos_tag(tokens):
    if tag in ['JJ', 'JJR', 'JJS', 'RB', 'RBR', 'RBS']:
        word_features.append(word)

# Load the lexicon once; each row of words.txt is "word,label" with label 'pos' or 'neg'.
with open('words.txt', 'rt') as f:
    lexicon = {row[0]: row[1] for row in csv.reader(f, delimiter=',') if len(row) >= 2}

rating = 0
for word in word_features:
    label = lexicon.get(word)
    if label is not None:
        print(word, label)
    if label == 'pos':
        rating += 1
    elif label == 'neg':
        rating -= 1
print(rating)

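You also need an external CSV file (words.txt in the script) that lists the positive and negative words, one word,label pair per line, like:

wrinkle,neg
wrinkled,neg
wrinkles,neg
masterly,pos
masterpiece,pos
masterpieces,pos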
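If you don't have such a file yet, one way to build it is from two plain word lists. This is just a sketch: `write_lexicon` is a hypothetical helper, and the sample words are taken from the review above, not from any published lexicon.

```python
import csv

def write_lexicon(path, positive_words, negative_words):
    """Write 'word,label' rows, with label 'pos' or 'neg', to a CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for word in sorted(negative_words):
            writer.writerow([word, "neg"])
        for word in sorted(positive_words):
            writer.writerow([word, "pos"])

if __name__ == "__main__":
    # Illustrative word lists only; use a real opinion lexicon in practice.
    write_lexicon("words.txt", ["nice", "masterly"], ["bad", "expensive", "sorely"])
```

The resulting file has the same two-column layout that the script reads with `csv.reader`.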

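The script above works as follows:

1. Read the sentence.
2. Extract the adverbs and adjectives.
3. Compare them with the positive and negative words in the CSV file.
4. Rate the sentence.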

The result of the above script is:

nice pos  
bad neg  
expensive neg  
sorely neg  
-2

Adapt the result to your needs. Sorry for my English :P

Answered 2016-03-01T12:07:02.747