Can you show me a simple example of using http://www.nltk.org/code to determine whether a string expresses a happy or an upset mood?
4 Answers
Pattern is also worth a try: you can see two opinion-mining experiments on the project homepage.
Nopey.
This is a task far beyond the capabilities of NLTK or any grammatical parser that is known or can be realistically imagined. Look at the NLTK Book to see what sorts of tasks it can accomplish which are far, far from your stated purpose.
As a cheap example:
I really enjoyed using your paper to train my dog.
Run that through NLTK's part-of-speech tagger and you can get
[('I', 'PRP'), ('really', 'RB'), ('enjoyed', 'VBD'),
('using', 'VBG'), ('your', 'PRP$'), ('paper', 'NN'),
('to', 'TO'), ('train', 'VB'), ('my', 'PRP$'), ('dog', 'NN')]
Where the parse tree would tell me that 'enjoyed' is the central (past-tense) verb of the simple sentence. To enjoy something is good. To train something is generally a good thing. Gerunds, nouns, comparatives, and such are relatively neutral. So give this a Good score of 0.90.
Except what I really mean is that I either hit my dog with your paper or let it excrete on the paper, which you'd probably consider a not Good thing.
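The naive scoring described above can be sketched in a few lines. This is only an illustration of the failure mode, not anything NLTK provides: the `TOY_LEXICON` scores and the `naive_score` helper are hypothetical and hand-picked for this one sentence.

```python
import re

# Hypothetical, hand-picked word scores (an assumption for illustration only).
TOY_LEXICON = {
    "enjoyed": 1.0,
    "train": 0.5,
    "good": 1.0,
    "hate": -1.0,
    "annoying": -1.0,
}


def naive_score(text, lexicon=TOY_LEXICON):
    """Average the scores of the known words; 0.0 if none are known."""
    words = re.findall(r"[a-z']+", text.lower())
    hits = [lexicon[w] for w in words if w in lexicon]
    return sum(hits) / len(hits) if hits else 0.0


sentence = "I really enjoyed using your paper to train my dog."
score = naive_score(sentence)
print(score)  # positive, although the intended meaning is an insult
```

The scorer only sees "enjoyed" and "train", so it reports a positive mood; nothing in the word list captures what the paper is being used for.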
Hire a person for this recognition task.
Added for those who imagine that even trained classifiers are of much use:
Classify this real entry from a real customer review corpus using any classifier you like trained on any dataset you like:
This camera keeps on autofocussing in auto mode with a buzzing sound which can't be stopped. It would be really good if they have given an option to stop this autofocussing. If you want to have the date and time on the image, it's only through their software which reads the image's date and time from the image's meta-data. So if you use your card reader and copy images - you got to once again open them through their software to put the date and time. In that too, there isn't a direct way to add date and time - you got to say 'print images' to a different directory in which there is an option to specify the date and time . Even the slightest of the shakes totally distorts your image. Indoor images weren't so clear. You got to have flash 'on' to get it even though your room is well lit. The lens cap is a really annoying. the movie clips taken will always have some 'noise' in it - you can't avoid that.
The worst mood classification I obtained was "totally equivocal" yet humans can easily determine that this is anything but complimentary. This wasn't a randomly picked datum, rather one that was selected for negative bias without "hate" or "suxz" or similar.
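You are looking for a technique that uses a machine-learning classifier to determine whether a piece of text is positive or negative. Quite a few research teams have tried a variety of approaches to this (for example http://research.yahoo.com/pub/2387 and http://lingcog.iit.edu/doc/appraisal_sentiment_cikm.pdf), and they can determine whether a product review is positive or negative with roughly 80% to 90% accuracy.
Since your question is terse, it isn't obvious to me whether classifying product reviews as positive or negative is the same task you are trying to accomplish or merely a related one. Either way, I'd suggest starting simple, with bag-of-words classification using a Bayesian classifier (which NLTK should be able to handle), and then refining your technique from there based on the accuracy you get.
Unfortunately, I've never used NLTK (or Python, for that matter), so I can't give you a code sample showing how to do this with NLTK.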
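For illustration, the bag-of-words Bayes approach suggested above can be sketched with only the Python standard library. This is a minimal sketch, not NLTK code: the `BagOfWordsNB` class, the `tokenize` helper, and the six training sentences are all made up for the example, and a real classifier would need a far larger labeled corpus. NLTK ships its own `nltk.NaiveBayesClassifier`, which could take the place of the hand-rolled class.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into a bag of words."""
    return re.findall(r"[a-z']+", text.lower())


class BagOfWordsNB:
    """Hypothetical multinomial naive Bayes with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of texts
        self.vocab = set()

    def train(self, labeled_texts):
        for text, label in labeled_texts:
            self.doc_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)

    def classify(self, text):
        total_docs = sum(self.doc_counts.values())
        # Words never seen in training carry no evidence, so skip them.
        words = [w for w in tokenize(text) if w in self.vocab]
        best_label, best_logp = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            logp = math.log(n_docs / total_docs)  # class prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in words:
                logp += math.log((self.word_counts[label][word] + 1) / denom)
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


# Toy training data, invented for this sketch.
training = [
    ("I love this camera, the pictures are great", "happy"),
    ("What a wonderful, happy day", "happy"),
    ("Really good lens and excellent battery", "happy"),
    ("I hate the buzzing sound, it is annoying", "upset"),
    ("Terrible pictures, very disappointing", "upset"),
    ("The lens cap is awful and the flash is bad", "upset"),
]

classifier = BagOfWordsNB()
classifier.train(training)
print(classifier.classify("I love this wonderful camera"))
print(classifier.classify("awful annoying sound"))
```

Measuring accuracy on held-out labeled text is what tells you whether this baseline is good enough or needs richer features.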