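I'd like to apply a score (positive, negative, or neutral) to short text phrases. Beyond parsing out emoticons and making assumptions based on how they are used, I'm not sure what else to try. Can anyone point me to examples, research papers, articles, etc. that take a more lexical approach to this problem?
I'm thinking that things like adverb use, punctuation misuse or repetition, and spelling or grammar errors could all be decent indicators of the author's sentiment, almost in a binary sense (good or bad).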
This sounds like a fairly clear binary classification task: simplify the problem to positive versus negative, then label as neutral the most entropic decisions, i.e. those whose probability mass has not reached a threshold of certainty.
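To make the "neutral as the uncertain zone" idea concrete, here is a minimal sketch. It assumes you already have some classifier that outputs P(positive) for a phrase; the function names and the 0.9-bit cutoff are my own placeholders, not something from a particular toolkit, and the cutoff would need tuning on real data.

```python
import math

def binary_entropy(p: float) -> float:
    """Shannon entropy (in bits) of a two-class distribution with P(positive) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def label_with_neutral(p_positive: float, max_entropy: float = 0.9) -> str:
    """Map a binary classifier's probability to positive / negative / neutral.

    max_entropy is a hypothetical cutoff: predictions more uncertain
    than this are reported as neutral instead of being forced to a side.
    """
    if binary_entropy(p_positive) > max_entropy:
        return "neutral"
    return "positive" if p_positive >= 0.5 else "negative"

for p in (0.9, 0.6, 0.5, 0.1):
    print(p, label_with_neutral(p))
```

Lowering `max_entropy` widens the neutral band, so it acts as the certainty threshold described above.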
Your biggest hurdle will be getting training data for a statistical machine learning method. The modeling itself is straightforward with a readily available maximum entropy implementation such as the Toolkit for Advanced Discriminative Modeling (TADM) or Mallet. The features you described would just have to be formatted as the input these tools expect.
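As a rough illustration of turning the surface cues from the question into features, something like the following could be a starting point. The feature names are made up for this sketch, and counting words that end in "ly" is only a crude stand-in for real adverb detection (a part-of-speech tagger would do better). The resulting dictionary would still have to be converted into whatever input format your chosen toolkit expects.

```python
import re

def extract_features(text: str) -> dict:
    """Count a few surface cues in a short phrase (illustrative feature set only)."""
    tokens = re.findall(r"[A-Za-z']+", text)
    return {
        # Runs of the same punctuation mark, e.g. "!!!" or "??"
        "repeated_punct": len(re.findall(r"([!?.])\1+", text)),
        "exclamations": text.count("!"),
        # Crude adverb proxy (assumption): longer words ending in "ly"
        "ly_words": sum(1 for t in tokens if len(t) > 3 and t.lower().endswith("ly")),
        # Words written entirely in capitals, often used for emphasis
        "all_caps_words": sum(1 for t in tokens if len(t) > 1 and t.isupper()),
        "n_tokens": len(tokens),
    }

print(extract_features("This is really bad!!! Why??"))
```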
To get training data, you can either use paid crowdsourcing such as Amazon's Mechanical Turk or label it yourself, maybe with the help of a friend. You'll need a lot of data for this. When data is scarce, you can improve the predictive strength of your model with approaches like active learning, ensembling, or boosting, but it's important to test these against real-world data as best you can and pick whatever works best in practice.
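For the active learning part, one common variant is uncertainty sampling: ask a human to label the phrases the current model is least sure about. The sketch below assumes a hypothetical `predict_positive` callable that returns P(positive) for a phrase; the toy scores are invented purely to show the selection step.

```python
from typing import Callable, List

def pick_for_labeling(
    texts: List[str],
    predict_positive: Callable[[str], float],
    k: int = 2,
) -> List[str]:
    """Return the k phrases whose predicted P(positive) is closest to 0.5."""
    return sorted(texts, key=lambda t: abs(predict_positive(t) - 0.5))[:k]

# Toy stand-in for a trained model (made-up scores, for illustration only)
toy_scores = {"great!!!": 0.95, "well, ok": 0.52, "awful": 0.10, "hmm": 0.45}
print(pick_for_labeling(list(toy_scores), toy_scores.get, k=2))
```

After each labeling round you would retrain and repeat, checking against held-out real-world data that the extra labels actually help.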
If you're looking for papers on this, search for the term 'sentiment analysis' in Google Scholar. The Association for Computational Linguistics has a lot of free and useful papers from conferences and journals that address the problem from a linguistic as well as an algorithmic standpoint, so I'd also browse their archives. Good luck!
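This sounds like a really interesting idea; I'd love to see what comes of it.
I'd say punctuation is one indicator you could use...
You could also try using common acronyms, such as...
What you're trying to do is obviously fairly complex, but it sounds fun.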