python - Why do I get a TypeError when importing a text file line by line for sentiment analysis instead of using a hard-coded sentence?

Question

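I am trying to analyze the sentiment of each sentence in a text file, line by line. The code works whenever I use the hard-coded sentences from the first linked question. When I use text file input, I get a TypeError.

This is related to the question asked here. The line-by-line text file code comes from this question:

The first one works; the second one, with the text file ("I love you. I hate him. You are nice. He is dumb"), does not. Here is the code: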

from pycorenlp import StanfordCoreNLP
nlp = StanfordCoreNLP('http://localhost:9000')
results = []    
with open("c:/nlp/test.txt","r") as f:
    for line in f.read().split('\n'):
        print("Line:" + line)
        res = nlp.annotate(line,
                   properties={
                       'annotators': 'sentiment',
                       'outputFormat': 'json',
                       'timeout': 1000,
                   })
        results.append(res)      

for res in results:             
    s = res["sentences"]         
    print("%d: '%s': %s %s" % (
        s["index"], 
        " ".join([t["word"] for t in s["tokens"]]),
        s["sentimentValue"], s["sentiment"]))

I get this error:

line 21, in

s["index"],

TypeError: list indices must be integers or slices, not str

score 0 · Accepted Answer

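I don't have the Stanford library installed, so I can't test against its server. However, the error tells me that your result is a "list of dicts" or some other nested type.

Anyway, I ran a test: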

results = []

with open("tester.txt","r") as f:
    for line in f.read().split('\n'):
        print("Line:" + line)
        # Stand-in for the API response: a list holding one dict per sentence
        sentences = [
            {
                "index":1,
                "word":line,
                "sentimentValue": "sentVal",
                "sentiment":"senti"
            }
        ]
        results.append(sentences)

Then I rebuilt your loop and adjusted it slightly to fit my test data:

for res in results:         
    for s in res:         
        print("%d: '%s': %s %s" % (
            s["index"], 
            " ".join(s["word"]),
            s["sentimentValue"], s["sentiment"]))

which printed the following:

1: 'I   l o v e   y o u .': sentVal senti
1: 'I   h a t e   h i m .': sentVal senti
1: 'Y o u   a r e   n i c e .': sentVal senti
1: 'H e   i s   d u m b': sentVal senti

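So basically the code works. But you have to figure out what type the value returned by the Stanford API is, for example by calling type(results) after it returns.

Once you have that information, you can start with a loop that iterates over the values. If you don't know the type of a nested value, print its type as well. Keep going down until you reach the level that contains the items you want to use.

One last thing to point out. In the description you linked, in the comments, the author explains how to pass text to the API: the API takes care of splitting and formatting itself, so you should just send the whole text. Keep that in mind if you don't get any results.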
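The level-by-level inspection described above can be sketched as a small recursive helper. This is only an illustration: describe is a hypothetical name, and sample is a hand-written value shaped like the fields the question's code accesses, not real server output.

```python
def describe(value, depth=0):
    """Return lines describing the nested container types of `value`."""
    indent = "  " * depth
    lines = [f"{indent}{type(value).__name__}"]
    if isinstance(value, dict):
        # Descend into every key of a dict
        for key, item in value.items():
            lines.append(f"{indent}  [{key!r}]")
            lines.extend(describe(item, depth + 2))
    elif isinstance(value, list) and value:
        # Assumption: list elements share one shape, so only the first is inspected
        lines.append(f"{indent}  [0]")
        lines.extend(describe(value[0], depth + 2))
    return lines


# Hypothetical sample, shaped like the fields used in the question's code
sample = {"sentences": [{"index": 0, "sentiment": "Positive",
                         "tokens": [{"word": "I"}]}]}
print("\n".join(describe(sample)))
```

Reading the printed outline from top to bottom shows at which level a list sits, which is where an integer index or a for loop is needed instead of a string key.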
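As a rough sketch of sending the whole text in one request and looping over whatever sentences come back: sentiment_rows and fake_annotate are hypothetical names, and fake_annotate returns a canned, hand-written response so the example runs without a server.

```python
def sentiment_rows(text, annotate):
    """Annotate `text` in one call and return one tuple per detected sentence."""
    res = annotate(text, properties={
        'annotators': 'sentiment',
        'outputFormat': 'json',
        'timeout': 15000,
    })
    return [
        (s["index"],
         " ".join(t["word"] for t in s["tokens"]),
         s["sentimentValue"], s["sentiment"])
        for s in res["sentences"]
    ]


def fake_annotate(text, properties=None):
    """Hypothetical stand-in for nlp.annotate; ignores its input and returns canned data."""
    return {"sentences": [
        {"index": 0, "tokens": [{"word": w} for w in ["I", "love", "you", "."]],
         "sentimentValue": "3", "sentiment": "Positive"},
        {"index": 1, "tokens": [{"word": w} for w in ["I", "hate", "him", "."]],
         "sentimentValue": "1", "sentiment": "Negative"},
    ]}


rows = sentiment_rows("I love you. I hate him.", fake_annotate)
for row in rows:
    print("%d: '%s': %s %s" % row)
```

With a running server, passing nlp.annotate from the question's code in place of fake_annotate should behave the same way, assuming the response has the shape used here.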

score 0 · Accepted Answer

It looks like I solved the problem. As londo pointed out, this line sets s to a list, but it should be a dict, as in the original code:

s = res["sentences"]
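A minimal reproduction of the error, without a server, might look like this. The res value is a hand-written stand-in that assumes the response shape implied by the question's code (a "sentences" key holding a list of dicts).

```python
# Hypothetical stand-in for one parsed response
res = {"sentences": [
    {"index": 0,
     "tokens": [{"word": "I"}, {"word": "love"}, {"word": "you"}, {"word": "."}],
     "sentimentValue": "3", "sentiment": "Positive"},
]}

s = res["sentences"]
print(type(s).__name__)  # s is a list, not a dict

error_message = None
try:
    s["index"]  # indexing a list with a string key
except TypeError as exc:
    error_message = str(exc)
    print("TypeError:", error_message)

# Indexing the list first (or looping over it) yields the per-sentence dict
first = s[0]
print(first["index"], first["sentiment"])
```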

I moved the code into the same loop that reads and analyzes the file line by line, and print the results right there. The new code looks like this:

from pycorenlp import StanfordCoreNLP

nlp = StanfordCoreNLP('http://localhost:9000')

with open("c:/nlp/test.txt","r") as f:
    for line in f.read().split('\n'):
        res = nlp.annotate(line,
                           properties={
                               'annotators': 'sentiment',
                               'outputFormat': 'json',
                               'timeout': 15000,
                           })
        # res["sentences"] is a list, so loop over it to get each sentence dict
        for s in res["sentences"]:
            print("%d: '%s': %s %s" % (
                s["index"],
                " ".join([t["word"] for t in s["tokens"]]),
                s["sentimentValue"], s["sentiment"]))

The output looks as expected, with no error messages:

0: 'I love you .': 3 Positive
0: 'I hate him .': 1 Negative
0: 'You are nice .': 3 Positive
0: 'He is dumb .': 1 Negative
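One detail worth noting about the read loop: f.read().split('\n') yields an empty string for a blank line or a trailing newline, and that empty string would also be sent to the server. A possible way to skip such lines is sketched below. non_empty_lines is a hypothetical helper, the UTF-8 encoding is an assumption, and the temporary file only exists to make the example self-contained.

```python
import os
import tempfile


def non_empty_lines(path):
    """Yield each line of the file with surrounding whitespace stripped, skipping blank lines."""
    # Assumption: the file is UTF-8 encoded
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield line


# Demo file with a blank line in the middle and a trailing newline
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write("I love you.\nI hate him.\n\nYou are nice.\nHe is dumb\n")
    demo_path = tmp.name

demo_lines = list(non_empty_lines(demo_path))
os.remove(demo_path)
print(demo_lines)
```

Each yielded line could then be passed to nlp.annotate in place of the lines produced by f.read().split('\n').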
