I'm using a Jupyter notebook here.

This code is from a YouTube video. It works on the YouTuber's machine, but on mine it raises a StopIteration error.

Here I'm trying to get all the titles related to the "Go" language (the questions come from a CSV file).

import pandas as pd

df = pd.read_csv("Questions.csv", encoding = "ISO-8859-1", usecols = ["Title", "Id"])

titles = [_ for _ in df.loc[lambda d: d['Title'].str.lower().str.contains(" go "," golang ")]['Title']]

# New cell

import spacy

nlp = spacy.load("en_core_web_sm" , disable= ["ner"])

# New cell

def has_golang(text):
    doc = nlp(text)
    for t in doc:    
        if t.lower_ in [' go ', 'golang']:
            if t.pos_ != 'VERB':
                if t.dep_ == 'pobj':
                    return True
    return False

g = (title for title in titles if has_golang(title))
[next(g) for i in range(10)]

# This is the error

StopIteration                             Traceback (most recent call last)
<ipython-input-56-862339d10dde> in <module>
      9 
     10 g = (title for title in titles if has_golang(title))
---> 11 [next(g) for i in range(10)]

<ipython-input-56-862339d10dde> in <listcomp>(.0)
      9 
     10 g = (title for title in titles if has_golang(title))
---> 11 [next(g) for i in range(10)]

StopIteration: 

From the research I've done, I think this might be a bug.

All I want is to get the titles that satisfy the 3 "if" conditions.

Link to the YouTube video

1 Answer

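StopIteration is what you get when next() is called on an exhausted iterator, i.e. g yields fewer than 10 results. You can get this information from the help() function.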

help(next)
Help on built-in function next in module builtins:
next(...)
    next(iterator[, default])
    
    Return the next item from the iterator. If default is given and the iterator
    is exhausted, it is returned instead of raising StopIteration.
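
If you only want up to 10 matches and fewer may exist, one way to avoid the exception is itertools.islice, which simply stops early when the generator runs out; next() with a default behaves similarly for single items. A minimal sketch follows. The titles list and the has_golang stub are made-up stand-ins, not the real CSV data or the spaCy-based check.

```python
from itertools import islice

# Hypothetical stand-in data; the real titles come from Questions.csv.
titles = ["Writing a web server in Go", "How to sort a list", "Channels in golang"]

def has_golang(text):
    # Simplified stand-in for the spaCy-based check in the question.
    return any(word in ("go", "golang") for word in text.lower().split())

g = (title for title in titles if has_golang(title))

# islice yields at most 10 items and stops quietly if g runs out first.
first_ten = list(islice(g, 10))
print(first_ten)

# g is exhausted now; with a default, next() returns it instead of raising.
print(next(g, None))
```

With only two matching titles, first_ten holds two items and no StopIteration is raised.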

Edit

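has_golang is incorrect. The first test can never match ' go ' because nlp tokenizes the text into words, i.e. the leading and trailing spaces are stripped from each token. Try this: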

def has_golang(text):
    doc = nlp(text)
    for t in doc:    
        if t.lower_ in ['go', 'golang']:
            if t.pos_ != 'VERB':
                if t.dep_ == 'pobj':
                    return True
    return False

I worked this out by finding a title that should make has_golang return True. Then I ran the following code:

doc = nlp("Making a Simple FileServer with Go and Localhost Refused to Connect")
print("\n".join(str((t.lower_, t.pos_, t.dep_)) for t in doc))
('making', 'VERB', 'csubj')
('a', 'DET', 'det')
('simple', 'PROPN', 'compound')
('fileserver', 'PROPN', 'dobj')
('with', 'ADP', 'prep')
('go', 'PROPN', 'pobj')
('and', 'CCONJ', 'cc')
('localhost', 'PROPN', 'conj')
('refused', 'VERB', 'ROOT')
('to', 'PART', 'aux')
('connect', 'VERB', 'xcomp')

Looking at ('go', 'PROPN', 'pobj'), it's clear that PROPN is not VERB and that pobj is pobj, so the problem has to be the token text: the token is "go", not " go ".
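
The same effect can be seen without spaCy. In this rough sketch, plain str.split() stands in for the tokenizer. That is an assumption made for illustration only: spaCy's tokenizer is far more sophisticated, but a word token's text likewise carries no surrounding spaces.

```python
title = "Making a Simple FileServer with Go and Localhost Refused to Connect"

# Stand-in for tokenization: split on whitespace and lowercase each piece.
tokens = [word.lower() for word in title.split()]

# No token keeps its surrounding spaces, so ' go ' can never match.
padded_match = any(t in [' go ', 'golang'] for t in tokens)

# Without the padding, the token 'go' matches.
plain_match = any(t in ['go', 'golang'] for t in tokens)

print(padded_match, plain_match)
```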


Original answer

If you just want the titles that satisfy the 3 if conditions, skip the generator:

g = list(filter(has_golang, titles))

If you need the generator but also want a list:

g = (title for title in titles if has_golang(title))
list(g)
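
One thing to keep in mind with the second form: list(g) consumes the generator, so g has nothing left to yield afterwards. A small sketch, using made-up numbers in place of the titles:

```python
numbers = [1, 2, 3, 4, 5, 6]

# Generator expression, analogous to the one built from titles.
evens = (n for n in numbers if n % 2 == 0)

first_pass = list(evens)   # consumes the generator
second_pass = list(evens)  # nothing left to yield

print(first_pass)
print(second_pass)
```

If you need the results more than once, keep the list rather than the generator.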
answered 2021-03-24T20:55:59.177