file - Python: read one word per line of a text file

Question

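This isn't proper code, but I want to know whether there is a way to search for just one word without using .split(), since it builds a list and I don't want that, with this snippet: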

f=(i for i in fin.xreadlines())
for i in f:
    try:
        match=re.search(r"([A-Z]+\b) | ([A-Z\'w]+\b) | (\b[A-Z]+\b) | (\b[A-Z\'w]+\b) | (.\w+\b)", i) # | r"[A-Z\'w]+\b" | r"\b[A-Z]+\b" | r"\b[A-Z\'w]+\b" | r".\w+\b"

Could I also make a reusable class module like this?

class LineReader: #Intended only to be used with for loop
    def __init__(self,filename):
        self.fin=open(filename,'r')
    def __getitem__(self,index):
        line=self.fin.xreadline()
        return line.split()

where, say, f = LineReader(filepath)

and for i in f.__getitem__(index=line number 25) the loop starts from there? I don't know how to do that. Any hints?

score 1 · Accepted Answer

To get the first word of a line:

line[:max(line.find(' '), 0) or None]

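line.find(' ') searches for the first space and returns its index. If no space is found, it returns -1.

max(..., 0) ensures the result is never negative, turning -1 into 0. This is useful because bool(-1) is True while bool(0) is False.

x or None evaluates to x if x != 0, otherwise to None.

And finally, line[:None] is equal to line[:], which returns a string identical to line.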
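As a quick check of the steps above, here is a small sketch that applies the expression to a few sample strings. The helper name first_word and the sample inputs are made up for illustration; they are not part of the original answer.

```python
def first_word(line):
    # Index of the first space, or -1 when the line contains none.
    idx = line.find(' ')
    # max(idx, 0) maps -1 to 0; `0 or None` then yields None,
    # so the slice becomes line[:None], i.e. the whole line.
    return line[:max(idx, 0) or None]

print(repr(first_word("hello world")))  # 'hello'
print(repr(first_word("hello")))        # 'hello'
print(repr(first_word(" hello")))       # ' hello' (the first space is at index 0)
```

The last case shows that a space at index 0 is treated the same way as "no space found", so the whole line comes back.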

First sample:

with open('file') as f:
    for line in f:
        word = line[:max(line.find(' '), 0) or None]
        if condition(word):
            do_something(word)

And the class (implemented here as a generator):

def words(stream):
    for line in stream:
        yield line[:max(line.find(' '), 0) or None]

which you can use like this:

gen = words(f)
for word in gen:
    if condition(word):
        print(word)

or

gen = words(f)
while 1:
    try:
        word = next(gen)
        if condition(word):
            print(word)
    except StopIteration:
        break # we reached the end

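But you also want to start reading from a certain line number. If you don't know the lengths of the lines, this cannot be done very efficiently. The only way is to read lines and discard them until you reach the right line number.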

def words(stream, start=-1): # you could replace the -1 with 0 and remove the +1
    for i in range(start+1): # it depends on whether you start counting with 0 or 1
        try:
            next(stream) # read a line and discard it
        except StopIteration:
            break
    for line in stream:
        yield line[:max(line.find(' '), 0) or None]
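The standard library's itertools.islice can handle the read-and-discard step. The sketch below is one possible variant rather than the answer's own code; the name words_from and the zero-based start parameter are assumptions made for illustration, and io.StringIO stands in for a real file.

```python
import io
from itertools import islice

def words_from(stream, start=0):
    # islice still reads and throws away the first `start` lines;
    # it only hides the bookkeeping, it does not seek.
    for line in islice(stream, start, None):
        yield line[:max(line.find(' '), 0) or None]

sample = io.StringIO("alpha one\nbeta two\ngamma three\n")
print(list(words_from(sample, start=1)))  # ['beta', 'gamma']
```

If start is larger than the number of lines, the generator simply yields nothing.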

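Note that you may get odd results if a line starts with a space. To prevent this, you can insert line = line.strip() at the beginning of the loop; this also removes the trailing newline.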
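To make the leading-space problem concrete, the sketch below compares the plain slice with a variant that strips the line first. Both helper names are hypothetical. Note that rstrip() alone only removes trailing whitespace, so strip() (or lstrip()) is what deals with a leading space.

```python
def first_word_raw(line):
    # A leading space makes find() return 0, which is treated like "not found".
    return line[:max(line.find(' '), 0) or None]

def first_word_stripped(line):
    # strip() removes leading and trailing whitespace, including the newline.
    line = line.strip()
    return line[:max(line.find(' '), 0) or None]

line = "  indented text\n"
print(repr(first_word_raw(line)))       # '  indented text\n'
print(repr(first_word_stripped(line)))  # 'indented'
```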

Disclaimer: none of this code has been tested.
