python - Python: scan a file for a substring, save the position, then return to it

Question

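I'm writing a script that needs to scan a file until it finds a line containing a substring, save the position of the start of that line, and return to it later. I'm very new to Python, so I haven't had much success yet. This is my current code: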

with open("test.txt") as f:
    pos = 0
    line = f.readline()
    while line:
        if "That is not dead" in line:
            pos = f.tell() - len(line.encode('utf-8'))
            # pos = f.tell()

        line = f.readline()

    f.seek(pos)
    str = f.readline()
    print(str)

With test.txt:

That is not dead
Which can eternal lie
Till through strange aeons
Even Death may die

Sphinx of black quartz, judge my vow!

This is the output:

hat is not dead

[newline character]

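I realized that my original pos = f.tell() gave me the position of the end of the line rather than the start, and I found this answer detailing how to get the byte length of a string, but using it cuts off the first character. Using utf-16 or utf-16-le gives ValueError: negative seek position -18 or ValueError: negative seek position -16, respectively. I also tried the solution from this answer, with the following code: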

with open("ctest.txt") as f:
    pos = 0
    line = f.readline()
    while line:
        if "That is not dead" in line:
            print(line)
            f.seek(-len(line), 1)
            zz = f.readline()
            print(zz)
        line = f.readline()

    f.seek(pos)
    str = f.readline()
    print(str)

This raises io.UnsupportedOperation: can't do nonzero cur-relative seeks at f.seek(-len(line), 1).

Can someone point out where I'm going wrong?
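One possible reading of both symptoms, sketched below under the assumption that test.txt was saved with Windows line endings (the question does not say so, but the reported -18 and -16 are what you would get if f.tell() returned 18 for a 17-character line). In text mode Python translates "\r\n" to "\n", so the re-encoded line is one byte shorter than what was consumed from disk, and text-mode files reject nonzero relative seeks. Opening the file in binary mode avoids both issues. The temporary file and variable names here are illustrative only.

```python
import os
import tempfile

# Recreate part of the sample with "\r\n" endings (an assumption about test.txt).
path = os.path.join(tempfile.mkdtemp(), "test.txt")
with open(path, "wb") as f:
    f.write(b"That is not dead\r\nWhich can eternal lie\r\n")

# Text mode: "\r\n" comes back as "\n", so the encoded line under-counts
# the bytes that were actually read from the file.
with open(path, encoding="utf-8") as f:
    line = f.readline()
    text_end = f.tell()
    text_len = len(line.encode("utf-8"))
print(text_end - text_len)  # how far off the computed start position is

# Binary mode: lines are raw bytes, len() is the true byte count,
# and seeking relative to the current position is allowed.
with open(path, "rb") as f:
    line_b = f.readline()
    f.seek(-len(line_b), 1)   # step back to the start of the line just read
    start = f.tell()
    again = f.readline()
print(start, again.decode("utf-8"))
```

The trade-off is that binary mode hands back bytes, so the substring test has to use a bytes literal (for example b"That is not dead") or decode each line first.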

score 1 · Accepted Answer

Stefan Papp suggested saving the position before reading the line, a simple solution I hadn't considered. The adjusted version:

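with open("test.txt") as f:
    pos = 0
    tempPos = 0  # position of the start of the line currently held in `line`
    line = f.readline()
    while line:
        if "That is not" in line:
            pos = tempPos

        # Record where the next line starts before reading it.
        tempPos = f.tell()
        line = f.readline()

    f.seek(pos)
    result = f.readline()  # renamed from `str` to avoid shadowing the built-in
    print(result)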

With the correct output:

That is not dead
[newline character]

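Thanks, Stefan. I think I was too deep in my problem to think about it clearly. If there's a better way to iterate through the file than what I've done, I'd love to know, but this seems to work.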
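For what it's worth, the same idea can be wrapped in a small helper. This is only a sketch: find_line_start and read_line_at are hypothetical names, not from any library, and unlike the loop above this version stops at the first matching line rather than remembering the last one. It sticks with readline() because iterating with "for line in f" disables tell() on text files.

```python
import os
import tempfile


def find_line_start(path, needle, encoding="utf-8"):
    """Return the tell() value at the start of the first line containing
    needle, or None if no line matches. (Hypothetical helper.)"""
    with open(path, encoding=encoding) as f:
        while True:
            start = f.tell()     # position before the line is read
            line = f.readline()
            if not line:         # empty string means end of file
                return None
            if needle in line:
                return start


def read_line_at(path, pos, encoding="utf-8"):
    """Seek to a position previously returned by tell() and read one line."""
    with open(path, encoding=encoding) as f:
        f.seek(pos)
        return f.readline()


# Usage on a throwaway copy of the sample text.
sample_path = os.path.join(tempfile.mkdtemp(), "test.txt")
with open(sample_path, "wb") as f:
    f.write(b"That is not dead\r\nWhich can eternal lie\r\n")

pos = find_line_start(sample_path, "eternal")
if pos is not None:
    print(read_line_at(sample_path, pos))
```

Because the saved value comes straight from tell(), it is valid to pass back to seek() in text mode regardless of line endings, as long as the file is reopened with the same encoding.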
