python - string.find() in Python cannot handle special characters

Question

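I think the error is in the read function. It cannot read anything past the special character shown in the image; see the repr output below.

I am using string.find() in Python as follows: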

indexOfClosedDoc = temp.find("</DOC>",indexOfOpenDoc)

However, when the string contains text like this:

SUB
</DOC>

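where SUB is a special character, temp.find cannot find the tag. Any suggestions on how to solve this?

Example:

[screenshot of the input text; image not available]

Code that makes it fail: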

handle = open("error.txt",'r');
temp = handle.read();
index = temp.find("</DOC>",0)
if(index == -1):
    print "Error"
    exit(1)

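Put the text from the image into a text file and run the code.

This is the repr of the temp variable for the text in the example. The text in error.txt is everything on line 29722 in the image.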

' </P>\n\n'

Note: the read() function never reads past SUB, so the find cannot succeed.

score 2 · Accepted Answer

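The answer is to open the file in 'rb' mode. On Windows, opening a file with just 'r' makes it use the legacy DOS behavior of stopping at 0x1A (the DOS EOF character). See also "Line reading chokes on 0x1A".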
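A minimal sketch of the binary-mode approach, assuming a made-up sample in which a SUB byte (0x1A) sits just before the closing tag. The sample bytes and the temporary file location are illustrative assumptions, not the asker's actual data.

```python
import os
import tempfile

# Hypothetical sample data: a SUB byte (0x1A) appears before the closing tag.
sample = b"<DOC>\n</P>\n\n\x1a\n</DOC>\n"

# Write the sample to a scratch file (the path is an assumption for this demo).
path = os.path.join(tempfile.mkdtemp(), "error.txt")
with open(path, "wb") as handle:
    handle.write(sample)

# Binary mode returns the raw bytes, so 0x1A is treated like any other byte.
with open(path, "rb") as handle:
    temp = handle.read()

# Search with a bytes pattern, since the data was read as bytes.
index = temp.find(b"</DOC>")
sub_index = temp.find(b"\x1a")
print(index, sub_index)
```

Because the whole file is read, the closing tag is found at an offset after the SUB byte. Note that binary mode also disables newline translation, so on Windows the data will contain "\r\n" line endings.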

score 0 · Accepted Answer

Note: if the file uses a multibyte encoding, .find() will not work even when there is no 0x1A in it, for example:

import codecs

with codecs.open('file.utf16', 'w', encoding='utf-16') as file:
    file.write(u"abcd") # write a string using utf-16 encoding

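# XXX incorrect code, don't use it:
# the file is read as raw UTF-16 bytes, so 'bc' never matches
with open('file.utf16', 'r') as f:
    temp = f.read()
    i = temp.find('bc')
    print i # XXX -> -1, not found

# XXX also incorrect: binary mode still returns undecoded bytes
with open('file.utf16', 'rb') as f:
    temp = f.read()
    i = temp.find('bc')
    print i # XXX -> -1, not found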

# works
with codecs.open('file.utf16', encoding='utf-16') as f:
    temp = f.read()
    i = temp.find('bc')
    print i # -> 1 found
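The same point can be sketched with io.open, which accepts an encoding argument on both Python 2 and Python 3. The scratch file path below is an assumption made for this demo; the idea is only that the search has to run on decoded text, not on the raw UTF-16 bytes.

```python
import io
import os
import tempfile

# Scratch file for the demo (hypothetical location).
path = os.path.join(tempfile.mkdtemp(), "file.utf16")

# Write a short string using the UTF-16 encoding.
with io.open(path, "w", encoding="utf-16") as f:
    f.write(u"abcd")

# Raw bytes: each character takes two bytes, so b"bc" is not a substring.
with io.open(path, "rb") as f:
    raw = f.read()
raw_index = raw.find(b"bc")

# Decoded text: the search behaves as expected.
with io.open(path, "r", encoding="utf-16") as f:
    text = f.read()
text_index = text.find(u"bc")

print(raw_index, text_index)
```

Here raw_index is -1 because a zero byte separates "b" and "c" in the encoded data, while text_index is 1.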

score -1 · Accepted Answer


Check your indexOfOpenDoc value; I suspect it is greater than the position where the tag appears.
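A small sketch of why the start argument matters, using a made-up string rather than the asker's data: find() only searches from the given start index onward, so a start value past the tag yields -1 even though the tag is present.

```python
# Hypothetical sample string for illustration only.
temp = "<DOC>text</DOC>"

# Searching from the beginning finds the closing tag.
close_pos = temp.find("</DOC>", 0)

# Searching from a start index beyond the tag's position finds nothing.
missed = temp.find("</DOC>", close_pos + 1)

print(close_pos, missed)
```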

answered 2012-08-23T04:19:32.410
