python - Processing a file with fcntl flock from multiple Python threads

Question

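I am trying to use Python to do a text replacement. I have a file encoded as little-endian UTF-16, and I want to replace the IP address in it. First I read the file line by line, then I replace the target string, and finally I write the new strings back to the file. But when several threads operate on the file, its contents become garbled. Here is my code.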

import re
import codecs 
import time
import thread
import fcntl

ip = "10.200.0.1" 
searchText = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}" 

def replaceFileText(fileName,searchText,replaceText,encoding):
    lines = []
    with codecs.open(fileName,"r",encoding) as file:
        fcntl.flock(file,fcntl.LOCK_EX)
        for line in file:
            lines.append(re.sub(searchText,replaceText,line))
        fcntl.flock(file,fcntl.LOCK_UN)

    with codecs.open(fileName,"w",encoding) as file:
        fcntl.flock(file,fcntl.LOCK_EX)
        for line in lines:
            file.write(line)
        fcntl.flock(file,fcntl.LOCK_UN)

def start():
    replaceFileText("rdpzhitong.rdp",searchText,ip,"utf-16-le")                                                                 
    thread.exit_thread()

def test(number):
    for n in range(number):
        thread.start_new_thread(start,())
        time.sleep(1)

test(20)

I don't understand why the file ends up garbled. I already use fcntl flock to keep the reads and writes in order, so where is the problem?

score 5 · Accepted Answer

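It is garbled because fcntl locks are owned by processes, not by threads, so a single process cannot use fcntl to serialize its own access. See this answer, for example.

You will need to use a threading construct such as Lock.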
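As a rough illustration of the Lock suggestion, here is a minimal sketch. It assumes Python 3 and the `threading` module (the question uses the Python 2 `thread` module); `file_lock`, `replace_file_text` and `run_demo`, as well as the sample file contents, are names and data made up for this example.

```python
import os
import re
import tempfile
import threading

IP_PATTERN = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")

# One lock shared by every thread in the process (hypothetical name).
file_lock = threading.Lock()


def replace_file_text(file_name, replace_text, encoding="utf-16-le"):
    # Hold the lock across the whole read-modify-write cycle, so no other
    # thread can read the file while it is truncated or half written.
    with file_lock:
        # newline="" keeps the original line endings untouched.
        with open(file_name, "r", encoding=encoding, newline="") as f:
            text = f.read()
        new_text = IP_PATTERN.sub(replace_text, text)
        with open(file_name, "w", encoding=encoding, newline="") as f:
            f.write(new_text)


def run_demo(thread_count=20):
    # Create a small UTF-16-LE file, rewrite it from many threads,
    # and return the final contents.
    fd, path = tempfile.mkstemp(suffix=".rdp")
    os.close(fd)
    try:
        with open(path, "w", encoding="utf-16-le", newline="") as f:
            f.write("full address:s:192.168.1.5\r\nscreen mode id:i:2\r\n")
        threads = [
            threading.Thread(target=replace_file_text, args=(path, "10.200.0.1"))
            for _ in range(thread_count)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        with open(path, "r", encoding="utf-16-le", newline="") as f:
            return f.read()
    finally:
        os.remove(path)
```

The important part is that the lock covers both the read and the write. Locking each step separately would still let another thread open the file between the two.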

score 0 · Accepted Answer

I think it is garbled because you lock the file only after you have opened it. In that case the seek position may be wrong.
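One way to read this point: opening with "w" truncates the file before the lock is even requested, and the question's code releases the lock between reading and writing. The sketch below is an assumption about how that could be avoided, not a tested fix for the original program. It opens the file once in "r+" mode and holds a single flock over the read, the rewrite and the truncate. It assumes Python 3 on a Unix-like system; `replace_in_place` and `demo` are hypothetical names, and it does not settle whether flock alone is enough between threads of one process.

```python
import fcntl
import os
import re
import tempfile

IP_PATTERN = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")


def replace_in_place(file_name, replace_text, encoding="utf-16-le"):
    # "r+" opens for reading and writing WITHOUT truncating, so the file
    # is still intact while we wait for the lock.
    with open(file_name, "r+", encoding=encoding, newline="") as f:
        fcntl.flock(f, fcntl.LOCK_EX)
        try:
            text = f.read()
            f.seek(0)                      # rewind before overwriting
            f.write(IP_PATTERN.sub(replace_text, text))
            f.truncate()                   # flushes, then drops leftover bytes
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)


def demo():
    # Write a sample file, rewrite it in place, and return the new contents.
    fd, path = tempfile.mkstemp(suffix=".rdp")
    os.close(fd)
    try:
        with open(path, "w", encoding="utf-16-le", newline="") as f:
            f.write("full address:s:192.168.100.200\r\nscreen mode id:i:2\r\n")
        replace_in_place(path, "10.0.0.1")
        with open(path, "r", encoding="utf-16-le", newline="") as f:
            return f.read()
    finally:
        os.remove(path)
```

The `truncate()` call matters when the replacement is shorter than the original text. Without it, stale bytes from the old contents would remain at the end of the file.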

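By the way, threads in Python are not that useful in this case (search around for the Python GIL issue). To get the best performance on this kind of task, I suggest using the multiprocessing module and restructuring the logic around queues/pipes, so that worker processes analyze the data while the main process handles the I/O on the input and output files.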
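A minimal sketch of that layout, assuming Python 3. For brevity it uses `multiprocessing.Pool.map` instead of hand-written queues or pipes; `replace_ip` and `rewrite_file` are hypothetical names. Only the main process touches the file, so no file locking is involved.

```python
import re
from multiprocessing import Pool

IP_PATTERN = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")
NEW_IP = "10.200.0.1"


def replace_ip(line):
    # Worker: a pure text transformation with no file access.
    return IP_PATTERN.sub(NEW_IP, line)


def rewrite_file(file_name, encoding="utf-16-le", workers=2):
    # Main process: the only reader and the only writer of the file.
    with open(file_name, "r", encoding=encoding, newline="") as f:
        lines = f.readlines()
    # Lines are sent to the worker processes and come back in order.
    with Pool(workers) as pool:
        new_lines = pool.map(replace_ip, lines)
    with open(file_name, "w", encoding=encoding, newline="") as f:
        f.writelines(new_lines)
```

To try it, call `rewrite_file("rdpzhitong.rdp")` from under an `if __name__ == "__main__":` guard, which multiprocessing needs on platforms that start workers by re-importing the main module. For a file this small the process overhead probably outweighs any gain, so treat this as a pattern for larger inputs.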
