python - Inaccurate line count when reading a file that is being written to at the same time

Question

I have a file that I know is exactly 7168 lines long. Under various conditions I get bogus line counts. For example:

file = open("testfile", 'r')
count = 0
for line in file:
   count += 1
   print("count: " + str(count))

This code results in: "count: 1098"

file = open("testfile", 'r')
count = 0
for line in file:
   count += 1
   print(line)  ### this line is the only difference
   print("count: " + str(count))

This code results in: "count: 7168"

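The only thing I can think of is that I am running out of memory somewhere. "testfile" is populated by a Popen running in the background. The idea (and hope) is that all the required data gets dumped to the file in the background before the user reaches the point in the script where the dump has to be complete. If the user reaches the point where the contents of testfile are needed but the Popen has not finished yet, I run the following code: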

import os

notified = False
# Spin for as long as the file is still empty
while os.path.getsize("testfile") == 0:
   if not notified:
      print("Please hold, still dumping uids...")
      notified = True
print("done!")

Suspecting that calling os.path.getsize countless times in immediate succession might be harmful, I modified my code:

import os
import time

notified = False
# Poll for as long as the file is still empty
while os.path.getsize("testfile") == 0:
   if not notified:
      print("Please hold, still dumping uids...")
      notified = True
   time.sleep(3)   ### Delay 3 seconds
print("done!")

In this case my line count is 6896 (much better, but still not the true count).

A further modification:

import os
import time

notified = False
# Poll for as long as the file is still empty
while os.path.getsize("testfile") == 0:
   if not notified:
      print("Please hold, still dumping uids...")
      notified = True
   time.sleep(5)   ### Delay 5 seconds
print("done!")

Now my line count shows 7168, as expected.

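Can anyone explain what is going on, and how I can achieve my goal more efficiently? The overall goal is that my script needs a large amount of data dumped to a file at some later point in the script. To reduce the time the user spends waiting, my Popen runs in the background right at the start of the script. The while (os.path.getsize("testfile") == 0) line is there to prevent a race condition.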

score 3 · Accepted Answer

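You are not waiting for the background task to finish: the loop exits as soon as testfile is non-empty, which can happen well before the dump is complete. Try replacing your while loop with this, before opening testfile: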

pid.wait()

where pid is the object returned by subprocess.Popen().
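A minimal sketch of that approach, assuming the dump can be stood in for by a small child Python process. The PRODUCER script, the temporary directory and the name proc are made up for illustration and are not from the question:

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical stand-in for the real dump command: writes 7168 lines to the given path.
PRODUCER = (
    "import sys\n"
    "with open(sys.argv[1], 'w') as f:\n"
    "    for i in range(7168):\n"
    "        f.write('uid%d\\n' % i)\n"
)

path = os.path.join(tempfile.mkdtemp(), "testfile")

# Start the dump in the background at the top of the script.
proc = subprocess.Popen([sys.executable, "-c", PRODUCER, path])

# ... other work happens here while the dump runs ...

# Block until the child has exited; only then is the file known to be complete.
returncode = proc.wait()
if returncode != 0:
    raise RuntimeError("dump failed with exit code %d" % returncode)

with open(path) as f:
    count = sum(1 for _ in f)
print("count: " + str(count))
```

Checking the return code is optional, but it separates "the dump finished" from "the dump died halfway", which a size check cannot do.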

As an alternative, you can make the file appear all at once. For example, your subprocess could write to testfile.tmp and then run mv testfile.tmp testfile once it is done.
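The rename idea might look roughly like this if the producer is itself a Python script. This is a sketch under assumptions: the PRODUCER script and the 0.1 second polling interval are invented for illustration, and the temporary and final names must live on the same filesystem for the rename to be atomic on POSIX systems.

```python
import os
import subprocess
import sys
import tempfile
import time

# Hypothetical producer: writes everything under a temporary name, then renames it.
PRODUCER = (
    "import os, sys\n"
    "final = sys.argv[1]\n"
    "tmp = final + '.tmp'\n"
    "with open(tmp, 'w') as f:\n"
    "    for i in range(7168):\n"
    "        f.write('uid%d\\n' % i)\n"
    "os.replace(tmp, final)\n"  # the final name appears only after the file is closed
)

path = os.path.join(tempfile.mkdtemp(), "testfile")
proc = subprocess.Popen([sys.executable, "-c", PRODUCER, path])

# The final name exists only once the dump is complete, so existence is a safe signal.
while not os.path.exists(path):
    if proc.poll() is not None and not os.path.exists(path):
        raise RuntimeError("producer exited without creating the file")
    time.sleep(0.1)

with open(path) as f:
    count = sum(1 for _ in f)
proc.wait()  # reap the child process
print("count: " + str(count))
```

With this layout the reader never sees a partially written testfile, so the reader needs no handle on the producer process at all.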

score 1 · Accepted Answer

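You have one process writing to a file while another process reads that same file. On a multiprocessing system with no inter-process synchronization, that is a race condition: the reader hits end-of-file before the writer has finished, which is why the count comes out lower than expected. This has nothing to do with the implementation language.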

Pipes do a good job of inter-process synchronization. With the command:

$ producer | tee testfile | wc -l

wc will always produce an accurate count of the number of lines put into testfile. You are making this problem harder than it needs to be.
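The same pipeline can be approximated from inside the Python script. This is only a sketch: the PRODUCER one-liner stands in for the real dump command and is assumed to write its output to stdout, while the loop plays the roles of both tee and wc -l.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical producer that prints 7168 lines to stdout instead of writing a file.
PRODUCER = "for i in range(7168):\n    print('uid%d' % i)\n"

path = os.path.join(tempfile.mkdtemp(), "testfile")
proc = subprocess.Popen([sys.executable, "-c", PRODUCER], stdout=subprocess.PIPE)

count = 0
with open(path, "wb") as out:
    # Iteration blocks while the pipe is empty and ends only at end-of-file,
    # that is, once the producer has closed its end of the pipe.
    for line in proc.stdout:
        out.write(line)  # the "tee testfile" part
        count += 1       # the "wc -l" part
proc.stdout.close()
proc.wait()
print("count: " + str(count))
```

Reading from a pipe differs from reading a growing file in one key way: an empty pipe makes the reader wait, whereas the current end of a regular file looks like end-of-file even if the writer is still going.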
