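In my application, I have a single thread that performs very fast processing on log lines to produce a float value. Usually there is only one other thread, which performs slow reads of the value at intervals. Every so often, other threads come and go and perform one-off reads of those values as well.
My question is about whether a mutex is necessary (in CPython) for this specific case, where the data is simply the most recent value available. It is not a critical value that has to be in sync with anything else (not even with the other fields written at the same time). It is simply... whatever the value is.
That being said, I know I could easily add a lock (or a readers/writer lock) to protect updates of the value, but I am wondering whether the overhead of acquiring and releasing it in rapid succession over the course of a whole log (say 5000 lines on average) makes doing shared resources "properly" not worth it.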
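Based on the documentation entry "What kinds of global value mutation are thread-safe?", these assignments should be atomic operations.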
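To make the atomicity claim concrete, here is a minimal sketch (written for Python 3, unlike the Python 2 code in the rest of this question) that uses the standard `dis` module to count the `STORE_ATTR` instructions an attribute assignment compiles to. The helper names `update` and `store_attr_count` are hypothetical, introduced only for illustration.

```python
import dis


class DataStructure:
    def __init__(self):
        self.f_val = 0.0


def update(data, f_val):
    # The single statement under inspection: a plain attribute assignment.
    data.f_val = f_val


def store_attr_count(func):
    # Count how many STORE_ATTR bytecode instructions the function contains.
    return sum(
        1 for ins in dis.get_instructions(func) if ins.opname == "STORE_ATTR"
    )


if __name__ == "__main__":
    print(store_attr_count(update))
```

One instruction means the interpreter does not switch threads halfway through the store itself. This only applies to plain instance attributes like the ones here: if the class defined `__setattr__` or a property for the attribute, that one instruction would run arbitrary Python code.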
Here is a basic example of the logic:
import time
from random import random, choice, randint
from threading import Thread


class DataStructure(object):
    def __init__(self):
        self.f_val = 0.0
        self.s_val = ""


def slow_reader(data):
    """
    Loop much more slowly and read values
    anywhere between 1 - 5 second intervals
    """
    for _ in xrange(10):
        f_val = data.f_val
        # don't care about sync here
        s_val = data.s_val
        print f_val, s_val
        # in real code could be even 30 or 60 seconds
        time.sleep(randint(1,3))


def fast_writer(data):
    """ Update data extremely often """
    for _ in xrange(20000):
        f_val, s_val = do_work()
        data.f_val = f_val
        # don't care about sync here
        data.s_val = s_val


FLOAT_SRC = [random()*100 for _ in xrange(100)]
STR_SRC = ['foo', 'bar', 'biz', 'baz']


def do_work():
    time.sleep(0.001)
    return choice(FLOAT_SRC), choice(STR_SRC)


if __name__ == "__main__":
    data = DataStructure()
    threads = [
        Thread(target=slow_reader, args=(data,)),
        Thread(target=fast_writer, args=(data,)),
    ]
    for t in threads:
        t.daemon=True
        t.start()
    for t in threads:
        t.join()
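This represents the fast log parser (which actually reads through a PIPE) working on each line, while the slow periodic reader grabs whatever the current values are at that moment. At any time, another one-off reader thread may come and go to grab the same values from the data structure.
Is this a situation where a mutex in CPython is not needed at all?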
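One way to probe this empirically is a small lock-free experiment: a writer stores values drawn from a known set, and a reader checks that every value it observes belongs to that set. The sketch below is written for Python 3; `VALID`, `writer`, `reader`, and `run` are hypothetical names used only for illustration. A clean run is evidence, not proof, that no torn or half-written value is ever visible.

```python
import threading


class DataStructure:
    def __init__(self):
        self.f_val = 0.0


# Hypothetical set of values the writer is allowed to store.
VALID = (0.0, 1.5, 2.5, 3.5)


def writer(data, iterations):
    # Overwrite the attribute as fast as possible, with no lock.
    for i in range(iterations):
        data.f_val = VALID[i % len(VALID)]


def reader(data, reads, seen):
    # Record whatever value happens to be assigned at each read.
    for _ in range(reads):
        seen.append(data.f_val)


def run(iterations=20000, reads=2000):
    data = DataStructure()
    seen = []
    threads = [
        threading.Thread(target=writer, args=(data, iterations)),
        threading.Thread(target=reader, args=(data, reads, seen)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen


if __name__ == "__main__":
    observed = run()
    print(all(v in VALID for v in observed))
```

The reader may see stale values or miss many updates, which is acceptable for the "latest available value" use case described above.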
Edit
To clarify a bit more... I don't even need the float and string fields to be in sync from the last write. It is ok if the scheduler decides to switch contexts between the float and string reads. I'm just wondering if I even need the overhead of a lock to simply read whatever value is assigned at any moment in time.
My concern is that the writer will be looping over an extremely fast operation, locking and unlocking a lock that is usually uncontended.
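The cost of an uncontended lock can be measured directly instead of guessed. The following is a rough sketch (Python 3, using the standard `timeit` and `threading` modules) that compares a bare attribute assignment with the same assignment wrapped in `with lock:`. The function name `measure` and the iteration count are assumptions made for illustration, and the numbers will vary by machine and interpreter version.

```python
import threading
import timeit


class DataStructure:
    def __init__(self):
        self.f_val = 0.0


def measure(n=100000):
    """Return average seconds per update, with and without a lock."""
    data = DataStructure()
    lock = threading.Lock()

    def plain():
        data.f_val = 1.0

    def locked():
        # The lock is never contended here: only one thread is running.
        with lock:
            data.f_val = 1.0

    return {
        "plain": timeit.timeit(plain, number=n) / n,
        "locked": timeit.timeit(locked, number=n) / n,
    }


if __name__ == "__main__":
    for name, secs in measure().items():
        print("%s: %.1f ns per update" % (name, secs * 1e9))
```

The per-update difference can then be compared with the time spent processing a single log line (about 1 ms in the `do_work` stand-in above) to judge whether the lock overhead matters at all.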
Effectively, assume this is all I care about in the reader:
def slow_reader(data):
    for _ in xrange(10):
        f_val = data.f_val
        print f_val
        time.sleep(randint(1,3))