
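In my application, I have one thread that does very fast processing on log lines to produce a float value. Usually there is only one other thread that performs slow reads of the value at intervals. Every once in a while, other threads come and go and perform one-off reads of those values as well.

My question is about the necessity of a mutex (in CPython) for this specific case, where the data is simply the most recent data available. It is not a critical value that must be in sync with anything else (or even with the other fields written at the same time). It is simply... whatever the value is at that moment.

That being said, I know I could easily add a lock (or a readers/writer lock) to protect the update of the value, but I wonder whether the overhead of acquiring and releasing it in rapid succession over the course of an entire log (say 5000 lines on average) makes it not worth handling the shared resource "properly".

Based on the documentation's "What kinds of global value mutation are thread-safe?", these assignments should be atomic operations.

Here is a basic example of the logic: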
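As a rough check of what that FAQ entry relies on, the sketch below disassembles a plain attribute assignment. It assumes a standard GIL-enabled CPython build and uses Python 3 syntax; `Holder` and `write_value` are illustrative names, not part of the application. The assignment compiles to a single `STORE_ATTR` instruction, which is the basis for treating it as atomic. That reasoning does not carry over to classes that define `__setattr__` or a property setter, because those run additional Python code.

```python
import dis

class Holder(object):
    """Hypothetical stand-in for the question's DataStructure."""
    def __init__(self):
        self.f_val = 0.0

def write_value(holder, value):
    # The statement under inspection: a plain attribute assignment.
    holder.f_val = value

# Collect the bytecode instruction names for write_value.
opnames = [ins.opname for ins in dis.get_instructions(write_value)]
store_ops = [name for name in opnames if name == "STORE_ATTR"]
print(opnames)
print("attribute stores:", len(store_ops))
```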

import time
from random import random, choice, randint
from threading import Thread 

class DataStructure(object):
    def __init__(self):
        self.f_val = 0.0
        self.s_val = ""

def slow_reader(data):
    """ 
    Loop much more slowly and read values 
    anywhere between 1 - 5 second intervals
    """
    for _ in xrange(10):

        f_val = data.f_val 
        # don't care about sync here
        s_val = data.s_val

        print f_val, s_val

        # in real code could be even 30 or 60 seconds
        time.sleep(randint(1,3))

def fast_writer(data):
    """ Update data extremely often """
    for _ in xrange(20000):
        f_val, s_val = do_work()

        data.f_val = f_val
        # don't care about sync here
        data.s_val = s_val 


FLOAT_SRC = [random()*100 for _ in xrange(100)]
STR_SRC = ['foo', 'bar', 'biz', 'baz']

def do_work():
    time.sleep(0.001)
    return choice(FLOAT_SRC), choice(STR_SRC)


if __name__ == "__main__":

    data = DataStructure()

    threads = [
        Thread(target=slow_reader, args=(data,)),
        Thread(target=fast_writer, args=(data,)),
    ]

    for t in threads:
        t.daemon=True
        t.start()

    for t in threads:
        t.join()

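This represents the fast log parser (actually reading via a PIPE) working on each line, while the slow periodic reader grabs whatever the current values are at that time. At any time, another one-off reader thread might come and go to grab those same values from the data structure.

Is this a situation where a mutex is completely unnecessary in CPython?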

Edit

To clarify a bit more... I don't even need the float and string fields to be in sync from the last write. It is ok if the scheduler decides to switch contexts between the float and string reads. I'm just wondering if I even need the overhead of a lock to simply read whatever value is assigned at any moment in time.

My concern is that the writer is going to be looping over an extremely fast operation, locking and unlocking a lock that is almost always uncontended.
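A rough way to gauge that overhead is to time the same assignment with and without an uncontended lock. This is only a sketch: `Holder`, `write_unlocked`, and `write_locked` are illustrative names, the snippet uses Python 3 syntax, and the numbers will vary by machine and interpreter version.

```python
import threading
import timeit

lock = threading.Lock()

class Holder(object):
    """Hypothetical stand-in for the question's DataStructure."""
    def __init__(self):
        self.f_val = 0.0

holder = Holder()

def write_unlocked():
    holder.f_val = 1.0

def write_locked():
    # Acquire and release an uncontended lock around the same assignment.
    with lock:
        holder.f_val = 1.0

N = 20000  # same count as the writer loop in the example above
t_plain = timeit.timeit(write_unlocked, number=N)
t_locked = timeit.timeit(write_locked, number=N)
print("unlocked: %.6f s, locked: %.6f s for %d writes" % (t_plain, t_locked, N))
```

Dividing the difference by `N` gives a per-write cost that can be compared against the time spent processing each line (the example's `do_work` sleeps for 1 ms).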

Effectively assume this is all I care about in the reader:

def slow_reader(data):
    for _ in xrange(10):
        f_val = data.f_val 
        print f_val
        time.sleep(randint(1,3))

2 Answers


You need a mutex for concurrent accesses:

  • on composite values, when at least one of the accesses must modify the value in several places atomically;
  • on simple values, when at least two of the accesses are writes.

In your example, the value is composite (2 fields), and the modification touches several places (those 2 fields), so you should use a mutex to ensure that the reader doesn't get scheduled in between the two modifications.

EDIT: If the reader doesn't care about the fields being in sync, then you don't need a mutex.
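For the case where the two fields do have to stay in sync, a minimal sketch of the mutex approach could look like the following. `SyncedData`, `update`, and `snapshot` are hypothetical names introduced for illustration, and the snippet uses Python 3 syntax.

```python
import threading

class SyncedData(object):
    """Hypothetical variant of DataStructure whose two fields change together."""
    def __init__(self):
        self._lock = threading.Lock()
        self._f_val = 0.0
        self._s_val = ""

    def update(self, f_val, s_val):
        # Both fields are written under the lock, so no reader can
        # observe a new f_val paired with an old s_val.
        with self._lock:
            self._f_val = f_val
            self._s_val = s_val

    def snapshot(self):
        # Both fields are read under the same lock as one consistent pair.
        with self._lock:
            return self._f_val, self._s_val

data = SyncedData()
data.update(42.0, "foo")
print(data.snapshot())
```

The reader has to take the same lock as the writer; locking only on the write side would not stop a read from landing between the two assignments.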

answered 2012-11-06T19:30:54.833


You should lock the container in the reader when acquiring a single existing item, but if the item itself is no longer modified by anything and will not be moved, you can release the mutex as soon as the reader has the item.

If the item may be modified, you can either take a quick copy and release the mutex, or have a separate mutex for the individual item, so that the rest of the container can be worked on by others. Your case sounds like you do not need to worry about this, though.

If you have many readers that should each pick the oldest unprocessed item, then you need a queue (which might be as simple as the index of the latest taken item) and a separate mutex for it. This might even be an atomic integer, so you could avoid needing a mutex altogether for the "queue".
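The "quick copy, then release" option can be sketched as below. `Container`, `put`, and `get_copy` are hypothetical names, the snippet uses Python 3 syntax, and it assumes that any writer that mutates an item in place also holds the same lock.

```python
import copy
import threading

class Container(object):
    """Hypothetical container of mutable items guarded by a single lock."""
    def __init__(self):
        self._lock = threading.Lock()
        self._items = {}

    def put(self, key, item):
        with self._lock:
            self._items[key] = item

    def get_copy(self, key):
        # Hold the lock only long enough to copy the item; the caller
        # then works on its private copy with no lock held.
        with self._lock:
            return copy.deepcopy(self._items[key])

container = Container()
container.put("latest", {"f_val": 3.5, "s_val": "foo"})

snapshot = container.get_copy("latest")
snapshot["f_val"] = 0.0  # changes only the reader's private copy
print(container.get_copy("latest")["f_val"])  # prints 3.5
```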

Actually, with a suitable arrangement of atomic integers and polling, you could avoid mutexes completely. One atomic integer holds the index of the latest complete item; it is increased by the writer and only read by the polling readers. A second atomic integer holds the index of the latest taken item; it is increased by the readers, which then wait for that index to become ready (if it is not ready yet).

(Polling by readers can be avoided with some notification mechanism, but that requires a mutex or a socket, both of which are fairly expensive.)
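Python's standard library has no atomic integer type, so the following is only an approximation of that scheme under stated assumptions: a GIL-enabled CPython build, one writer, and one polling reader that keeps its "latest taken" index as a local variable. `PublishedLog`, `publish`, and `polling_reader` are hypothetical names, and the snippet uses Python 3 syntax. With several readers sharing one "latest taken" index, the increment would need its own synchronization, since `i = i + 1` is not atomic in Python.

```python
import threading
import time

class PublishedLog(object):
    """Hypothetical single-writer log with a published 'latest complete' count."""
    def __init__(self):
        self.items = []
        self.complete = 0  # number of fully written items; only the writer changes it

    def publish(self, item):
        self.items.append(item)          # store the item first
        self.complete = len(self.items)  # then publish the new count

def writer(log, values):
    for value in values:
        log.publish(value)

def polling_reader(log, expected, out, poll_interval=0.001):
    taken = 0  # this reader's own "latest taken" index
    while taken < expected:
        ready = log.complete  # a single attribute read
        while taken < ready:
            out.append(log.items[taken])
            taken += 1
        time.sleep(poll_interval)  # poll instead of waiting on a lock

log = PublishedLog()
values = [float(i) for i in range(100)]
received = []
threads = [
    threading.Thread(target=polling_reader, args=(log, len(values), received)),
    threading.Thread(target=writer, args=(log, values)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(received))
```

The writer appends before it publishes the count, so the reader never indexes past an item that has not been stored yet.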

answered 2012-11-06T19:48:32.023