
I have a shared tempfile resource that is divided into chunks of 4K (or some such value). Each 4K in the file is represented by an index starting from zero. For this shared resource, I track the 4K chunk indices in use and always return the lowest indexed 4K chunk not in use, or -1 if all are in use.

This ResourceSet class for the indices has public acquire and release methods. Both take a synchronized lock that is held for roughly as long as it takes to generate 4 random numbers (expensive, CPU-wise).

Therefore, as the code that follows shows, I use an AtomicInteger as a "counting semaphore" to keep a large number of threads from entering the critical section at the same time on acquire(); it returns -1 (not available right now) if there are too many threads.

Currently, I am using a constant of 50 (CAS_MAX_ATTEMPTS) for the tight CAS loop that tries to increment the atomic integer in acquire, and a constant of 10 (CONTENTION_MAX) for the maximum number of threads to then allow into the critical section, which is long enough to create contention. My question is: what should these constants be for a moderately to highly loaded servlet engine that has several threads trying to get access to these 4K chunks?

import java.util.concurrent.atomic.AtomicInteger;

public class ResourceSet {

    // ??? what should this be
    // maximum number of attempts to try to increment with CAS on acquire
    private static final int    CAS_MAX_ATTEMPTS = 50;

    // ??? what should this be
    // maximum number of threads contending for lock before returning -1 on acquire
    private static final int    CONTENTION_MAX = 10;

    private AtomicInteger        latch = new AtomicInteger(0);

    ... member variables to track free resources

    private boolean aquireLatchForAquire ()
    {
        for (int i = 0; i < CAS_MAX_ATTEMPTS; i++) {
            int val = latch.get();
            if (val == -1)
                throw new AssertionError("bug in ResourceSet");        // this means more threads than can exist on any system, so it's a bug!
            if (!latch.compareAndSet(val, val+1))
                continue;       // lost the CAS race; retry
            if (val < 0 || val >= CONTENTION_MAX) {
                latch.decrementAndGet();
                // added to fix BUG that comment pointed out, thanks!
                return false;   // too many threads already contending
            }
            return true;        // latch incremented and within the contention limit
        }
        return false;           // gave up after CAS_MAX_ATTEMPTS failed attempts
    }

    private void aquireLatchForRelease ()
    {
        do {
            int val = latch.get();
            if (val == -1)
                throw new AssertionError("bug in ResourceSet");    // this means more threads than can exist on any system, so it's a bug!
            if (latch.compareAndSet(val, val+1))
                return;
        } while (true);
    }

    public ResourceSet (int totalResources)
    {
        ... initialize
    }

    public int acquire (ResourceTracker owned)
    {        
        if (!aquireLatchForAquire())
            return -1;

        try {
            synchronized (this) {
                ... algorithm to compute minimum free resource or return -1 if all in use
                return resourceindex;
            }
        } finally {
            latch.decrementAndGet();
        }
    }

    public boolean release (ResourceIter iter)
    {
        aquireLatchForRelease();
        try {
            synchronized (this) {
                ... iterate and release all resources
            }
        } finally {
            latch.decrementAndGet();
        }
    }
}

3 Answers

1

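Writing a good, performant spinlock is actually pretty complicated and requires a good understanding of memory barriers. Merely picking a constant is not going to cut it and will definitely not be portable. Google's gperftools has an example you can look at, but it is probably far more complicated than what you need.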

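If you really want to reduce contention on the lock, you might want to consider a more fine-grained and optimistic scheme. A simple one is to divide your chunks into n groups and associate a lock with each group (also called striping). This will help reduce contention and increase throughput, but it won't help reduce latency. You could also associate an AtomicBoolean with each chunk and CAS to acquire it (retrying on failure). Do be careful when dealing with lock-free algorithms, because they tend to be tricky to get right. If you do get it right, it could considerably reduce the latency of acquiring a chunk.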
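As a rough illustration of the per-chunk CAS idea, here is a minimal sketch. The class name CasChunkSet and its methods are hypothetical and not part of the question's code, and it uses a single AtomicIntegerArray in place of one AtomicBoolean per chunk, which behaves equivalently for this purpose. It scans from index 0 and claims the first chunk whose flag it can flip from free to in-use.

```java
import java.util.concurrent.atomic.AtomicIntegerArray;

// Hypothetical lock-free stand-in for ResourceSet: one flag per 4K chunk.
final class CasChunkSet {
    private static final int FREE = 0;
    private static final int IN_USE = 1;

    private final AtomicIntegerArray flags;

    CasChunkSet(int totalChunks) {
        // All entries start at 0, i.e. FREE.
        flags = new AtomicIntegerArray(totalChunks);
    }

    /** Claims a free chunk, scanning from index 0; returns -1 if none was found. */
    int acquire() {
        for (int i = 0; i < flags.length(); i++) {
            // Cheap read first; only attempt the CAS on chunks that look free.
            if (flags.get(i) == FREE && flags.compareAndSet(i, FREE, IN_USE)) {
                return i;
            }
        }
        return -1;
    }

    /** Marks a chunk free again; returns false if it was not in use. */
    boolean release(int index) {
        return flags.compareAndSet(index, IN_USE, FREE);
    }
}
```

With a single thread this always returns the lowest free index. Under concurrency it is only approximately the lowest, because a lower chunk can be released after the scan has already passed it; whether that is acceptable depends on the chunk selection algorithm, which the question does not show.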

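Note that it is difficult to propose a more fine-grained approach without knowing what your chunk selection algorithm looks like. I am also assuming that you really do have a performance problem (that it has been profiled and everything).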

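While I'm at it, your spinlock implementation is flawed. You should never spin directly on a CAS, because you are spamming memory barriers. This will be incredibly slow under any serious amount of contention (it is related to the thundering herd problem). At a minimum, check the variable for availability before your CAS (a simple if on a no-barrier read will do). Even better is to not have all your threads spinning on the same value. This should keep the associated cache line from ping-ponging between your cores.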
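The "check before you CAS" advice could look roughly like the sketch below. ContentionGate, tryEnter and exit are hypothetical names, and the two limits are constructor parameters rather than recommended values. Note that AtomicInteger.get() is a volatile read in Java rather than a barrier-free one, so this only avoids the write attempt when the gate is already full.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical gate that reads the counter before attempting a CAS.
final class ContentionGate {
    private final int maxInside;    // plays the role of CONTENTION_MAX
    private final int maxAttempts;  // plays the role of CAS_MAX_ATTEMPTS
    private final AtomicInteger inside = new AtomicInteger(0);

    ContentionGate(int maxInside, int maxAttempts) {
        this.maxInside = maxInside;
        this.maxAttempts = maxAttempts;
    }

    /** Returns true if the caller was admitted; the caller must then call exit(). */
    boolean tryEnter() {
        for (int i = 0; i < maxAttempts; i++) {
            int val = inside.get();
            if (val >= maxInside) {
                return false; // gate is full: balk without attempting a CAS
            }
            if (inside.compareAndSet(val, val + 1)) {
                return true;  // admitted
            }
            // CAS lost a race with another thread; re-read and retry.
        }
        return false;         // gave up after maxAttempts lost races
    }

    /** Leaves the gate; call exactly once per successful tryEnter(). */
    void exit() {
        inside.decrementAndGet();
    }
}
```

Because a rejected thread never increments the counter, there is no increment-then-decrement pair on the failure path, which removes two writes to the shared cache line for every thread that balks.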

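Note that I don't know what type of memory barriers are associated with atomic operations in Java, so my suggestions above might not be optimal or correct.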

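Finally, The Art of Multiprocessor Programming is a fun read if you want to get better acquainted with all the nonsense I've been spewing in this answer.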

answered 2012-05-11T01:59:09.310
0

You can use Semaphore's tryAcquire method if you want your threads to balk when no resource is available.
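A minimal sketch of that suggestion, assuming the gate only needs to cap how many threads reach the critical section at once. SemaphoreGate and runOrBalk are hypothetical names; Semaphore.tryAcquire() and release() are the standard java.util.concurrent calls.

```java
import java.util.concurrent.Semaphore;
import java.util.function.IntSupplier;

// Hypothetical replacement for the hand-rolled AtomicInteger latch.
final class SemaphoreGate {
    private final Semaphore permits;

    SemaphoreGate(int maxThreads) {
        permits = new Semaphore(maxThreads);
    }

    /**
     * Runs the critical section if a permit is immediately available and
     * returns its result; returns -1 without running it otherwise.
     */
    int runOrBalk(IntSupplier criticalSection) {
        if (!permits.tryAcquire()) {
            return -1; // too many threads inside: not available right now
        }
        try {
            return criticalSection.getAsInt();
        } finally {
            permits.release();
        }
    }
}
```

A caller would pass the existing synchronized block as the lambda, so the -1 convention of acquire() is kept for the "too busy" case.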

I for one would simply replace your synchronized keyword with a ReentrantLock and use its tryLock() method. If you want to let your threads wait a bit, you can use tryLock(timeout, unit) on the same class. Which one to choose, and what value to use for the timeout, needs to be determined by a performance test.
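Here is a sketch of how that could look. TryLockResourceSet is a hypothetical name, and the BitSet bookkeeping is only a stand-in for the elided "track free resources" logic in the question, not the asker's actual algorithm. The timeout is left as a parameter since, as noted, its value has to come from a performance test.

```java
import java.util.BitSet;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical ResourceSet variant that uses tryLock instead of synchronized.
final class TryLockResourceSet {
    private final ReentrantLock lock = new ReentrantLock();
    private final BitSet inUse;   // assumed bookkeeping: bit i set means chunk i is in use
    private final int total;

    TryLockResourceSet(int totalResources) {
        this.total = totalResources;
        this.inUse = new BitSet(totalResources);
    }

    /**
     * Returns the lowest free index, or -1 if all chunks are in use or the
     * lock could not be obtained within the timeout.
     */
    int acquire(long timeoutMillis) {
        try {
            if (!lock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return -1; // lock is contended: not available right now
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return -1;
        }
        try {
            int index = inUse.nextClearBit(0);
            if (index >= total) {
                return -1; // every chunk is in use
            }
            inUse.set(index);
            return index;
        } finally {
            lock.unlock();
        }
    }

    /** Marks a chunk free again; release always waits for the lock. */
    void release(int index) {
        lock.lock();
        try {
            inUse.clear(index);
        } finally {
            lock.unlock();
        }
    }
}
```

Passing a timeout of zero makes the timed tryLock return without waiting, so the same method covers both the balk-immediately and the wait-a-bit variants.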

Creating an explicit gate, as you seem to be doing, seems unnecessary to me. I'm not saying that it can never help, but IMO it's more likely to actually hurt performance, and it's an added complication for sure. So unless you have a performance issue here (based on a test you did) and you found that this kind of gating helps, I'd recommend going with the simplest implementation.

answered 2012-05-11T02:09:32.527
0

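I'm not sure it is necessary to build your own Lock class for this scenario. The JDK provides ReentrantLock, which also uses CAS instructions when acquiring the lock. Its performance should be pretty good compared with your own Lock class.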

answered 2012-05-11T00:15:50.980