multithreading - Limiting object allocation across multiple threads

Question

I have an application that retrieves and caches the results of client queries, and sends those results from the cache to the client.

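I've put a limit on the number of items that can be cached at any one time, and keeping track of this limit has greatly reduced the application's performance when it handles a large number of concurrent requests. Is there a better way to solve this without locking so often, which might improve performance?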

Edit: I went with the CAS approach, and it seems to work very well.

score 3 · Accepted Answer

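First, instead of using a lock, use atomic increments and compare-and-swap (CAS) to manipulate your counter. The syntax varies by compiler; in GCC you might do something like this: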

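// Initialize to the cache limit before any thread calls acquire().
long remaining_cache_slots;

// Give a slot back once a cached result has been sent.
void release() {
  __sync_add_and_fetch(&remaining_cache_slots, 1);
}

// Returns false if we've hit our cache limit
bool acquire() {
  long prev_value, new_value;
  do {
    // Atomic load avoids a data race with concurrent updates
    prev_value = __atomic_load_n(&remaining_cache_slots, __ATOMIC_RELAXED);
    if (prev_value <= 0) return false;
    new_value = prev_value - 1;
    // Retry if another thread changed the counter since we read it
  } while(!__sync_bool_compare_and_swap(&remaining_cache_slots, prev_value, new_value));
  return true;
}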
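Because the syntax varies by compiler, here is a hedged alternative that swaps in standard C++11 `std::atomic` for the GCC builtins. The answer itself does not use this. It expresses the same CAS loop portably. The counter name mirrors the GCC version, and it still has to be set to the cache limit at startup.

```cpp
#include <atomic>

// Assumption: a portable C++11 restatement of the GCC __sync version.
// Set this to the cache limit before any thread calls acquire().
std::atomic<long> remaining_cache_slots{0};

// Give a slot back once a cached result has been sent.
void release() {
    remaining_cache_slots.fetch_add(1);
}

// Returns false if we've hit our cache limit.
bool acquire() {
    long prev = remaining_cache_slots.load();
    do {
        if (prev <= 0) return false;  // no slots left
        // On failure, compare_exchange_weak stores the current value into prev,
        // so the loop re-checks the limit against fresh data.
    } while (!remaining_cache_slots.compare_exchange_weak(prev, prev - 1));
    return true;
}
```

`compare_exchange_weak` is allowed to fail spuriously. That is harmless here because the loop simply retries.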

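This should help narrow the contention window. However, you will still be bouncing the cache line that holds the counter between CPU cores, which will seriously hurt your performance at high request rates.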

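If you're willing to accept a certain amount of waste (that is, to let the number of cached results, or rather pending responses, stay somewhat below the limit), you have some other options. One is to make the cache thread-local (if your design allows it). Another is to have each thread reserve a pool of "cache tokens" to use.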
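For the thread-local option, a minimal sketch might give each thread a fixed share of the overall limit so that its counter is never shared and needs no atomics. `ThreadLocalCacheLimit`, `init_thread_quota`, and the per-thread quota (for example limit / nthreads) are hypothetical names and assumptions, not part of the answer. The waste comes from one thread being full while another still has unused slots.

```cpp
// Hypothetical per-thread share of the global cache limit.
struct ThreadLocalCacheLimit {
    long used = 0;
    long quota = 0;
};

// Each thread gets its own copy, so no other thread ever touches it.
thread_local ThreadLocalCacheLimit local_limit;

// Call once per worker thread at startup (hypothetical helper).
void init_thread_quota(long per_thread_quota) {
    local_limit.used = 0;
    local_limit.quota = per_thread_quota;
}

// Returns false if this thread's share of the cache is full.
bool acquire_local() {
    if (local_limit.used >= local_limit.quota) return false;
    ++local_limit.used;
    return true;
}

// Frees a slot in this thread's share.
void release_local() {
    --local_limit.used;
}
```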

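By reserving a pool of cache tokens, I mean that each thread can reserve, in advance, the right to insert N entries into the cache. When that thread removes an entry from the cache, it adds the freed slot to its own token set. If it runs out of tokens, it tries to get more from the global pool; if it has too many, it returns some. The code might look something like this: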

#include <algorithm>

// Initialize to the cache limit before any thread calls acquire().
long global_token_pool;
__thread long thread_local_token_pool = 0;

// Release 10 tokens to the global pool when we go over 20
// The maximum waste for this scheme is 20 * nthreads
#define THREAD_TOKEN_POOL_HIGHWATER 20
#define THREAD_TOKEN_POOL_RELEASECT 10

// If we run out, acquire 5 tokens from the global pool
#define THREAD_TOKEN_POOL_ACQUIRECT 5

void release() {
  thread_local_token_pool++;

  if (thread_local_token_pool > THREAD_TOKEN_POOL_HIGHWATER) {
    thread_local_token_pool -= THREAD_TOKEN_POOL_RELEASECT;
    __sync_fetch_and_add(&global_token_pool, THREAD_TOKEN_POOL_RELEASECT);
  }
}

bool acquire() {
  // Fast path: spend a locally held token without touching shared memory
  if (thread_local_token_pool > 0) {
    thread_local_token_pool--;
    return true;
  }

  long prev_val, new_val, acquired;
  do {
    prev_val = __atomic_load_n(&global_token_pool, __ATOMIC_RELAXED);
    // std::min<long>: the macro is an int literal, prev_val is a long
    acquired = std::min<long>(THREAD_TOKEN_POOL_ACQUIRECT, prev_val);
    if (acquired <= 0) return false;

    new_val = prev_val - acquired;
  } while (!__sync_bool_compare_and_swap(&global_token_pool, prev_val, new_val));

  // Keep the rest of the batch locally; one token is used right now
  thread_local_token_pool = acquired - 1;

  return true;
}
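The same token-pool scheme can be restated with C++11 `std::atomic` and `thread_local` in place of the GCC builtins. This is a hedged sketch of that swap, not the answer's own code. The constant names are hypothetical stand-ins for the macros, and `global_token_pool` must be initialized to the cache limit before workers start.

```cpp
#include <algorithm>
#include <atomic>

// Shared pool; set to the cache limit before starting worker threads.
std::atomic<long> global_token_pool{0};
// Tokens this thread has reserved but not yet spent.
thread_local long local_tokens = 0;

constexpr long kHighWater = 20;     // return tokens once we hold more than this
constexpr long kReleaseCount = 10;  // how many tokens to return at once
constexpr long kAcquireCount = 5;   // how many tokens to grab from the global pool

void release() {
    if (++local_tokens > kHighWater) {
        local_tokens -= kReleaseCount;
        global_token_pool.fetch_add(kReleaseCount);
    }
}

bool acquire() {
    if (local_tokens > 0) {  // fast path: no shared memory touched
        --local_tokens;
        return true;
    }
    long prev = global_token_pool.load();
    long grabbed;
    do {
        grabbed = std::min(kAcquireCount, prev);
        if (grabbed <= 0) return false;  // global pool exhausted
        // On failure, prev is refreshed and grabbed is recomputed.
    } while (!global_token_pool.compare_exchange_weak(prev, prev - grabbed));
    local_tokens = grabbed - 1;  // keep the rest, spend one now
    return true;
}
```

One caveat the sketch does not handle: tokens still sitting in a thread's local pool when that thread exits are never returned to the global pool. It therefore assumes long-lived worker threads, or some exit hook that hands the tokens back.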

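Batching requests like this reduces how often threads touch shared data, which cuts down on contention and cache-line churn. However, as mentioned above, it makes your limit less precise, so it needs careful tuning to strike the right balance.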
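As a rough worked example of that imprecision, the code comment gives the waste bound. Each thread holds at most `THREAD_TOKEN_POOL_HIGHWATER` = 20 tokens locally, because a release that pushes it to 21 immediately returns 10, and a refill leaves at most 4. With a hypothetical limit of L = 1000 slots and 8 threads:

$$
W_{\max} = \text{HIGHWATER} \times n_{\text{threads}} = 20 \times 8 = 160, \qquad L - W_{\max} = 1000 - 160 = 840
$$

In the worst case, then, a request could be refused once only about 840 slots are actually in use, with the remaining tokens parked in other threads' local pools.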

score 1 · Accepted Answer

In SendResults, try to update totalResultsCached only once, after all the results have been processed. This minimizes the time spent acquiring and releasing the lock.

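#include <mutex>

// Request and send() come from the asker's code (not shown here).
// totalResultsCachedMutex is a hypothetical std::mutex guarding the counter.
void SendResults( int resultsToSend, Request *request )
{
    for (int i=0; i<resultsToSend; ++i)
    {
        send(request->remove());
    }

    // Lock once per batch instead of once per result
    std::lock_guard<std::mutex> guard(totalResultsCachedMutex);
    totalResultsCached -= resultsToSend;
}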
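The batching idea in this answer also combines naturally with the atomic counter from the other answer: one atomic decrement per batch instead of one lock per batch. In this hypothetical sketch, a `std::vector<int>` and a caller-supplied `send_fn` stand in for the question's `Request` and `send()`, which are not shown.

```cpp
#include <atomic>
#include <vector>

// Number of results currently cached (name mirrors the answer).
std::atomic<long> totalResultsCached{0};

// Sends every pending result, then updates the shared counter once.
template <typename SendFn>
void SendResults(std::vector<int>& pending, SendFn send_fn) {
    const long sent = static_cast<long>(pending.size());
    for (int result : pending) {
        send_fn(result);  // per-result work touches no shared counter
    }
    pending.clear();
    // One atomic read-modify-write per batch instead of one per result.
    totalResultsCached.fetch_sub(sent);
}
```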

If resultsToSend is usually 1, my suggestion won't make much difference.

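Also, once the cache limit has been reached, ResultCallback may drop a few extra requests, because SendResults no longer updates totalResultsCached immediately after sending each result.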
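A hypothetical single-threaded model makes that trade-off concrete. It assumes ResultCallback accepts a new result only while the counter is below the limit. `kCacheLimit`, `callback_accepts`, and `visible_count` are illustrative names, not from the question.

```cpp
// Illustrative cache limit for the model.
constexpr long kCacheLimit = 3;

// What ResultCallback is assumed to check before caching another result.
bool callback_accepts(long totalResultsCached) {
    return totalResultsCached < kCacheLimit;
}

// Counter value a callback would observe after `sent_so_far` results of the
// current batch have been sent.
long visible_count(long cached_before_batch, long sent_so_far, bool update_per_result) {
    // Batched updates leave the counter untouched until the whole batch is done.
    return update_per_result ? cached_before_batch - sent_so_far : cached_before_batch;
}
```

Suppose the cache is full (3 of 3) and 2 results have already gone out. With per-result updates, the callback sees 1 and accepts. With batched updates, it still sees 3 and drops a result that would have fit.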
