c++ - “最近[秒/分钟/小时]命中”数据结构的实现

Question

我认为这是一个相当普遍的问题，但我似乎无法通过谷歌搜索找到答案（也许我不知道的问题有一个更准确的名称？）

您需要使用用于报告命中的“hit()”方法和 hitsInLastSecond|Minute|Hour 方法来实现一个结构。你有一个说纳秒精度的计时器。你如何有效地实现这一点？

我的想法是这样的（在伪 C++ 中）

class HitCounter {
  void hit() {
    hits_at[now()] = ++last_count;
  }

  int hitsInLastSecond() {
    auto before_count = hits_at.lower_bound(now() - 1 * second)
    if (before_count == hits_at.end()) { return last_count; }
    return last_count - before_count->second;
  }

  // etc for Minute, Hour

  map<time_point, int> hits_at;
  int last_count = 0;
};

这行得通吗？好吗？有更好的东西吗？

更新：添加修剪并根据评论切换到双端队列：

class HitCounter {
  void hit() {
    hits.push_back(make_pair(now(), ++last_count));
  }

  int hitsInLastSecond() {
    auto before = lower_bound(hits.begin(), hits.end(), make_pair(now() - 1 * second, -1));
    if (before == hits.end()) { return last_count; }
    return last_count - before_count->second;
  }

  // etc for Minute, Hour

  void prune() {
    auto old = upper_bound(hits.begin(). hits.end(), make_pair(now - 1 * hour, -1));
    if (old != hits.end()) {
      hits.erase(hits.begin(), old)
    }
  }

  deqeue<pair<time_point, int>> hits;
  int last_count = 0;
};

score 2 · Accepted Answer

What you are describing is called a histogram.

Using a hash, if you intend nanosecond accuracy, will eat up much of your cpu. You probably want a ring buffer for storing the data.

Use std::chrono to achieve the timing precision you require, but frankly hits per second seems like the highest granularity you need and if you are looking at the overall big picture, it doesn't seem like it will matter terribly what the precision is.

This is a partial, introductory sample of how you might go about it:

#include <array>
#include <algorithm>

template<size_t RingSize>
class Histogram
{
    std::array<size_t, RingSize> m_ringBuffer;
    size_t m_total;
    size_t m_position;
public:
    Histogram() : m_total(0)
    {
        std::fill_n(m_ringBuffer.begin(), RingSize, 0);
    }

    void addHit()
    {
        ++m_ringBuffer[m_position];
        ++m_total;
    }

    void incrementPosition()
    {
        if (++m_position >= RingSize)
            m_position = 0;
        m_total -= m_ringBuffer[m_position];
        m_ringBuffer[m_position] = 0;
    }

    double runningAverage() const
    {
        return (double)m_total / (double)RingSize;
    }

    size_t runningTotal() const { return m_total; }
};

Histogram<60> secondsHisto;
Histogram<60> minutesHisto;
Histogram<24> hoursHisto;
Histogram<7> weeksHisto;

This is a naive implementation which assumes you will call it every second and increment the position, and will transpose runningTotal from one histogram to the next every RingSize (so every 60s, add secondsHisto.runningTotal to minutesHisto).

Hopefully it will be a useful introductory place to start from.

If you want to track a longer histogram of hits per second, you can do that with this model, by increasing the ring size, add a second total to track the last N ring buffer entries, so that m_subTotal = sum(m_ringBuffer[m_position - N .. m_position]), similar to the way m_total works.

size_t m_10sTotal;

...

void addHit()
{
    ++m_ringBuffer[m_position];
    ++m_total;
    ++m_10sTotal;
}

void incrementPosition()
{
    // subtract data from >10 sample intervals ago.
    m_10sTotal -= m_ringBuffer[(m_position + RingBufferSize - 10) % RingBufferSize];
    // for the naive total, do the subtraction after we
    // advance position, since it will coincide with the
    // location of the value RingBufferSize ago.
    if (++m_position >= RingBufferSize)
        m_position = 0;
    m_total -= m_ringBuffer[m_position];
}

You don't have to make the histo grams these sizes, this is simply a naive scraping model. There are various alternatives, such as incrementing each histogram at the same time:

secondsHisto.addHit();
minutesHisto.addHit();
hoursHisto.addHit();
weeksHisto.addHit();

Each rolls over independently, so all have current values. Size each histo as far as you want data at that granularity to go back.

c++ - “最近[秒/分钟/小时]命中”数据结构的实现

1 回答 1

Related

Reference