c# - 高效滚动最大和最小窗口

Question

我想有效地计算滚动最大值和最小值。这意味着比每次窗口移动时从所有正在使用的值重新计算最大值/最小值更好。

这里有一篇帖子问了同样的问题，有人发布了一个解决方案，其中涉及某种堆栈方法，该方法应该根据其评级起作用。但是，我这辈子都找不到了。

任何帮助将不胜感激找到解决方案或帖子。谢谢你们！

score 7 · Accepted Answer

您要使用的算法称为升序最小值 （C++ 实现）。

要在 C# 中执行此操作，您将需要一个双端队列类，并且在 NuGet 上存在一个名为Nito.Deque的好的类。

我已经使用 Nito.Deque 编写了一个快速的 C# 实现，但我只是简单地检查了它，并且是从我的脑海中完成的，所以它可能是错误的！

public static class AscendingMinima
{
    private struct MinimaValue
    {
        public int RemoveIndex { get; set; }
        public double Value { get; set; }
    }

    public static double[] GetMin(this double[] input, int window)
    {
        var queue = new Deque<MinimaValue>();
        var result = new double[input.Length];

        for (int i = 0; i < input.Length; i++)
        {
            var val = input[i];

            // Note: in Nito.Deque, queue[0] is the front
            while (queue.Count > 0 && i >= queue[0].RemoveIndex)
                queue.RemoveFromFront();

            while (queue.Count > 0 && queue[queue.Count - 1].Value >= val)
                queue.RemoveFromBack();

            queue.AddToBack(new MinimaValue{RemoveIndex = i + window, Value = val });

            result[i] = queue[0].Value;
        }

        return result;
    }
}

score 3 · Accepted Answer

这是一种更有效的方法。您仍然需要偶尔计算该值，但除了某些退化数据（不断减小的值）之外，在此解决方案中已将其最小化。

我们将自己限制在最大限度以简化事情，但也很容易扩展到最低限度。

您只需要以下内容：

窗口本身，最初是空的。
当前最大值 ( max)，最初为任何值。
当前最大值 ( maxcount) 的计数，最初为零。

这个想法是使用max和maxcount作为保存当前最大值的缓存。缓存有效的地方，只需要返回里面的值，非常快的常量时间操作。

如果在请求最大值时缓存无效，它会填充缓存然后返回该值。这比上一段中的方法慢，但是一旦缓存再次有效，后续对最大值的请求将使用该更快的方法。

以下是您为维护窗口和相关数据所做的工作：

获取下一个值N。
如果窗口已满，则删除最早的条目M。如果 maxcount 大于 0 且M等于max，则递减maxcount。一旦maxcount达到 0，缓存是无效的，但我们不需要担心，直到用户请求最大值（在那之前重新填充缓存没有意义）。
添加N到滚动窗口。
如果窗口大小现在为 1（即N当前唯一的条目），则设置max为N和maxcount1，然后返回步骤 1。
如果maxcount大于 0 且N大于max，则设置max为N和maxcount为 1，然后返回步骤 1。
如果maxcount大于 0 且N等于max，则递增maxcount。
回到步骤 1。

现在，在进行窗口管理的任何时候，您都可以请求最大值。这是一个单独的操作，与窗口管理本身不同。这可以按顺序使用以下规则来完成。

如果窗口为空，则没有最大值：引发异常或返回一些合理的哨兵值。
如果maxcount大于 0，则缓存有效：只需 return max。
否则，需要重新填充缓存。浏览整个列表，max按照maxcount下面的代码片段进行设置。

set max to window[0], maxcount to 0
for each x in window[]:
    if x > max:
        set max to x, maxcount to 1
    else:
        if x == max:
            increment maxcount

您主要维护最大值的缓存并且仅在需要时重新计算这一事实使得这比在添加条目时简单地重新计算更有效。

对于一些明确的统计数据，我创建了以下 Python 程序。它使用大小为 25 的滑动窗口并使用从 0 到 999 的随机数（您可以使用这些属性来查看它们如何影响结果）。

首先是一些初始化代码。注意stat变量，它们将用于计算缓存命中和未命中：

import random

window = []
max = 0
maxcount = 0
maxwin = 25

statCache = 0
statNonCache = 0

然后根据我上面的描述向窗口添加数字的功能：

def addNum(n):
    global window
    global max
    global maxcount
    if len(window) == maxwin:
        m = window[0]
        window = window[1:]
        if maxcount > 0 and m == max:
            maxcount = maxcount - 1

    window.append(n)

    if len(window) == 1:
        max = n
        maxcount = 1
        return

    if maxcount > 0 and n > max:
        max = n
        maxcount = 1
        return

    if maxcount > 0 and n == max:
        maxcount = maxcount + 1

接下来，从窗口返回最大值的代码：

def getMax():
    global max
    global maxcount
    global statCache
    global statNonCache

    if len(window) == 0:
        return None

    if maxcount > 0:
        statCache = statCache + 1
        return max

    max = window[0]
    maxcount = 0
    for val in window:
        if val > max:
            max = val
            maxcount = 1
        else:
            if val == max:
                maxcount = maxcount + 1
    statNonCache = statNonCache + 1

    return max

最后，测试工具：

random.seed()
for i in range(1000000):
    val = int(1000 * random.random())
    addNum(val)
    newmax = getMax()

print("%d cached, %d non-cached"%(statCache,statNonCache))

请注意，每次向窗口添加数字时，测试工具都会尝试获取最大值。在实践中，这可能是不需要的。换句话说，这是生成的随机数据的最坏情况。

出于伪统计目的运行该程序几次，我们得到（为报告目的进行格式化和分析）：

 960579 cached,  39421 non-cached
 960373 cached,  39627 non-cached
 960395 cached,  39605 non-cached
 960348 cached,  39652 non-cached
 960441 cached,  39559 non-cached
 960602 cached,  39398 non-cached
 960561 cached,  39439 non-cached
 960463 cached,  39537 non-cached
 960409 cached,  39591 non-cached
 960798 cached,  39202 non-cached
=======         ======
9604969         395031

所以你可以看到，对于随机数据，平均只有大约 3.95% 的情况会导致计算命中（缓存未命中）。绝大多数使用缓存的值。这应该比每次插入窗口时都必须重新计算最大值要好得多。

会影响该百分比的一些因素是：

窗口大小。更大的大小意味着缓存命中的可能性更大，从而提高了百分比。例如，将窗口大小加倍几乎可以将缓存未命中率减半（达到 1.95%）。
可能值的范围。此处选择较少意味着窗口中更有可能发生缓存命中。例如，将范围从减少0..999到0..9减少缓存未命中（0.85%）有很大的改进。

score 0 · Accepted Answer

我假设您所说的“窗口”是指一个a[start]到的范围a[start + len]，并且start沿着移动。考虑最小值，最大值相似，并且移动到窗口a[start + 1]到a[start + len + 1]。那么窗口的最小值只有在（a）a[start + len + 1] < min（一个较小的值进来）或（b）a[start] == min（刚刚剩下的最小值之一；重新计算最小值）时才会改变。

另一种可能更有效的方法是用第一个窗口填充优先级队列，并使用每个进入/离开的值进行更新，但我认为这并没有好得多（优先级队列不适合“挑选从中间取出随机元素”（推进窗口时需要做的事情）。而且代码会复杂得多。最好坚持简单的解决方案，直到证明性能不可接受，并且该代码负责用于（大部分）资源消耗。

score 0 · Accepted Answer

在昨天写了我自己的算法并要求改进之后，我被推荐到这里。事实上，这个算法更优雅。我不确定它是否提供恒定速度计算，无论窗口大小如何，但无论如何，我测试了性能与我自己的缓存算法（相当简单，并且可能使用与其他人提出的相同想法）。缓存速度提高了 8-15 倍（使用 5,50,300,1000 的滚动窗口进行测试，我不需要更多）。以下是秒表和结果验证的替代方案。

static class Program
{
    static Random r = new Random();
    static int Window = 50; //(small to facilitate visual functional test). eventually could be 100 1000, but not more than 5000.
    const int FullDataSize =1000;
    static double[] InputArr = new double[FullDataSize]; //array prefilled with the random input data.

    //====================== Caching algo variables
    static double Low = 0;
    static int LowLocation = 0;
    static int CurrentLocation = 0;
    static double[] Result1 = new double[FullDataSize]; //contains the caching mimimum result
    static int i1; //incrementor, just to store the result back to the array. In real life, the result is not even stored back to array.

    //====================== Ascending Minima algo variables
    static double[] Result2 = new double[FullDataSize]; //contains ascending miminum result.
    static double[] RollWinArray = new double[Window]; //array for the caching algo
    static Deque<MinimaValue> RollWinDeque = new Deque<MinimaValue>(); //Niro.Deque nuget.
    static int i2; //used by the struct of the Deque (not just for result storage)


    //====================================== my initialy proposed caching algo
    static void CalcCachingMin(double currentNum)
    {
        RollWinArray[CurrentLocation] = currentNum;
        if (currentNum <= Low)
        {
            LowLocation = CurrentLocation;
            Low = currentNum;
        }
        else if (CurrentLocation == LowLocation)
            ReFindHighest();

        CurrentLocation++;
        if (CurrentLocation == Window) CurrentLocation = 0; //this is faster
        //CurrentLocation = CurrentLocation % Window; //this is slower, still over 10 fold faster than ascending minima

        Result1[i1++] = Low;
    }

    //full iteration run each time lowest is overwritten.
    static void ReFindHighest()
    {
        Low = RollWinArray[0];
        LowLocation = 0; //bug fix. missing from initial version.
        for (int i = 1; i < Window; i++)
            if (RollWinArray[i] < Low)
            {
                Low = RollWinArray[i];
                LowLocation = i;
            }
    }

    //======================================= Ascending Minima algo based on http://stackoverflow.com/a/14823809/2381899 
    private struct MinimaValue
    {
        public int RemoveIndex { get; set; }
        public double Value { get; set; }
    }

    public static void CalcAscendingMinima (double newNum)
    { //same algo as the extension method below, but used on external arrays, and fed with 1 data point at a time like in the projected real time app.
            while (RollWinDeque.Count > 0 && i2 >= RollWinDeque[0].RemoveIndex)
                RollWinDeque.RemoveFromFront();
            while (RollWinDeque.Count > 0 && RollWinDeque[RollWinDeque.Count - 1].Value >= newNum)
                RollWinDeque.RemoveFromBack();
            RollWinDeque.AddToBack(new MinimaValue { RemoveIndex = i2 + Window, Value = newNum });
            Result2[i2++] = RollWinDeque[0].Value;
    }

    public static double[] GetMin(this double[] input, int window)
    {   //this is the initial method extesion for ascending mimima 
        //taken from http://stackoverflow.com/a/14823809/2381899
        var queue = new Deque<MinimaValue>();
        var result = new double[input.Length];

        for (int i = 0; i < input.Length; i++)
        {
            var val = input[i];

            // Note: in Nito.Deque, queue[0] is the front
            while (queue.Count > 0 && i >= queue[0].RemoveIndex)
                queue.RemoveFromFront();

            while (queue.Count > 0 && queue[queue.Count - 1].Value >= val)
                queue.RemoveFromBack();

            queue.AddToBack(new MinimaValue { RemoveIndex = i + window, Value = val });

            result[i] = queue[0].Value;
        }

        return result;
    }

    //============================================ Test program.
    static void Main(string[] args)
    { //this it the test program. 
        //it runs several attempts of both algos on the same data.
        for (int j = 0; j < 10; j++)
        {
            Low = 12000;
            for (int i = 0; i < Window; i++)
                RollWinArray[i] = 10000000;
            //Fill the data + functional test - generate 100 numbers and check them in as you go:
            InputArr[0] = 12000;
            for (int i = 1; i < FullDataSize; i++) //fill the Input array with random data.
                //InputArr[i] = r.Next(100) + 11000;//simple data.
                InputArr[i] = InputArr[i - 1] + r.NextDouble() - 0.5; //brownian motion data.

            Stopwatch stopwatch = new Stopwatch();
            stopwatch.Start();
            for (int i = 0; i < FullDataSize; i++) //run the Caching algo.
                CalcCachingMin(InputArr[i]);

            stopwatch.Stop();
            Console.WriteLine("Caching  : " + stopwatch.ElapsedTicks + " mS: " + stopwatch.ElapsedMilliseconds);
            stopwatch.Reset();


            stopwatch.Start();
            for (int i = 0; i < FullDataSize; i++) //run the Ascending Minima algo
                CalcAscendingMinima(InputArr[i]);

            stopwatch.Stop();
            Console.WriteLine("AscMimima: " + stopwatch.ElapsedTicks + " mS: " + stopwatch.ElapsedMilliseconds);
            stopwatch.Reset();

            i1 = 0; i2 = 0; RollWinDeque.Clear();
        }

        for (int i = 0; i < FullDataSize; i++) //test the results.
            if (Result2[i] != Result1[i]) //this is a test that algos are valid. Errors (mismatches) are printed.
                Console.WriteLine("Current:" + InputArr[i].ToString("#.00") + "\tLowest of " + Window + "last is " + Result1[i].ToString("#.00") + " " + Result2[i].ToString("#.00") + "\t" + (Result1[i] == Result2[i])); //for validation purposes only.

        Console.ReadLine();
    }


}

score 0 · Accepted Answer

我建议维护一个支持getMin()或的堆栈getMax()。

这可以通过两个堆栈来完成，并且只花费恒定的时间。

仅供参考：https ://www.geeksforgeeks.org/design-a-stack-that-supports-getmin-in-o1-time-and-o1-extra-space/

c# - 高效滚动最大和最小窗口

5 回答 5

Related

Reference