5

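Given an array with N elements, I am looking for M (M < N) successive sub-arrays whose lengths are equal or differ by at most 1. For example, if N = 12 and M = 4, all sub-arrays have the same length N/M = 3. If N = 100 and M = 12, I expect sub-arrays of length 8 and 9, and both sizes should be spread uniformly across the original array. This simple task turned out to be a bit tricky to implement. I came up with an adaptation of Bresenham's line algorithm, which looks like this when coded in C++: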

#include <stddef.h>  // size_t
#include <stdint.h>  // intptr_t
#include <vector>

/// The function suggests how an array with num_data items can be
/// subdivided into successively arranged groups (intervals) with
/// equal or "similar" length. The number of intervals is specified
/// by the parameter num_intervals. The result is stored into an array
/// with (num_intervals + 1) items, each of which indicates the start index
/// of an interval, the last additional index being a sentinel item which
/// contains the value num_data.
///
/// Example:
///
///    Input:  num_data ........... 14,
///            num_intervals ...... 4
///
///    Result: result_start_idx ... [ 0, 3, 7, 10, 14 ]
///

void create_uniform_intervals( const size_t         num_data,
                               const size_t         num_intervals,
                               std::vector<size_t>& result_start_idx )
{
    const size_t avg_interval_len = num_data / num_intervals;
    // number of intervals that must be one item longer than the average
    const size_t num_long_intervals = num_data % num_intervals;

    // establish the new size of the result vector
    result_start_idx.resize( num_intervals + 1 );
    // write the sentinel value at the end:
    result_start_idx[ num_intervals ] = num_data;

    size_t offset = 0; // current offset

    // use Bresenham's line algorithm to distribute the
    // num_long_intervals extra items over num_intervals:
    intptr_t error = static_cast<intptr_t>( num_intervals / 2 );

    for( size_t i = 0; i < num_intervals; i++ )
    {
        result_start_idx[ i ] = offset;
        offset += avg_interval_len;
        error -= static_cast<intptr_t>( num_long_intervals );
        if( error < 0 )
        {
            offset++;
            error += static_cast<intptr_t>( num_intervals );
        } // if
    } // for
}

For N = 100 and M = 12, this code computes the interval lengths: 8 9 8 8 9 8 8 9 8 8 9 8
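For quick experiments with other values of N and M, the loop above can be ported to a few lines of Python. This is only a sketch: the names `uniform_interval_starts` and `interval_lengths` are made up for illustration, and the function assumes `num_intervals > 0`, as the C++ version does.

```python
def uniform_interval_starts(num_data, num_intervals):
    """Port of the Bresenham-style loop: returns num_intervals + 1 start
    indices, the last one being the sentinel num_data."""
    base_len, remainder = divmod(num_data, num_intervals)
    starts = []
    offset = 0
    error = num_intervals // 2
    for _ in range(num_intervals):
        starts.append(offset)
        offset += base_len
        error -= remainder
        if error < 0:
            # this interval gets one of the `remainder` extra items
            offset += 1
            error += num_intervals
    starts.append(num_data)  # sentinel
    return starts


def interval_lengths(starts):
    """Differences between consecutive start indices."""
    return [b - a for a, b in zip(starts, starts[1:])]


if __name__ == "__main__":
    print(uniform_interval_starts(14, 4))
    print(interval_lengths(uniform_interval_starts(100, 12)))
```

The lengths always sum to `num_data` and differ by at most 1, which is easy to confirm by looping over a range of inputs.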

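My actual problem is that I don't know what this problem is called, so I had a hard time searching for it.

  • Are there other algorithms that accomplish this task?
  • What are they called? Perhaps the name would turn up if I knew other areas of application.

I need the algorithm as part of a larger algorithm for clustering data. I think it could also be useful for implementing a parallel sort(?).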


2 Answers

7

If your language has integer division that truncates, an easy way to compute the size of section i is via (N*i+N)/M - (N*i)/M. For example, the python program

  N=100;M=12
  for i in range(M): print((N*i+N)//M - (N*i)//M)

outputs the numbers 8 8 9 8 8 9 8 8 9 8 8 9. With N=12;M=5 it outputs 2 2 3 2 3. With N=12;M=3 it outputs 4 4 4.

If your section numbers are 1-based rather than 0-based, the expression is instead (N*i)/M - (N*i-N)/M.
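Since section i has size (N*(i+1))/M - (N*i)/M, it starts at index (N*i)/M, so the whole table of start indices (including the sentinel N that the question asks for) follows directly. A minimal Python 3 sketch, where the function name `floor_interval_starts` is purely illustrative and `m > 0` is assumed:

```python
def floor_interval_starts(n, m):
    """Start index of each of the m sections, plus the sentinel n.

    Section i covers indices [(n*i)//m, (n*(i+1))//m).
    """
    return [(n * i) // m for i in range(m + 1)]


if __name__ == "__main__":
    starts = floor_interval_starts(100, 12)
    lengths = [b - a for a, b in zip(starts, starts[1:])]
    print(starts)
    print(lengths)
```

Because floor(N*(i+1)/M) - floor(N*i/M) is always either floor(N/M) or ceil(N/M), the section sizes differ by at most 1, and they sum to N since the first boundary is 0 and the last is N.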

answered 2011-11-10T18:24:39.560
0

Space-filling curves and fractals subdivide the plane and reduce the complexity. Examples include the Z-order curve (also called the Morton curve) and the Hilbert curve.

answered 2011-11-10T18:25:30.130