2

我有一个非常大的数据集,我试图找到满足所有数据集的最小集。最终集合中必须有一个值,该值存在于所有数据集中

一小部分数据看起来像

[0] => Array
    (
        [0] => 21
        [1] => 21
        [2] => 21
    )

[1] => Array
    (
        [0] => 29
    )

[2] => Array
    (
        [0] => 27
    )

[3] => Array
    (
        [0] => 21
        [1] => 21
        [2] => 21
        [3] => 39
        [4] => 39
        [5] => 43
    )

[4] => Array
    (
        [0] => 29
        [1] => 33
        [2] => 33
        [3] => 43
    )

在这种情况下,我需要返回 21、27 和 29 的逻辑。返回的值必须是匹配所有数组的最小值。因为我是一名 PHP 程序员,所以我用 PHP 编写了这个函数。

4

1 回答 1

2

你可以遵循这个算法:

测试后更新

$data=array(
            array(21,29,27,57,22),
            array(22,21,23,24,25,26),
            array(31)
            );

$map = array(); // keep a map of values and how many times they occur in other sets
foreach ($data as $setid => $set) {
    foreach (array_unique($set) as $v) {
        $map[$v][$setid] = true;
    }
}

function reverseCount(array $a, array $b)
{
    return count($b) - count($a);
}

// sort reversed on intersection count
uasort($map, 'reverseCount');

// after sorting the first number will be the one that occurs the most
// keep track of which sets have been hit
$setmap = array(); $n = count($data);
foreach ($map as $v => $sets) {
    $hits = 0;
    foreach ($sets as $setid => $dummy) {
        if (!isset($setmap[$setid])) {
            --$n;
            ++$hits;
            $setmap[$setid] = true;
        } else {
            continue;
        }
    }
    if ($hits) {
        echo "value $v\n";
        if (!$n) {
            // all sets are hit
            break;
        }
    }
}

这次测试了。不能证明总能得到正确的结果,因为这被认为是一种贪心逼近算法。

但我希望它能让您了解您可以做什么。让我知道是否有任何事情让您感到困惑,或者我对此是否错了:)

于 2012-05-19T14:37:46.197 回答