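I have a List<CustomPoint> points; that contains nearly a million objects. From this list, I would like to get the list of objects that occur exactly twice. What is the fastest way to do this? I would also be interested in a non-LINQ option, since I may have to do this in C++ as well.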

public class CustomPoint
{
    public double X { get; set; }
    public double Y { get; set; }

    public CustomPoint(double x, double y)
    {
        this.X = x;
        this.Y = y;
    }
}

public class PointComparer : IEqualityComparer<CustomPoint>
{
    public bool Equals(CustomPoint x, CustomPoint y)
    {
        return ((x.X == y.X) && (y.Y == x.Y));
    }

    public int GetHashCode(CustomPoint obj)
    {
        int hash = 0;
        hash ^= obj.X.GetHashCode();
        hash ^= obj.Y.GetHashCode();
        return hash;
    }
}

Based on this answer, I tried:

list.GroupBy(x => x).Where(x => x.Count() == 2).Select(x => x.Key).ToList();

But this gives zero objects in the new list. Can someone point me in the right direction?

3 Answers

9

You should implement Equals and GetHashCode in the class itself rather than in PointComparer.
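
A minimal sketch of what that could look like, assuming value equality on X and Y is what you want. GroupBy(x => x) without a comparer falls back to the default equality comparer, which uses the class's own Equals and GetHashCode, so the query from the question can stay as it is. The GroupingSketch class and its ExactlyTwice method are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class CustomPoint : IEquatable<CustomPoint>
{
    public double X { get; set; }
    public double Y { get; set; }

    public CustomPoint(double x, double y)
    {
        X = x;
        Y = y;
    }

    // Value equality: two points are equal when both coordinates match.
    public bool Equals(CustomPoint other)
    {
        return !ReferenceEquals(other, null) && X == other.X && Y == other.Y;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as CustomPoint);
    }

    public override int GetHashCode()
    {
        // unchecked: let the arithmetic wrap around instead of throwing
        // if the project is compiled with overflow checking enabled.
        unchecked
        {
            int hash = 17;
            hash = hash * 31 + X.GetHashCode();
            hash = hash * 31 + Y.GetHashCode();
            return hash;
        }
    }
}

// Hypothetical helper, not part of the question's code.
public static class GroupingSketch
{
    public static List<CustomPoint> ExactlyTwice(IEnumerable<CustomPoint> points)
    {
        // No comparer argument: the default comparer picks up the
        // Equals/GetHashCode defined on CustomPoint above.
        return points
            .GroupBy(p => p)
            .Where(g => g.Count() == 2)
            .Select(g => g.Key)
            .ToList();
    }
}
```

Since X and Y have public setters, avoid changing a point while it is used as a key in a hash-based collection, because its hash code would change with it.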

answered 2012-12-20T12:02:00.673
4

To make your code work, you need to pass an instance of your PointComparer as the second argument to GroupBy.
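
As a sketch, the corrected query could look like the following. CustomPoint and PointComparer mirror the question's classes, with null handling added to Equals. DuplicateFinder and ExactlyTwice are hypothetical names introduced here for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class CustomPoint
{
    public double X { get; set; }
    public double Y { get; set; }

    public CustomPoint(double x, double y)
    {
        X = x;
        Y = y;
    }
}

public class PointComparer : IEqualityComparer<CustomPoint>
{
    public bool Equals(CustomPoint a, CustomPoint b)
    {
        if (ReferenceEquals(a, b)) return true;
        if (ReferenceEquals(a, null) || ReferenceEquals(b, null)) return false;
        return a.X == b.X && a.Y == b.Y;
    }

    public int GetHashCode(CustomPoint p)
    {
        // Same XOR hash as in the question.
        return p.X.GetHashCode() ^ p.Y.GetHashCode();
    }
}

// Hypothetical helper, not part of the question's code.
public static class DuplicateFinder
{
    public static List<CustomPoint> ExactlyTwice(IEnumerable<CustomPoint> points)
    {
        return points
            .GroupBy(p => p, new PointComparer()) // the comparer is the second argument
            .Where(g => g.Count() == 2)           // keep groups with exactly two members
            .Select(g => g.Key)
            .ToList();
    }
}
```

Without the comparer, GroupBy compares CustomPoint instances by reference, so every group has one element and the Where clause filters everything out.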

answered 2012-12-20T12:01:40.283
3

This approach worked for me:

public class PointCount
{
    public CustomPoint Point { get; set; }
    public int Count { get; set; }
}

private static IEnumerable<CustomPoint> GetPointsByCount(Dictionary<int, PointCount> pointcount, int count)
{
    return pointcount
                    .Where(p => p.Value.Count == count)
                    .Select(p => p.Value.Point);
}

private static Dictionary<int, PointCount> GetPointCount(List<CustomPoint> pointList)
{
    var allPoints = new Dictionary<int, PointCount>();

    foreach (var point in pointList)
    {
        int hash = point.GetHashCode();

        if (allPoints.ContainsKey(hash))
        {
            allPoints[hash].Count++;
        }
        else
        {
            allPoints.Add(hash, new PointCount { Point = point, Count = 1 });
        }
    }

    return allPoints;
}

Call it like this:

static void Main(string[] args)
{
    List<CustomPoint> list1 = CreateCustomPointList();

    var doubles = GetPointsByCount(GetPointCount(list1), 2);

    Console.WriteLine("Doubles:");
    foreach (var point in doubles)
    {
        Console.WriteLine("X: {0}, Y: {1}", point.X, point.Y);
    }
}

private static List<CustomPoint> CreateCustomPointList()
{
    var result = new List<CustomPoint>();

    for (int i = 0; i < 5; i++)
    {
        for (int j = 0; j < 5; j++)
        {
            result.Add(new CustomPoint(i, j));
        }
    }

    result.Add(new CustomPoint(1, 3));
    result.Add(new CustomPoint(3, 3));
    result.Add(new CustomPoint(0, 2));

    return result;
}

The CustomPoint implementation:

public class CustomPoint
{
    public double X { get; set; }
    public double Y { get; set; }

    public CustomPoint(double x, double y)
    {
        this.X = x;
        this.Y = y;
    }

    public override bool Equals(object obj)
    {
        var other = obj as CustomPoint;

        if (other == null)
        {
            return base.Equals(obj);
        }

        return ((this.X == other.X) && (this.Y == other.Y));
    }

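    public override int GetHashCode()
    {
        // unchecked: let the multiplication wrap around instead of throwing
        // if the project is compiled with overflow checking enabled.
        unchecked
        {
            int hash = 23;
            hash = hash * 31 + this.X.GetHashCode();
            hash = hash * 31 + this.Y.GetHashCode();
            return hash;
        }
    }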
}

It prints:

Doubles:
X: 0, Y: 2
X: 1, Y: 3
X: 3, Y: 3

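As you can see in GetPointCount(), I create a dictionary with one entry for each unique (by hash) CustomPoint. For each new hash I insert a PointCount object that holds a reference to the CustomPoint and a Count starting at 1, and every time the same point is encountered again, the Count is incremented.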

Finally, in GetPointsByCount I return the CustomPoints from the dictionary where PointCount.Count == count, which in your case is 2.

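Also note that I updated the GetHashCode() method, because yours returns the same value for the points (1,2) and (2,1). Feel free to restore your own hashing method if you really need it. You will have to test the hash function, though, because it is hard to hash two numbers uniquely into one. How well it works depends on the range of numbers used, so you should implement a hash function that suits your own needs.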
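
Because the dictionary above is keyed on the hash code, two different points whose hashes happen to collide would be counted as one. A variation that avoids this, sketched here under the assumption that a comparer like the question's PointComparer is available, keys the dictionary on the point itself. The dictionary then uses the hash only to find a bucket and calls Equals to confirm the match. PointCounter and WithCount are hypothetical names. The loop uses no LINQ, so it should translate fairly directly to a hash map in C++.

```csharp
using System;
using System.Collections.Generic;

public class CustomPoint
{
    public double X { get; set; }
    public double Y { get; set; }

    public CustomPoint(double x, double y)
    {
        X = x;
        Y = y;
    }
}

public class PointComparer : IEqualityComparer<CustomPoint>
{
    public bool Equals(CustomPoint a, CustomPoint b)
    {
        if (ReferenceEquals(a, b)) return true;
        if (ReferenceEquals(a, null) || ReferenceEquals(b, null)) return false;
        return a.X == b.X && a.Y == b.Y;
    }

    public int GetHashCode(CustomPoint p)
    {
        // Order-sensitive combination; unchecked lets the arithmetic wrap.
        unchecked
        {
            int hash = 23;
            hash = hash * 31 + p.X.GetHashCode();
            hash = hash * 31 + p.Y.GetHashCode();
            return hash;
        }
    }
}

// Hypothetical helper, not part of the answer's code.
public static class PointCounter
{
    public static List<CustomPoint> WithCount(IEnumerable<CustomPoint> points, int wanted)
    {
        // Keyed on the point: a hash collision costs a little speed,
        // but it can never merge two different points into one entry.
        var counts = new Dictionary<CustomPoint, int>(new PointComparer());

        foreach (var point in points)
        {
            int seen;
            counts.TryGetValue(point, out seen); // seen is 0 when the point is new
            counts[point] = seen + 1;
        }

        var result = new List<CustomPoint>();
        foreach (var pair in counts)
        {
            if (pair.Value == wanted)
            {
                result.Add(pair.Key);
            }
        }
        return result;
    }
}
```

Calling PointCounter.WithCount(list1, 2) would play the role of GetPointsByCount(GetPointCount(list1), 2) in the code above.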

answered 2012-12-20T12:26:21.000