c# - 压缩弱引用字典

Question

我有一个带有属性Id的类Foo。我的目标是没有两个Foo实例同时具有相同的Id。

所以我创建了一个工厂方法CreateFoo，它使用缓存来为相同的Id返回相同的实例。

static Foo CreateFoo(int id) {
    Foo foo;
    if (!cache.TryGetValue(id, out foo)) {
        foo = new Foo(id);
        foo.Initialize(...);
        cache.Put(id, foo);
    }
    return foo;
}

缓存实现为 Dictionary<TKey,WeakReference>，基于@JaredPar的Building a WeakReference Hashtable：

class WeakDictionary<TKey, TValue> where TValue : class {
    private readonly Dictionary<TKey, WeakReference> items;
    public WeakDictionary() {
        this.items = new Dictionary<TKey, WeakReference>();
    }
    public void Put(TKey key, TValue value) {
        this.items[key] = new WeakReference(value);
    }
    public bool TryGetValue(TKey key, out TValue value) {
        WeakReference weakRef;
        if (!this.items.TryGetValue(key, out weakRef)) {
            value = null;
            return false;
        } else {
            value = (TValue)weakRef.Target;
            return (value != null);
        }
    }
}

问题是 WeakReference 在其目标被垃圾收集后仍保留在字典中。这意味着需要一些策略来手动“垃圾收集”失效的 WeakReferences，正如@Pascal Cuoq在WeakReference.Target GC 后 WeakReference 会发生什么中所解释的那样。

我的问题是：压缩弱引用字典的最佳策略是什么？

我看到的选项是：

不要从 Dictionary 中删除 WeakReferences。IMO 这很糟糕，因为在我的应用程序的整个生命周期中都会使用缓存，并且随着时间的推移会积累很多死的 WeakReferences。
在每个Put和TryGetValue上遍历整个字典，并删除死的 WeakReferences。这在某种程度上违背了字典的目的，因为这两个操作都变成了O(n)。
在后台线程中定期遍历整个字典。鉴于我不知道CreateFoo的使用模式，什么是一个好的间隔？
将每个插入的 KeyValuePair 附加到一个双端链表。每次调用Put和TryGetValue都会检查列表的头部。如果 WeakReference 是活动的，则将该对移动到列表的末尾。如果它已死，则从列表中删除该对并从 Dictionary 中删除 WeakReference。
实现一个自定义哈希表，其细微差别是，当存储桶已满时，首先从存储桶中删除失效的 WeakReference，然后再照常进行。

还有其他策略吗？

最好的策略可能是具有摊销时间复杂度的算法。这样的策略存在吗？

score 11 · Accepted Answer

如果您可以将托管对象切换为字典的键，那么您可以使用 .Net 4.0 的 ConditionalWeakTable（命名空间 System.Runtime.CompilerServices）。

根据 Richter 先生的说法，ConditionalWeakTable 由垃圾收集器通知对象收集，而不是使用轮询线程。

    static ConditionalWeakTable<TabItem, TIDExec> tidByTab = new ConditionalWeakTable<TabItem, TIDExec>();

    void Window_Loaded(object sender, RoutedEventArgs e)
    {
        ...
        dataGrid.SelectionChanged += (_sender, _e) =>
        {
            var cs = dataGrid.SelectedItem as ClientSession;

            this.tabControl.Items.Clear();

            foreach (var tid in cs.GetThreadIDs())
            {
                tid.tabItem = new TabItem() { Header = ... };
                tid.tabItem.AddHandler(UIElement.MouseDownEvent,
                    new MouseButtonEventHandler((__sender, __e) =>
                    {
                        tabControl_SelectionChanged(tid.tabItem);
                    }), true);
                tidByTab.Add(tid.tabItem, tid);
                this.tabControl.Items.Add(tid.tabItem);
            }
        };
    }

    void tabControl_SelectionChanged(TabItem tabItem)
    {
        this.tabControl.SelectedItem = tabItem;
        if (tidByTab.TryGetValue(tabControl.SelectedItem as TabItem, out tidExec))
        {
            tidExec.EnsureBlocksLoaded();
            ShowStmt(tidExec.CurrentStmt);
        }
        else
            throw new Exception("huh?");
    }

这里重要的是，唯一引用 TabItem 对象的是 tabControls.Items 集合，以及 ConditionalWeakTable 的键。ConditionalWeakTable 的 key 不算。因此，当我们从 tabControl 中清除所有项目时，可以对这些 TabItems 进行垃圾收集（因为不再引用它们，因此 ConditionalWeakTable 的键也不计算在内）。当它们被垃圾收集时，会通知 ConditionalWeakTable 并删除具有该键值的条目。所以我庞大的 TIDExec 对象也被垃圾收集了（除了 ConditionalWeakTable 的值之外，没有任何东西引用它们）。

score 6 · Accepted Answer

您的选项 3（线程）有一个很大的缺点，即在所有 Put/TryGetvalue 操作上都需要同步。如果你使用它，你的时间间隔不是毫秒，而是每 N 个 TryGet 操作。

选项 2，扫描字典，会产生严重的开销。您可以通过仅扫描 1000 个动作中的 1 个和/或观察 GC 运行的频率来改进。

但我会认真考虑选项1：什么都不做。您可能有“很多”死条目，但另一方面它们非常小（并且可以回收）。可能不是服务器应用程序的选项，但对于客户端应用程序，我会尝试衡量我们正在谈论的每小时有多少条目（kByte）。

经过一番讨论：

这样的 [n amortized] 策略是否存在？

我猜没有。您的问题是 GC 的微型版本。您将不得不不时扫描整个事物。所以只有选项 2) 和 3) 提供了真正的解决方案。而且它们都很昂贵，但可以通过一些启发式（大量）优化。选项 2) 仍然会给你偶尔的最坏情况。

score 4 · Accepted Answer

方法#5 很有趣，但它的缺点是很难知道哈希表的实际利用率是多少，因此很难知道何时应该扩展哈希表。如果每当哈希表“看起来”应该扩展时，首先进行全表扫描以删除死条目，则可以克服该困难。如果表中超过一半的条目已失效，请不要费心扩展它。这种方法应该产生摊销的 O(1) 行为，因为在添加与删除的条目一样多的条目之前，不会进行全表扫描。

一个更简单的方法，也将产生 O(1) 摊销时间和 O(1) 空间每个最近活动的元素将保持计数在最后一次清除表后有多少项目是活动的，以及有多少元素从那时起已添加。每当后一个计数超过第一个时，执行一次全表扫描和清除。扫描和清除所需的时间将与清除之间添加的元素数量成正比，因此保留了分摊的 O(1) 时间，并且集合中的总元素数量不会超过最近观察到的元素数量的两倍是活着的，所以死元素的数量不能超过最近活跃的元素数量的两倍。

score 3 · Accepted Answer

我遇到了同样的问题，并像这样解决了它（WeakDictionary 是我试图清理的类）：

internal class CleanerRef
{
    ~CleanerRef()
    {
        if (handle.IsAllocated)
            handle.Free();
    }

    public CleanerRef(WeakDictionaryCleaner cleaner, WeakDictionary dictionary)
    {
        handle = GCHandle.Alloc(cleaner, GCHandleType.WeakTrackResurrection);
        Dictionary = dictionary;
    }

    public bool IsAlive
    {
        get {return handle.IsAllocated && handle.Target != null;}
    }

    public object Target
    {
        get {return IsAlive ? handle.Target : null;}
    }

    GCHandle handle;
    public WeakDictionary Dictionary;
}


internal class WeakDictionaryCleaner
{
    public WeakDictionaryCleaner(WeakDictionary dict)
    {
        refs.Add(new CleanerRef(this, dict));
    }

    ~WeakDictionaryCleaner()
    {
        foreach(var cleanerRef in refs)
        {
            if (cleanerRef.Target == this)
            {
                cleanerRef.Dictionary.ClearGcedEntries();
                refs.Remove(cleanerRef);
                break;
            }
        }
    }
    private static readonly List<CleanerRef> refs = new List<CleanerRef>();
}

这两个类试图实现的是“挂钩”GC。您可以通过在构造弱集合期间创建 WeakDictionaryCleaner 的实例来激活此机制：

new WeakDictionaryCleaner(weakDictionary);

请注意，我没有创建对新实例的任何引用，因此 GC 将在下一个周期中处理它。在 ClearGcedEntries() 方法中，我再次创建了一个新实例，这样每个 GC 循环都会有一个清理器来完成，进而执行集合压缩。您可以使 CleanerRef.Dictionary 也成为弱引用，这样它就不会将字典保存在内存中。

希望这可以帮助

score 2 · Accepted Answer

我想这是放置它的正确位置，即使它看起来像死灵法术。以防有人像我一样偶然发现这个问题。.net 中缺少专用的身份映射有点令人惊讶，我觉得最自然的工作方式如最后一个选项中所述：当表已满并且容量即将翻倍时，它会检查是否存在足够的死条目可以回收以供进一步使用，因此不需要增长。

static IdentityMap<int, Entity> Cache = new IdentityMap<int, Entity>(e => e.ID);
...
var entity = Cache.Get(id, () => LoadEntity(id));

该类仅公开一个Get带有key可选value参数的公共方法，如果实体不在缓存中，则该参数会延迟加载和缓存。

using System;
class IdentityMap<TKey, TValue>
    where TKey : IEquatable<TKey>
    where TValue : class
{
    Func<TValue, TKey> key_selector;
    WeakReference<TValue>[] references;
    int[] buckets;
    int[] bucket_indexes;
    int tail_index;
    int entries_count;
    int capacity;

    public IdentityMap(Func<TValue, TKey> key_selector, int capacity = 10) {
        this.key_selector = key_selector;
        Init(capacity);
    }
    void Init(int capacity) {
        this.bucket_indexes = new int[capacity];
        this.buckets = new int[capacity];
        this.references = new WeakReference<TValue>[capacity];
        for (int i = 0; i < capacity; i++) {
            bucket_indexes[i] = -1;
            buckets[i] = i - 1;
        }
        this.tail_index = capacity - 1;
        this.entries_count = 0;
        this.capacity = capacity;
    }

    public TValue Get(TKey key, Func<TValue> value = null) {
        int bucket_index = Math.Abs(key.GetHashCode() % this.capacity);
        var ret = WalkBucket(bucket_index, true, key);
        if (ret == null && value != null) Add(bucket_index, ret = value());
        return ret;
    }

    void Add(int bucket_index, TValue value) {
        if (this.entries_count == this.capacity) {
            for (int i = 0; i < capacity; i++) WalkBucket(i, false, default(TKey));
            if (this.entries_count * 2 > this.capacity) {
                var old_references = references;
                Init(this.capacity * 2);
                foreach (var old_reference in old_references) {
                    TValue old_value;
                    if (old_reference.TryGetTarget(out old_value)) {
                        int hash = key_selector(value).GetHashCode();
                        Add(Math.Abs(hash % this.capacity), old_value);
                    }
                }
            }
        }
        int new_index = this.tail_index;
        this.tail_index = buckets[this.tail_index];
        this.entries_count += 1;
        buckets[new_index] = bucket_indexes[bucket_index];
        if (references[new_index] != null) references[new_index].SetTarget(value);
        else references[new_index] = new WeakReference<TValue>(value);
        bucket_indexes[bucket_index] = new_index;
    }

    TValue WalkBucket(int bucket_index, bool is_searching, TKey key) {
        int curr_index = bucket_indexes[bucket_index];
        int prev_index = -1;
        while (curr_index != -1) {
            TValue value;
            int next_index = buckets[curr_index];
            if (references[curr_index].TryGetTarget(out value)) {
                if (is_searching && key_selector(value).Equals(key)) return value;
                prev_index = curr_index;
            } else {
                if (prev_index != -1) buckets[prev_index] = next_index;
                else bucket_indexes[bucket_index] = next_index;

                buckets[curr_index] = this.tail_index;
                this.tail_index = curr_index;
                this.entries_count -= 1;
            }
            curr_index = next_index;
        }
        return null;
    }
}

score 1 · Accepted Answer

WeakReference您可以删除里面的“无效” TryGetValue：

[编辑]我的错误，这些解决方案实际上只是你建议的，因为Put无论如何方法都会用新对象交换旧对象。忽略它。

public bool TryGetValue(TKey key, out TValue value) {
    WeakReference weakRef;
    if (!this.items.TryGetValue(key, out weakRef)) {
        value = null;
        return false;
    } else {
        value = (TValue)weakRef.Target;
        if (value == null)
            this.items.Remove(key);
        return (value != null);
    }
}

或者，您可以在需要时立即在字典中创建一个新实例：

public TValue GetOrCreate(TKey key, Func<Tkey, TValue> ctor) {

    WeakReference weakRef;
    if (!this.items.TryGetValue(key, out weakRef) {
        Tvalue result = ctor(key);
        this.Put(key, result);
        return result;
    } 

    value = (TValue)weakRef.Target;
    if (value == null)
    {
        Tvalue result = ctor(key);
        this.Put(key, result);
        return result;
    }

    return value;
}

然后你会像这样使用它：

static Foo CreateFoo(int id)
{
    return cache.GetOrCreate(id, id => new Foo(id));
}

[编辑]

根据windbg，WeakReference单独实例占用16个字节。对于十万件收集的物品来说，这不会是一个如此沉重的负担，所以你可以轻松地让他们活下去。

如果这是一个服务器应用程序，并且您认为您可以从收集中受益，我会考虑使用后台线程，但也可以实现一个简单的算法，以在您收集相对较少数量的对象时增加等待时间。

score -1 · Accepted Answer

一点专业化：当目标类知道弱字典引用及其TKey值时，您可以从 finalyzer 调用中删除它的条目。

public class Entry<TKey>
{
    TKey key;
    Dictionary<TKey, WeakReference> weakDictionary;

    public Entry(Dictionary<TKey, WeakReference> weakDictionary, TKey key)
    {
        this.key = key;
        this.weakDictionary = weakDictionary;
    }

    ~Entry()
    {
        weakDictionary.Remove(key);
    }
}

当缓存对象是的子类时Entry<TKey>，不会出现空WeakReference泄漏，因为 finalyzer 在其实例被垃圾回收后被调用。

c# - 压缩弱引用字典

7 回答 7

经过一番讨论：

Related

Reference