
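I want to remove duplicate entries from a queue in an efficient way. The queue holds a custom class containing a DateTime and a FullPath, among a few other things.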

private Queue<MyCustomClass> SharedQueue;

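The DateTime in the class is the timestamp of when the item was inserted into the queue. The logic I want to use is: remove the duplicate from the queue if the FullPath is the same within a 4-second window (i.e. if it was added to the queue within 4 seconds of the duplicate full path). I am already watching only the events I want, but some duplicate events will still arrive, and that's fine.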

I'm using C# 2.0, the FileSystemWatcher class, and a worker queue.

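There are many ways to do this: prune the queue every time an item is added, or, while working through the queue, skip processing the current item if it is a duplicate.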

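Or should I use a "global private" Dictionary<String, DateTime> variable so that I can search quickly? Or a local copy of the queue? If there are a lot of file events, would it be better to limit the local queue to 100 items? Although in my case it "should" only be a relatively small number of files monitored in the folder... but things always change...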
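One way to read the "local queue limited to 100 items" idea is a bounded history of recent arrivals that is scanned linearly. The sketch below is only an illustration under stated assumptions: a plain `string` path, a caller-supplied timestamp, and the hypothetical names `RecentHistory` and `SeenRecently`. It sticks to C# 2.0 features.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical bounded history of recent (path, time) arrivals.
public class RecentHistory
{
    private readonly int capacity;
    private readonly TimeSpan window;
    // Oldest arrival at the head, newest at the tail.
    private readonly Queue<KeyValuePair<string, DateTime>> recent =
        new Queue<KeyValuePair<string, DateTime>>();

    public RecentHistory(int capacity, TimeSpan window)
    {
        this.capacity = capacity;
        this.window = window;
    }

    public int Count
    {
        get { return recent.Count; }
    }

    // Returns true if fullPath was seen within the window; records this arrival either way.
    public bool SeenRecently(string fullPath, DateTime now)
    {
        // Entries older than the window can never match again, so drop them from the head.
        while (recent.Count > 0 && now - recent.Peek().Value > window)
            recent.Dequeue();

        bool seen = false;
        foreach (KeyValuePair<string, DateTime> entry in recent)
        {
            // Case-insensitive comparison assumes Windows file paths.
            if (string.Equals(entry.Key, fullPath, StringComparison.OrdinalIgnoreCase))
            {
                seen = true;
                break;
            }
        }

        recent.Enqueue(new KeyValuePair<string, DateTime>(fullPath, now));
        // Cap memory: during a burst, forget the oldest entries beyond 'capacity'.
        while (recent.Count > capacity)
            recent.Dequeue();
        return seen;
    }
}
```

The trade-off of the cap: in a burst of more than `capacity` events inside the window, a path whose earlier entry was evicted is no longer recognized as a duplicate. With only a few watched files, scanning at most 100 entries is cheap.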

Thanks for your help.

:EDIT: Feb 10, 8:54 AM EST: So I decided to implement a nice, simple solution, as far as I can tell. I don't think I'm holding the dictionary keys for too long...

:EDIT: Feb 10, 9:53 AM EST: Updated because my Dictionary cannot contain duplicate keys.

// Start the worker thread when the program starts.
// Call Terminate.Set() in the program's exit routine, close handler, etc.
public void QueueInput(HotSynchUnit.RcdFSWFile rcd)
{
  // lock shared queue
  lock (SharedQueue)
  {
    if (!IsDuplicateQueueInput(rcd))  // only add unique values to queue
    {
      SharedQueue.Enqueue(rcd);
      SomethingToDo.Set();
    }
  }
} // public void QueueInput
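QueueInput is only the producer half. A minimal sketch of the worker thread its comments describe might look like the following. The event types (`AutoResetEvent` for `SomethingToDo`, `ManualResetEvent` for `Terminate`), the `string` payload, and the names `FileWorker`, `Drain` and `ProcessedCount` are assumptions, not the asker's code.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public class FileWorker
{
    // Field names follow the question; their types are assumed.
    private readonly Queue<string> SharedQueue = new Queue<string>();
    private readonly AutoResetEvent SomethingToDo = new AutoResetEvent(false);
    private readonly ManualResetEvent Terminate = new ManualResetEvent(false);
    private readonly List<string> processed = new List<string>();
    private Thread worker;

    // Start the worker thread when the program starts.
    public void Start()
    {
        worker = new Thread(Run);
        worker.IsBackground = true;
        worker.Start();
    }

    // Call from the program's exit routine: signal Terminate, then wait for the thread.
    public void Stop()
    {
        Terminate.Set();
        worker.Join();
    }

    public void QueueInput(string fullPath)
    {
        lock (SharedQueue)
        {
            SharedQueue.Enqueue(fullPath);
            SomethingToDo.Set();
        }
    }

    public int ProcessedCount
    {
        get { lock (processed) { return processed.Count; } }
    }

    private void Run()
    {
        WaitHandle[] handles = new WaitHandle[] { Terminate, SomethingToDo };
        // WaitAny returns the lowest index among signalled handles; 0 (Terminate) ends the loop.
        while (WaitHandle.WaitAny(handles) != 0)
            Drain();
        Drain(); // finish anything queued before Terminate was set
    }

    // One AutoResetEvent signal can stand for several Enqueue calls, so empty the queue each time.
    private void Drain()
    {
        while (true)
        {
            string item;
            lock (SharedQueue)
            {
                if (SharedQueue.Count == 0)
                    return;
                item = SharedQueue.Dequeue();
            }
            // Stand-in for the real file work, done outside the SharedQueue lock
            // so QueueInput is never blocked by it.
            lock (processed) { processed.Add(item); }
        }
    }
}
```

Putting `Terminate` first in the handle array makes shutdown win when both events are set, and the final `Drain` call is one design choice for not dropping work that was already queued.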

private bool IsDuplicateQueueInput(HotSynchUnit.RcdFSWFile rcd)
/* Return true if the object is a duplicate object.
 * Pseudo Code:
 * 
 * isDuplicate = false
 * Lock Dictionary
 * -If lastTimeStamp > 4 seconds ago then       // Optimization: save lastTimeStamp
 *    if Dict.Count > 0 then clear Dictionary
 * -If not Dict.TryGetValue(sPath, dtTimeStamp) then
 *    Dict.AddKey()
 * -Else
 *    Compare key timestamp to CurrentTime
 *    if key timestamp is <= 4 seconds ago then
 *       isDuplicate = true
 *
 *    Dict.RemoveKey()
 *    Dict.AddKey()
 * lastTimeStamp = CurrentTime
 * 
 * return isDuplicate
*/
{
  // put real code here
}
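A sketch of the pseudocode above as real code, assuming the caller passes the record's path and the current time instead of a `HotSynchUnit.RcdFSWFile` (whose members the question does not show). `DuplicateFilter` is a hypothetical name, and the case-insensitive comparer assumes Windows paths. After the "stale dictionary" shortcut it falls through and records the current path, so the next repeat within 4 seconds is still caught.

```csharp
using System;
using System.Collections.Generic;

public class DuplicateFilter
{
    private static readonly TimeSpan Window = TimeSpan.FromSeconds(4);

    // Path -> time that path was last queued. OrdinalIgnoreCase assumes Windows paths.
    private readonly Dictionary<string, DateTime> lastSeen =
        new Dictionary<string, DateTime>(StringComparer.OrdinalIgnoreCase);
    private DateTime lastTimeStamp = DateTime.MinValue;
    private readonly object sync = new object();

    // 'now' is a parameter (rather than DateTime.Now) so the logic can be tested deterministically.
    public bool IsDuplicate(string fullPath, DateTime now)
    {
        lock (sync)
        {
            bool isDuplicate = false;

            // Optimization from the pseudocode: after a lull longer than the window,
            // every stored timestamp is stale, so clear them all at once.
            if (now - lastTimeStamp > Window && lastSeen.Count > 0)
                lastSeen.Clear();

            DateTime seen;
            if (lastSeen.TryGetValue(fullPath, out seen) && now - seen <= Window)
                isDuplicate = true;

            // The indexer adds or overwrites, which replaces the RemoveKey/AddKey pair.
            lastSeen[fullPath] = now;
            lastTimeStamp = now;
            return isDuplicate;
        }
    }
}
```

One caveat: the dictionary is only cleared after a lull longer than the window, so under a steady stream of events it keeps one entry per distinct path seen since the last lull.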

4 Answers


I would just consider using any collection similar to a generic hashtable... something like this:

Dictionary<string, YourClass> dict = new Dictionary<string, YourClass>();

// let's assume you want to add/check for "c:\demo.txt"

if (!dict.ContainsKey(@"c:\demo.txt"))
{
   // add items to dict by passing fullPath as key and your object as value
   dict.Add(@"c:\demo.txt", obj1);
} 
else if (dict[@"c:\demo.txt"].CheckForInterval(obj1))
{
   // replace the current object in the dictionary with the new object, if you want to...
   // or just do whatever you need to
}

EDIT: your custom class could have a method like this:

class YOURCUSTOMCLASS
{
    private DateTime creationTime;

    public DateTime CreationTime
    { get { return creationTime; } }

    public YOURCUSTOMCLASS(/* your constructor parameters go here */)
    {
          creationTime = DateTime.Now;
    }

    /// in this case this method will return true
    /// if the timeSpan between this object and otherObject
    /// is greater than 4 seconds
    public bool CheckForInterval(YOURCUSTOMCLASS otherObject)
    {
         TimeSpan diff = otherObject.CreationTime.Subtract(creationTime);

         // you may replace 4 with any other number, or, even better, use
         // a const/static field ...
         return diff.TotalSeconds > 4;
    }

    /// all the other stuff you need ...
}
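The two snippets above combine into something compilable roughly like this. `FileRecord`, `PathIndex` and `AddOrReplace` are hypothetical names, and the constructor takes the timestamp as a parameter instead of calling `DateTime.Now`, purely so the behavior can be checked without waiting.

```csharp
using System;
using System.Collections.Generic;

public class FileRecord
{
    private readonly string fullPath;
    private readonly DateTime creationTime;

    public FileRecord(string fullPath, DateTime creationTime)
    {
        this.fullPath = fullPath;
        this.creationTime = creationTime;
    }

    public string FullPath { get { return fullPath; } }
    public DateTime CreationTime { get { return creationTime; } }

    // True if otherObject was created more than 4 seconds after this object.
    public bool CheckForInterval(FileRecord otherObject)
    {
        TimeSpan diff = otherObject.CreationTime.Subtract(creationTime);
        return diff.TotalSeconds > 4;
    }
}

public static class PathIndex
{
    // Stores the record if its path is new, or if the stored record is older than the interval.
    // Returns false (and keeps the existing entry) for a duplicate within 4 seconds.
    public static bool AddOrReplace(Dictionary<string, FileRecord> dict, FileRecord record)
    {
        FileRecord existing;
        if (!dict.TryGetValue(record.FullPath, out existing) || existing.CheckForInterval(record))
        {
            dict[record.FullPath] = record;
            return true;
        }
        return false;
    }
}
```

`TryGetValue` does the lookup once, where the `ContainsKey` plus indexer pattern in the answer looks the key up twice.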

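Of course, you lose the functionality of a queue, but if your queue contains many elements you gain a lot of speed, because looking a path up in a dictionary avoids scanning the whole queue.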

Hope this helps.

answered 2011-02-09T17:34:45.440

I would make a wrapper class rather than extending Queue, since users of the base type Queue expect different behavior. (Data Contracts in .NET 4.0 might even complain if you do so.)

Internally, the wrapper can hold an actual queue to which it forwards the required calls. On every Enqueue() call, add the new element to a Dictionary if it is not already contained. Before doing so, remove all elements older than x seconds from this dictionary and add them to the inner queue in order.

When dequeuing, you will have to check whether the inner queue contains elements, and otherwise pick the earliest element from the dictionary.

This, of course, is just one possible implementation. When many different elements get queued quickly, the dictionary will fill up quickly, and additional logic might be needed to handle that.
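A minimal sketch of the wrapper described above, assuming `string` paths as elements and a caller-supplied clock. A dictionary on its own does not remember insertion order, so a second `Queue<string>` tracks which held element is earliest; `DedupQueue`, `holdingOrder` and `Release` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Wrapper (not a subclass) around Queue<string>. New paths wait in a "holding"
// dictionary for 'window'; a repeat arriving while a path is held is dropped.
public class DedupQueue
{
    private readonly TimeSpan window;
    private readonly Queue<string> inner = new Queue<string>();
    private readonly Dictionary<string, DateTime> holding = new Dictionary<string, DateTime>();
    // Arrival order of held paths, oldest at the head.
    private readonly Queue<string> holdingOrder = new Queue<string>();

    public DedupQueue(TimeSpan window)
    {
        this.window = window;
    }

    public int Count
    {
        get { return inner.Count + holding.Count; }
    }

    public void Enqueue(string path, DateTime now)
    {
        Release(now);
        if (holding.ContainsKey(path))
            return; // duplicate within the window
        holding.Add(path, now);
        holdingOrder.Enqueue(path);
    }

    public string Dequeue(DateTime now)
    {
        Release(now);
        if (inner.Count > 0)
            return inner.Dequeue();
        if (holdingOrder.Count == 0)
            throw new InvalidOperationException("Queue is empty.");
        // Nothing released yet: hand out the earliest held element.
        string earliest = holdingOrder.Dequeue();
        holding.Remove(earliest);
        return earliest;
    }

    // Move held elements older than the window into the inner queue, oldest first.
    private void Release(DateTime now)
    {
        while (holdingOrder.Count > 0 && now - holding[holdingOrder.Peek()] > window)
        {
            string path = holdingOrder.Dequeue();
            holding.Remove(path);
            inner.Enqueue(path);
        }
    }
}
```

Taking a held element early in `Dequeue` also removes it from the dictionary, so a repeat of that path arriving afterwards is accepted; that is the price of not making the consumer wait out the window.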

answered 2011-02-09T17:28:19.767

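Why not simply reject the insert if it has a duplicate path? All you have to do is a linear search starting from the tail of the queue, stopping when you either find a duplicate (and reject the insert) or reach a timestamp beyond your time limit (and insert the record). That seems much simpler than maintaining another data structure and all the associated logic.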
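`Queue<T>` only enumerates from head to tail, so scanning from the tail as suggested needs either a `ToArray()` copy or a different container. This sketch swaps in `LinkedList<T>` (available since .NET 2.0), whose `Last` and `Previous` allow an early-stopping backward walk. `FileEvent`, `TailScan` and `TryEnqueue` are hypothetical names, and timestamps are assumed to be nondecreasing from head to tail.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical record with the two fields the question mentions.
public class FileEvent
{
    public readonly string FullPath;
    public readonly DateTime Stamp;

    public FileEvent(string fullPath, DateTime stamp)
    {
        FullPath = fullPath;
        Stamp = stamp;
    }
}

public static class TailScan
{
    // LinkedList<T> stands in for Queue<T>: AddLast enqueues, RemoveFirst dequeues,
    // and Last/Previous walk backwards from the newest entry.
    public static bool TryEnqueue(LinkedList<FileEvent> queue, FileEvent item, TimeSpan window)
    {
        for (LinkedListNode<FileEvent> node = queue.Last; node != null; node = node.Previous)
        {
            // Everything closer to the head is older still, so the search can stop here.
            if (item.Stamp - node.Value.Stamp > window)
                break;
            // Case-insensitive, assuming Windows file paths.
            if (string.Equals(node.Value.FullPath, item.FullPath, StringComparison.OrdinalIgnoreCase))
                return false; // duplicate inside the window: reject the insert
        }
        queue.AddLast(item);
        return true;
    }
}
```

Only items still waiting in the queue are checked: once the worker has dequeued a path, a repeat of it is accepted even inside the window.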

answered 2011-02-09T17:50:38.007

I would make a subclass:

class MyDeduplicatedQueue : Queue<MyCustomObject> {
    /// etc
}

Then you can put all the appropriate filtering logic into the Enqueue method.
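One caveat with subclassing: `Queue<T>.Enqueue` is not virtual, so the subclass can only hide it with `new`, and code holding a plain `Queue<T>` reference bypasses the filter. A sketch, assuming a hypothetical `MyCustomObject` with `FullPath` and `Stamp` fields:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical element type; the real class is not shown in the answer.
public class MyCustomObject
{
    public readonly string FullPath;
    public readonly DateTime Stamp;

    public MyCustomObject(string fullPath, DateTime stamp)
    {
        FullPath = fullPath;
        Stamp = stamp;
    }
}

public class MyDeduplicatedQueue : Queue<MyCustomObject>
{
    private static readonly TimeSpan Window = TimeSpan.FromSeconds(4);

    // Last accepted time per path (case-insensitive, assuming Windows paths).
    // Never pruned in this sketch, so it holds one entry per distinct path.
    private readonly Dictionary<string, DateTime> lastAccepted =
        new Dictionary<string, DateTime>(StringComparer.OrdinalIgnoreCase);

    // Hides Queue<T>.Enqueue (which is not virtual); the filter only runs when
    // the call is made through a MyDeduplicatedQueue reference.
    public new void Enqueue(MyCustomObject item)
    {
        DateTime accepted;
        if (lastAccepted.TryGetValue(item.FullPath, out accepted) && item.Stamp - accepted <= Window)
            return; // same path within the window: drop it

        lastAccepted[item.FullPath] = item.Stamp;
        base.Enqueue(item);
    }
}
```

Assigning the object to a `Queue<MyCustomObject>` variable and calling `Enqueue` there skips the filter entirely, which is the behavioral surprise the wrapper answer warns about.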

answered 2011-02-09T16:50:36.623