c# - Dictionary is slow when using ContainsValue()

Question

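I have a HashSet of custom objects generated by reading a binary file. I also have a Dictionary generated by reading each row of a DBF file. Both have an index property, and the two line up with each other. For example, the 10th item in my Dictionary lines up with the 10th item in my HashSet.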

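I am comparing large amounts of data against each other: anywhere from 10,000 to 500,000 records. The application checks two files (one binary, the other a DBF) for differences. It compares the objects' HashCode, which is generated from certain properties and makes the comparison quick and easy.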
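HashSet<T>.Contains only finds a matching object (rather than the same instance) if the class overrides GetHashCode and Equals consistently. A minimal sketch, assuming hypothetical properties X and Y (the code comments below mention coordinates) stand in for whatever properties the real CustomClass hashes; whether Index should take part in equality is also an assumption, and here it deliberately does not.

```csharp
using System;

// Hypothetical shape of CustomClass: X and Y are assumed stand-ins for the
// properties the real hash is generated from.
public sealed class CustomClass : IEquatable<CustomClass>
{
    public int Index { get; }
    public double X { get; }
    public double Y { get; }

    public CustomClass(int index, double x, double y)
    {
        Index = index;
        X = x;
        Y = y;
    }

    // Equality ignores Index, so the same record at another position
    // in the other file still counts as "found".
    public bool Equals(CustomClass other) =>
        other != null && X.Equals(other.X) && Y.Equals(other.Y);

    public override bool Equals(object obj) => Equals(obj as CustomClass);

    // Must agree with Equals: equal objects must return equal hash codes.
    public override int GetHashCode()
    {
        unchecked
        {
            return (X.GetHashCode() * 397) ^ Y.GetHashCode();
        }
    }
}
```

With these overrides, modCustomClassList.Contains(customclass) is an expected O(1) hash lookup.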

Here is how I build each individual dictionary (there is a similar one for mod):

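int d = 0; // row index, aligned with the order of the HashSet
foreach (DataRow row in origDbfFile.datatable.Rows)
{
    // concatenate every column value into one "~"-separated string
    string str = "";
    foreach (string columnName in columnNames)
    {
        str += "~" + row.Field<object>(columnName);
    }
    origDRdict.Add(d, str);
    d++;
}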

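The columns will always be the same between the two files being compared, but I can encounter other pairs of files with different columns. I basically dump all of a row's data into one string for the dictionary lookup. I only want to hit the DBF file again if the data is different.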
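The "one string per row" step can be factored into a small helper. A minimal sketch, assuming the row is passed as a plain sequence of column values so the helper runs without a DataTable; RowKeys and BuildRowKey are hypothetical names. It produces the same "~value~value" shape as the dictionary-building loop, but with a StringBuilder instead of repeated += concatenation.

```csharp
using System.Collections.Generic;
using System.Text;

public static class RowKeys
{
    // Hypothetical helper: same output as the inner loop
    // (str += "~" + value), built with a StringBuilder.
    public static string BuildRowKey(IEnumerable<object> columnValues)
    {
        var sb = new StringBuilder();
        foreach (object value in columnValues)
        {
            // Append(object) adds nothing for null, matching "~" + null == "~".
            sb.Append('~').Append(value);
        }
        return sb.ToString();
    }
}
```

In the loop it could be called as origDRdict.Add(d, RowKeys.BuildRowKey(columnNames.Select(c => row.Field<object>(c)))) with System.Linq imported. Like the original, this key is ambiguous if a field value itself contains "~".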

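Here is my lookup code. It finds the differences, and it is really slow when it runs the ELSE part of my (!foundIt) if block. If I remove that part, it takes only a minute to list all the records that were not found.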

int i = 0; // position in origDRdict; a HashSet can't be indexed, so count manually
foreach (CustomClass customclass in origCustomClassList)
{
    bool foundIt = modCustomClassList.Contains(customclass);

    if (!foundIt)
    {
        //at this point, an element has not been found
        notFoundRecords.Add(customclass);
    }
    //If I remove this entire else block, code runs fast.
    else //at this point an element has been found
    {
        //check the 'modified' dictionary
        if (!(modDRdict.ContainsValue(origDRdict[i])))
        {
            //at this point, the coordinates are the same,
            //however there are DB changes
            //this is where I would do a full check based on indexes
            //to show changes.
        }
    }

    i++;
}

What I've tried / other ideas

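- Generating a HashSet of custom objects, each holding an integer index and a string built from the columns and values.
- Removing the if (!(modDRdict.ContainsValue(origDRdict[i]))) block makes the code significantly faster: iterating over the deleted records between two 440,000-record files takes only a minute. The dictionary lookup takes forever!
- I don't think the nested foreach loop adds much overhead. If I keep it in the code but skip the lookup, it still runs fast.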
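For scale: Dictionary<TKey,TValue>.ContainsValue is documented as a linear search (average time proportional to Count), and the loop calls it once per matched record. With n records on each side and every record matched, the worst case is roughly:

```latex
\underbrace{n}_{\text{loop iterations}} \times \underbrace{n}_{\text{values scanned per call}} = n^2,
\qquad n = 440{,}000 \;\Rightarrow\; n^2 \approx 1.9 \times 10^{11} \ \text{string comparisons}
```

Even if matches are found halfway through the values on average, that is still about 10^11 comparisons, versus about 4.4 x 10^5 expected-constant-time lookups for a hash-based check.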

score 3 · Accepted Answer

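Dictionaries are optimized for lookup by key, not by value. If you need to look up by value, you are using the wrong dictionary. Build a HashSet over your values to check containment quickly, or build a reverse dictionary if you also need the key.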
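A minimal sketch of both suggestions, assuming the Dictionary<int, string> built in the question; ValueLookups, BuildValueSet and BuildReverse are hypothetical names. Each structure is built once, in O(n), before the loop; afterwards every check is an expected O(1) hash lookup instead of a full scan.

```csharp
using System.Collections.Generic;

public static class ValueLookups
{
    // HashSet over the dictionary's values: use when only "is it there?" matters.
    public static HashSet<string> BuildValueSet(Dictionary<int, string> dict) =>
        new HashSet<string>(dict.Values);

    // Reverse dictionary (value -> key): use when the matching key is needed too.
    // Rows with identical strings collide; this keeps whichever key is enumerated first.
    public static Dictionary<string, int> BuildReverse(Dictionary<int, string> dict)
    {
        var reverse = new Dictionary<string, int>(dict.Count);
        foreach (KeyValuePair<int, string> pair in dict)
        {
            if (!reverse.ContainsKey(pair.Value))
            {
                reverse.Add(pair.Value, pair.Key);
            }
        }
        return reverse;
    }
}
```

In the question's loop, build var modValues = ValueLookups.BuildValueSet(modDRdict); once before the foreach, then replace modDRdict.ContainsValue(origDRdict[i]) with modValues.Contains(origDRdict[i]).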
