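I wrote this function to auto-correct gender values to M or F, using string arrays that hold the different spellings. It works fine, but my manager told me to use a Dictionary, which he says is more efficient. I don't know how to do that. Would anyone be willing to help me understand how? Thanks.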

    Public Function AutoGender(ByVal dt As DataTable) As DataTable
        Dim Gender As String = ""
        Dim Mkeywords() As String = {"boy", "boys", "male", "man", "m", "men", "guy"}
        Dim Fkeywords() As String = {"girl", "girls", "female", "woman", "f", "women", "chick"}
        Dim row As DataRow
        For Each row In dt.Rows
            If Mkeywords.Contains(row("Gender").ToString.ToLower) Then
                Gender = "M"
                row("Gender") = Gender
            ElseIf Fkeywords.Contains(row("Gender").ToString.ToLower) Then
                Gender = "F"
                row("Gender") = Gender
            End If
        Next
        Return dt
    End Function

2 Answers

Here is an example of how you could use a Dictionary(Of String, String) to look up whether a synonym is known:

Shared GenderSynonyms As Dictionary(Of String, String) = New Dictionary(Of String, String) From
    {{"boy", "M"}, {"boys", "M"}, {"male", "M"}, {"man", "M"}, {"m", "M"}, {"men", "M"}, {"guy", "M"},
     {"girl", "F"}, {"girls", "F"}, {"female", "F"}, {"woman", "F"}, {"f", "F"}, {"women", "F"}, {"chick", "F"}}

Public Function AutoGender(ByVal dt As DataTable) As DataTable
    If dt.Columns.Contains("Gender") Then
        For Each row As DataRow In dt.Rows
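            ' Field(Of String) returns Nothing for DBNull, so guard before calling ToLower
            Dim oldGender = row.Field(Of String)("Gender")
            If oldGender Is Nothing Then Continue For
            Dim newGender As String = Nothing
            If GenderSynonyms.TryGetValue(oldGender.ToLower, newGender) Then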
                row.SetField("Gender", newGender)
            End If
        Next
    End If
    Return dt
End Function

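Note that I used a collection initializer to fill the dictionary, which is a convenient way to initialize a collection from literals. You could also use the Add method.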
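
As a minimal sketch of the Add alternative, the same lookup table could be filled one entry at a time, for example in a shared constructor. The class name GenderLookup and the Program module are hypothetical and exist only for this illustration.

```vbnet
Imports System
Imports System.Collections.Generic

' Hypothetical container class, not part of the original answer.
Public Class GenderLookup
    Public Shared ReadOnly GenderSynonyms As New Dictionary(Of String, String)

    ' The shared constructor runs once, before the class is first used.
    Shared Sub New()
        For Each word In {"boy", "boys", "male", "man", "m", "men", "guy"}
            GenderSynonyms.Add(word, "M")
        Next
        For Each word In {"girl", "girls", "female", "woman", "f", "women", "chick"}
            GenderSynonyms.Add(word, "F")
        Next
    End Sub
End Class

Module Program
    Sub Main()
        Dim code As String = Nothing
        If GenderLookup.GenderSynonyms.TryGetValue("woman", code) Then
            Console.WriteLine(code) ' prints F
        End If
    End Sub
End Module
```

Looping over the two word lists keeps each synonym next to its group, at the cost of a few more lines than the initializer.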

Edit: another approach, possibly more concise, is to use two HashSet(Of String), one for the male synonyms and one for the female ones:

Shared maleSynonyms As New HashSet(Of String) From
    {"boy", "boys", "male", "man", "m", "men", "guy"}
Shared femaleSynonyms As New HashSet(Of String) From
    {"girl", "girls", "female", "woman", "f", "women", "chick"}

Public Function AutoGender(ByVal dt As DataTable) As DataTable
    If dt.Columns.Contains("Gender") Then
        For Each row As DataRow In dt.Rows
            ' Field(Of String) returns Nothing for DBNull, so skip such rows
            Dim oldGender = row.Field(Of String)("Gender")
            If oldGender Is Nothing Then Continue For
            oldGender = oldGender.ToLower
            If maleSynonyms.Contains(oldGender) Then
                row.SetField("Gender", "M")
            ElseIf femaleSynonyms.Contains(oldGender) Then
                row.SetField("Gender", "F")
            End If
        Next
    End If
    Return dt
End Function

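The elements of a HashSet must also be unique, so it cannot contain duplicate strings (just like the keys of a Dictionary), but it does not store key-value pairs, only a set of values.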
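
To illustrate the uniqueness point, here is a small sketch of how the two collections react to duplicates: HashSet.Add reports a duplicate through its Boolean return value, while Dictionary.Add throws an ArgumentException for an existing key. The module name UniquenessDemo is made up for the example.

```vbnet
Imports System
Imports System.Collections.Generic

' Hypothetical demo module, only for illustration.
Module UniquenessDemo
    Sub Main()
        ' The initializer calls Add for every literal; the second "boy" is ignored.
        Dim words As New HashSet(Of String) From {"boy", "boy", "man"}
        Console.WriteLine(words.Count)      ' prints 2
        Console.WriteLine(words.Add("man")) ' prints False, "man" is already in the set

        Dim map As New Dictionary(Of String, String) From {{"boy", "M"}}
        Try
            map.Add("boy", "M")             ' duplicate key
        Catch ex As ArgumentException
            Console.WriteLine("duplicate key rejected")
        End Try
    End Sub
End Module
```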

answered 2012-06-21T10:05:25.403

Just change both arrays to dictionaries and call ContainsKey instead of Contains.

Dim Mkeywords = New Dictionary(Of String, String) From
    {{"boy", ""}, {"boys", ""}, {"male", ""}, {"man", ""}, {"m", ""}, {"men", ""}, {"guy", ""}}

(and do the same for the female keywords)

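However, you may have noticed that I put in all those empty strings. That is because a dictionary has values as well as keys, and since we aren't using the values, I made them empty strings. To get the same O(1) lookup without all the extraneous values, you can use a HashSet in a similar way.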

Like I said, all you have to do now is call ContainsKey (or, if you go the HashSet route, it is still just Contains):

If Mkeywords.ContainsKey(row("Gender").ToString.ToLower) Then

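One last note: this will only be "more efficient" if the data starts to grow significantly. Scanning an array is O(n), but with only a handful of elements that scan is cheap, while every hash lookup pays a fixed cost for hashing the string. As it stands, using a dictionary may even be slower.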
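
If you want to check this on your own machine, a rough timing sketch with Stopwatch could look like the following. The module name LookupTiming, the iteration count, and the probed word are arbitrary choices for the example, and a loop like this is only a crude measurement, not a rigorous benchmark.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Diagnostics
Imports System.Linq

' Hypothetical timing module, only for illustration.
Module LookupTiming
    Sub Main()
        Dim keywords() As String = {"boy", "boys", "male", "man", "m", "men", "guy"}
        Dim keywordSet As New HashSet(Of String)(keywords)
        Const iterations As Integer = 1000000
        Dim hits As Integer = 0

        Dim sw = Stopwatch.StartNew()
        For i = 1 To iterations
            ' Linear scan; "guy" is the last element, so this is the worst case.
            If keywords.Contains("guy") Then hits += 1
        Next
        sw.Stop()
        Console.WriteLine("Array:   {0} ms", sw.ElapsedMilliseconds)

        sw = Stopwatch.StartNew()
        For i = 1 To iterations
            ' Hash lookup; cost does not depend on the element's position.
            If keywordSet.Contains("guy") Then hits += 1
        Next
        sw.Stop()
        Console.WriteLine("HashSet: {0} ms", sw.ElapsedMilliseconds)

        ' Use the counter so the loops are not trivially dead code.
        Console.WriteLine(hits)
    End Sub
End Module
```

The timings depend on the runtime and hardware, so treat them as a hint rather than a verdict.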

answered 2012-06-21T10:02:30.367