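This question is similar to "How to emulate MySQLs utf8_general_ci collation in PHP string comparisons", but I want the function for VB.NET rather than PHP.
Recently I have been generating a lot of keys that are supposed to be unique.
Some of these keys are equivalent under MySQL's UTF8 Unicode collation.
For example, look at these 2 keys: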
byers-street-bistro__38.15_-79.07 byers-street-bistro__38.15_-79.07
If I paste them into FrontPage and view the source, you will see:
byers-street-bistro__38.15_-79.07
byers-street-bistro&lrm;__38.15_-79.07
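Note: on Stack Overflow they still look different.
I know they are not the same. I guess the difference does not show even on Stack Exchange. Say I have 1 million such records, and I want to test whether MySQL's UTF8 collation will declare 2 strings to be the same. I want to know that before uploading. How do I do that?
VB.NET considers these to be different keys. But when we build the MySQL query and upload it, the database complains that they are the same key. A single complaint is enough to stall the upload of 1 million records.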
We do not even know what that extra invisible character is. Where can we look it up?
Anyway, I want a function that, given 2 strings, tells me whether they will be treated as the same.
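To make the request concrete, this is roughly the shape of the function I am after. It is only a sketch: `LooksEqualToMySql` is a made-up name, and it assumes that .NET's culture-aware comparison with `IgnoreCase` and `IgnoreNonSpace` is close enough to utf8_general_ci, which I have not verified. It is an approximation, not an exact emulation of the MySQL collation.

```vb
Imports System
Imports System.Globalization

Module CollationSketch
    ' Hypothetical helper: case- and accent-insensitive comparison using the
    ' invariant culture. This only approximates MySQL's utf8_general_ci; the
    ' two systems may still disagree on individual characters.
    Public Function LooksEqualToMySql(ByVal a As String, ByVal b As String) As Boolean
        Dim options As CompareOptions =
            CompareOptions.IgnoreCase Or CompareOptions.IgnoreNonSpace
        Return String.Compare(a, b, CultureInfo.InvariantCulture, options) = 0
    End Function
End Module
```

Culture-aware comparisons in .NET are documented to skip ignorable characters, so a stray invisible character should not make two keys compare as different here, but that behaviour is worth checking against the real data.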
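If possible, we would like a function that converts a string into its most "standard" form.
For example, that invisible character seems to encode nothing; the function would recognize all such "nothing" characters and remove them.
Is there such a thing?
This is what I have so far. I need something more comprehensive.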
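One candidate for removing the "nothing" characters in bulk, instead of listing them one by one, is to drop everything in the Unicode Format category (Cf), which includes U+200E, i.e. `ChrW(8206)`. The following is a minimal sketch under that assumption; `StripFormatCharacters` is a hypothetical name, and whether Cf covers every character that MySQL ignores is something I do not know.

```vb
Imports System.Globalization
Imports System.Text

Module InvisibleCharacterSketch
    ' Hypothetical helper: removes every character whose Unicode category is
    ' Format (Cf), e.g. U+200E LEFT-TO-RIGHT MARK or U+200B ZERO WIDTH SPACE.
    ' It works per UTF-16 code unit, so format characters outside the Basic
    ' Multilingual Plane (stored as surrogate pairs) are left untouched.
    Public Function StripFormatCharacters(ByVal s As String) As String
        Dim sb As New StringBuilder(s.Length)
        For Each c As Char In s
            If CharUnicodeInfo.GetUnicodeCategory(c) <> UnicodeCategory.Format Then
                sb.Append(c)
            End If
        Next
        Return sb.ToString()
    End Function
End Module
```

For example, `StripFormatCharacters("bistro" & ChrW(8206) & "__38.15")` returns `"bistro__38.15"`.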
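Imports System.Collections.Generic
Imports System.Globalization
Imports System.Runtime.CompilerServices
Imports System.Text

' Note: <Extension()> methods must be declared inside a Module.

' Lookup table mapping typographic quotes to their plain ASCII equivalents.
Private Function StraightenQuotesReplacement() As Dictionary(Of String, String)
    Static replacement As Dictionary(Of String, String)
    If replacement Is Nothing Then
        replacement = New Dictionary(Of String, String)
        replacement.Add(ChrW(&H201C), """") ' left double quotation mark
        replacement.Add(ChrW(&H201D), """") ' right double quotation mark
        replacement.Add(ChrW(&H2018), "'")  ' left single quotation mark
        replacement.Add(ChrW(&H2019), "'")  ' right single quotation mark
    End If
    Return replacement
End Function

' Replaces curly quotes with straight quotes.
<Extension()>
Public Function straightenQuotes(ByVal somestring As String) As String
    For Each pair In StraightenQuotesReplacement()
        somestring = somestring.Replace(pair.Key, pair.Value)
    Next
    Return somestring
End Function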

' Transliterates German umlauts and sharp s to ASCII.
<Extension()>
Public Function germanCharacter(ByVal s As String) As String
    Dim t = s
    t = t.Replace("ä", "ae")
    t = t.Replace("ö", "oe")
    t = t.Replace("ü", "ue")
    t = t.Replace("Ä", "Ae")
    t = t.Replace("Ö", "Oe")
    t = t.Replace("Ü", "Ue")
    t = t.Replace("ß", "ss")
    Return t
End Function

' Maps the small katakana ke to the full-size katakana ke.
<Extension()>
Public Function japaneseCharacter(ByVal s As String) As String
    Dim t = s
    t = t.Replace("ヶ", "ケ")
    Return t
End Function

' Maps final sigma to regular sigma, and plain iota to iota with tonos.
<Extension()>
Public Function greekCharacter(ByVal s As String) As String
    Dim t = s
    t = t.Replace("ς", "σ")
    t = t.Replace("ι", "ί")
    Return t
End Function

' Expands the French oe ligature.
<Extension()>
Public Function franceCharacter(ByVal s As String) As String
    Dim t = s
    t = t.Replace("œ", "oe")
    Return t
End Function

' Decomposes the string (NFD) and drops all combining marks, so that
' for example "é" becomes "e".
<Extension()>
Public Function RemoveDiacritics(ByVal s As String) As String
    Dim normalizedString As String = s.Normalize(NormalizationForm.FormD)
    Dim stringBuilder As New StringBuilder(normalizedString.Length)
    For Each c As Char In normalizedString
        If CharUnicodeInfo.GetUnicodeCategory(c) <> UnicodeCategory.NonSpacingMark Then
            stringBuilder.Append(c)
        End If
    Next
    Return stringBuilder.ToString()
End Function
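
' Removes U+200E LEFT-TO-RIGHT MARK (decimal 8206), an invisible character.
<Extension()>
Public Function badcharacters(ByVal s As String) As String
    Dim t = s
    t = t.Replace(ChrW(8206), "")
    Return t
End Function

' Chains the normalization steps. removeDoubleSpaces, SpacetoDash and
' EncodeUrlLimited are defined elsewhere (not shown here).
' badcharacters, japaneseCharacter and franceCharacter are not part of this chain.
' RemoveDiacritics runs before germanCharacter, so by that point the umlauts
' have already lost their dots and only the "ß" replacement can still match.
<Extension()>
Public Function sanitizeUTF8_Unicode(ByVal str As String) As String
    Return str.ToLower.removeDoubleSpaces.SpacetoDash.EncodeUrlLimited.straightenQuotes.RemoveDiacritics.greekCharacter.germanCharacter
End Function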
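
Given some canonical form, a check before uploading could group all keys by that form and report every group with more than one member. Below is a rough sketch of that idea. `FindCollisions` and its `canonical` parameter are hypothetical names, and the result is only as trustworthy as the canonical form passed in: it finds keys that my own normalization treats as equal, which is not necessarily what MySQL treats as equal.

```vb
Imports System
Imports System.Collections.Generic

Module CollisionSketch
    ' Hypothetical helper: groups keys by their canonical form and returns only
    ' the groups holding more than one key, i.e. the likely duplicates.
    Public Function FindCollisions(ByVal keys As IEnumerable(Of String),
                                   ByVal canonical As Func(Of String, String)) As List(Of List(Of String))
        Dim groups As New Dictionary(Of String, List(Of String))(StringComparer.Ordinal)
        For Each key As String In keys
            Dim form As String = canonical(key)
            Dim bucket As List(Of String) = Nothing
            If Not groups.TryGetValue(form, bucket) Then
                bucket = New List(Of String)
                groups.Add(form, bucket)
            End If
            bucket.Add(key)
        Next

        ' Keep only the buckets that contain a collision.
        Dim collisions As New List(Of List(Of String))
        For Each bucket As List(Of String) In groups.Values
            If bucket.Count > 1 Then collisions.Add(bucket)
        Next
        Return collisions
    End Function
End Module
```

For instance, `FindCollisions(allKeys, Function(k) k.badcharacters().sanitizeUTF8_Unicode())` would use the functions above as the canonical form, where `allKeys` is a placeholder for the list of keys about to be uploaded. This is a single pass over the keys, so 1 million records should not be a problem.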