c# - Convert any string to a valid DNS subdomain

Question

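I need a method in C#/.NET that takes any string, possibly full of unusual characters, and produces a valid subdomain that stays as close to the input as possible.

Example:
Input: Øyvind & René's Company Ltd.
Output: oyvindrenescompanyltd.example.com

Does anyone know of a .NET library that can help with this conversion?

Removing every character that is invalid in a subdomain is easy, but if I have to replace many characters (ø -> o, é -> e), catching every variant is not trivial.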

score 2 · Accepted Answer

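"But if I have to replace many characters (ø -> o, é -> e), catching every variant is not trivial."

Actually, removing diacritics (accents and the like) is quite easy if you take advantage of Unicode normalization: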

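    using System;
    using System.Globalization;
    using System.Text;

    // Extension methods must live in a static class; the class name is arbitrary.
    public static class StringExtensions
    {
        public static string RemoveDiacritics(this string s)
        {
            if (s == null) throw new ArgumentNullException("s");
            // FormD splits an accented letter into a base letter plus combining marks.
            string formD = s.Normalize(NormalizationForm.FormD);
            char[] chars = new char[formD.Length];
            int count = 0;
            foreach (char c in formD)
            {
                // Keep everything except the combining marks.
                if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                {
                    chars[count++] = c;
                }
            }
            string noDiacriticsFormD = new string(chars, 0, count);
            // Recompose whatever is left.
            return noDiacriticsFormD.Normalize(NormalizationForm.FormC);
        }
    }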

(Note that this only works on the full .NET Framework, not on Windows Phone, WinRT, or Silverlight.)
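
One caveat worth checking before relying on normalization alone: it only strips marks from characters that have a canonical decomposition. é decomposes into e plus a combining acute accent, but ø, æ, ß, ł, and đ have no such decomposition in Unicode, so they pass through unchanged. A minimal sketch of one way to cover them, assuming a small hand-written fallback table (the AsciiFolding class, the FoldToAscii name, and the table contents are illustrative and not part of the answer above):

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public static class AsciiFolding
{
    // Hypothetical fallback table for letters with no canonical decomposition.
    // Deliberately incomplete; extend it for the input you expect.
    private static readonly Dictionary<char, string> Fallback = new Dictionary<char, string>
    {
        { '\u00F8', "o" },  { '\u00D8', "O" },   // o with stroke
        { '\u00E6', "ae" }, { '\u00C6', "AE" },  // ae ligature
        { '\u00DF', "ss" },                      // sharp s
        { '\u0142', "l" },  { '\u0141', "L" },   // l with stroke
        { '\u0111', "d" },  { '\u0110', "D" },   // d with stroke
    };

    public static string FoldToAscii(string s)
    {
        if (s == null) throw new ArgumentNullException("s");

        var sb = new StringBuilder(s.Length);
        foreach (char c in s.Normalize(NormalizationForm.FormD))
        {
            // Drop the combining marks that decomposition split off.
            if (CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.NonSpacingMark)
                continue;

            // Substitute letters from the table; copy everything else as-is.
            string replacement;
            if (Fallback.TryGetValue(c, out replacement))
                sb.Append(replacement);
            else
                sb.Append(c);
        }
        // Recompose anything that is still decomposed.
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }
}
```

With this table, the input from the question folds to "Oyvind & Rene's Company Ltd."; spaces and punctuation still have to be removed in a later step.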

score 1 · Accepted Answer

You can use the .NET port of the Perl module Unidecode, which goes by the same name (or you can use the RemoveDiacritics method posted by Thomas Levesque):

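    using System;
    using System.Text.RegularExpressions;
    using BinaryAnalysis.UnidecodeSharp;

    // The class name is arbitrary; the method needs a containing type to compile.
    public static class SubdomainHelper
    {
        public static string MakeSubdomain(string rawSubdomain, string baseDomain)
        {
            if (rawSubdomain == null) throw new ArgumentNullException("rawSubdomain");
            if (baseDomain == null) throw new ArgumentNullException("baseDomain");
            if (baseDomain.Length == 0) {
                throw new ArgumentException("Invalid base domain");
            }
            // A full name is at most 253 characters; leave room for one character and the dot.
            if (baseDomain.Length + 2 > 253) {
                throw new ArgumentException("Base domain is already too long for a subdomain");
            }

            // Transliterate to ASCII, then keep only letters, digits, and hyphens.
            var sub = rawSubdomain.Unidecode();
            sub = Regex.Replace(sub, @"[^a-zA-Z0-9-]+", "");
            sub = sub.ToLowerInvariant();

            // A single label is limited to 63 characters.
            if (sub.Length > 63) {
                sub = sub.Substring(0, 63);
            }
            // The whole name, including the dot, is limited to 253 characters.
            if (sub.Length + baseDomain.Length + 1 > 253) {
                sub = sub.Substring(0, 252 - baseDomain.Length);
            }
            // Trim hyphens last, because truncation can leave one at the end.
            sub = sub.Trim('-');
            if (sub.Length == 0) {
                throw new ArgumentException("Input contains no characters usable in a subdomain");
            }
            return sub + "." + baseDomain;
        }
    }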
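
To sanity-check the output of a method like MakeSubdomain, a small validator can encode the same rules the code applies: each label is 1 to 63 characters of letters, digits, and hyphens, with no hyphen at either end, and the full name is at most 253 characters. A sketch under those assumptions (HostNameCheck and IsValidHostName are hypothetical names, and a trailing dot is treated as invalid here):

```csharp
using System;
using System.Text.RegularExpressions;

public static class HostNameCheck
{
    // One label: 1-63 characters, alphanumeric at both ends, hyphens allowed inside.
    // \A and \z anchor to the true start and end, so a trailing newline does not match.
    private static readonly Regex Label =
        new Regex(@"\A[a-zA-Z0-9]([a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\z");

    public static bool IsValidHostName(string name)
    {
        if (string.IsNullOrEmpty(name) || name.Length > 253) return false;

        // Every dot-separated label must pass on its own; empty labels fail.
        foreach (string label in name.Split('.'))
        {
            if (!Label.IsMatch(label)) return false;
        }
        return true;
    }
}
```

A check like this is useful in unit tests: feed the conversion method awkward inputs (only punctuation, very long names, leading or trailing hyphens) and assert that whatever it returns passes the validator.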
