c# - Regex to extract initials from a name

Question

e.g. if the Name is: John Deer
the Initials should be: JD

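I can perform this check on the Initials field using substrings, but I'd like to know whether a regular expression could be written for it. Is writing a regex better than using string methods?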
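For comparison, the substring route the question mentions can be sketched without any regex. This is a minimal sketch, assuming the name is space-separated and that you want the first letter of the first and last words; the class and method names (NameInitials, FromFirstAndLast) are hypothetical.

```csharp
// Hypothetical helper: string methods only (Trim, LastIndexOf, indexing), no regex.
public static class NameInitials
{
    // First letter of the first word plus first letter of the last word.
    public static string FromFirstAndLast(string name)
    {
        string trimmed = name.Trim();
        if (trimmed.Length == 0)
        {
            return "";
        }

        char first = trimmed[0];
        int lastSpace = trimmed.LastIndexOf(' ');

        // A single word (e.g. "Madonna") yields only one initial.
        if (lastSpace < 0)
        {
            return char.ToUpperInvariant(first).ToString();
        }

        // Safe: after Trim() the string cannot end with a space.
        char last = trimmed[lastSpace + 1];
        return $"{char.ToUpperInvariant(first)}{char.ToUpperInvariant(last)}";
    }
}
```

It is short, but it already runs into what the answers below wrestle with: a suffix is treated as the last word, so John Q Public Jr. gives JJ.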

score 23 · Accepted Answer

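Here's my solution. My goal wasn't to provide the simplest solution, but one that can take a variety of (sometimes odd) name formats and produce a best guess at the first and last name initials (or a single initial in the case of a mononym).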

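I also tried to write it in a relatively international-friendly way, using Unicode regexes, although I have no experience generating initials for many kinds of foreign names (e.g. Chinese). It should at least produce something usable to represent the person, in two characters or fewer. For example, giving it a Korean name such as “행운의 복숭아” produces 행복, as you might expect (although that may not be the right thing to do in Korean culture).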
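To see why the Unicode classes matter, here is a small sketch (not part of the answer) that takes the first letter of every word twice: once with \p{L}, which matches letters in any script, and once with the ASCII class [A-Za-z]. The WordInitials class is a hypothetical name.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical comparison helper, not from the answer.
public static class WordInitials
{
    // A letter that is not preceded by another letter starts a word; \p{L} covers every script.
    public static string Unicode(string name) =>
        string.Concat(Regex.Matches(name, @"(?<!\p{L})\p{L}").Cast<Match>().Select(m => m.Value));

    // Same idea restricted to ASCII letters.
    public static string Ascii(string name) =>
        string.Concat(Regex.Matches(name, @"(?<![A-Za-z])[A-Za-z]").Cast<Match>().Select(m => m.Value));
}
```

With the ASCII class, Hangul produces nothing, and accented capitals are skipped so that the following lowercase letter is mistaken for a word start: Éric Ígor becomes rg instead of ÉÍ.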

/// <summary>
/// Given a person's first and last name, we'll make our best guess to extract up to two initials, hopefully
/// representing their first and last name, skipping any middle initials, Jr/Sr/III suffixes, etc. The letters 
/// will be returned together in ALL CAPS, e.g. "TW". 
/// 
/// The way it parses names for many common styles:
/// 
/// Mason Zhwiti                -> MZ
/// mason lowercase zhwiti      -> MZ
/// Mason G Zhwiti              -> MZ
/// Mason G. Zhwiti             -> MZ
/// John Queue Public           -> JP
/// John Q. Public, Jr.         -> JP
/// John Q Public Jr.           -> JP
/// Thurston Howell III         -> TH
/// Thurston Howell, III        -> TH
/// Malcolm X                   -> MX
/// A Ron                       -> AR
/// A A Ron                     -> AR
/// Madonna                     -> M
/// Chris O'Donnell             -> CO
/// Malcolm McDowell            -> MM
/// Robert "Rocky" Balboa, Sr.  -> RB
/// 1Bobby 2Tables              -> BT
/// Éric Ígor                   -> ÉÍ
/// 행운의 복숭아                 -> 행복
/// 
/// </summary>
/// <param name="name">The full name of a person.</param>
/// <returns>One to two uppercase initials, without punctuation.</returns>
public static string ExtractInitialsFromName(string name)
{
    // first remove all: punctuation, symbols, control chars, and numbers (unicode style regexes)
    string initials = Regex.Replace(name, @"[\p{P}\p{S}\p{C}\p{N}]+", "");

    // Replacing all possible whitespace/separator characters (unicode style), with a single, regular ascii space.
    initials = Regex.Replace(initials, @"\p{Z}+", " ");

    // Remove all Sr, Jr, I, II, III, IV, V, VI, VII, VIII, IX at the end of names
    initials = Regex.Replace(initials.Trim(), @"\s+(?:[JS]R|I{1,3}|I[VX]|VI{0,3})$", "", RegexOptions.IgnoreCase);

    // Extract up to 2 initials from the remaining cleaned name.
    initials = Regex.Replace(initials, @"^(\p{L})[^\s]*(?:\s+(?:\p{L}+\s+(?=\p{L}))?(?:(\p{L})\p{L}*)?)?$", "$1$2").Trim();

    if (initials.Length > 2)
    {
        // Worst case scenario, everything failed, just grab the first two letters of what we have left.
        initials = initials.Substring(0, 2);
    }

    return initials.ToUpperInvariant();
}
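The first three Regex.Replace calls do most of the work before any initials are taken. The sketch below splits them into separate steps so the intermediate strings can be inspected; the patterns are copied from the answer, while the CleanupSteps class and its method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper names; the patterns are the answer's.
public static class CleanupSteps
{
    // Step 1: drop punctuation (\p{P}), symbols (\p{S}), control/format chars (\p{C}) and digits (\p{N}).
    public static string StripNonLetters(string name) =>
        Regex.Replace(name, @"[\p{P}\p{S}\p{C}\p{N}]+", "");

    // Step 2: collapse any run of Unicode separators (\p{Z}) into one ASCII space.
    public static string NormalizeSpaces(string s) =>
        Regex.Replace(s, @"\p{Z}+", " ");

    // Step 3: remove a trailing Jr/Sr or Roman numeral I..IX, case-insensitively.
    public static string StripSuffix(string s) =>
        Regex.Replace(s.Trim(), @"\s+(?:[JS]R|I{1,3}|I[VX]|VI{0,3})$", "", RegexOptions.IgnoreCase);

    public static string Clean(string name) => StripSuffix(NormalizeSpaces(StripNonLetters(name)));
}
```

For example, Robert "Rocky" Balboa, Sr. is reduced to Robert Rocky Balboa before the final pattern picks RB, and Malcolm X survives step 3 because a lone X is not in the suffix list (only IX is).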

score 22 · Accepted Answer

Personally, I prefer this regex:

Regex initials = new Regex(@"(\b[a-zA-Z])[a-zA-Z]* ?");
string init = initials.Replace(nameString, "$1");
// init == "JD"

This takes care of both the initials and the removal of spaces (that's what the “?” at the end is for).

The only thing you have to worry about is titles and suffixes such as Jr., Sr., Mrs., etc. Some people do include these in their full names.
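One way to handle leading titles is to strip them before applying the same pattern. A minimal sketch, assuming a short hand-picked title list (Mr, Mrs, Ms, Dr); that list and the TitleAwareInitials name are hypothetical and would need extending for real data.

```csharp
using System.Text.RegularExpressions;

public static class TitleAwareInitials
{
    // Hypothetical, deliberately short list of leading titles, each with an optional period.
    // A following space is required, so "Drew" is not mistaken for "Dr".
    private static readonly Regex LeadingTitle =
        new Regex(@"^\s*(?:Mrs|Mr|Ms|Dr)\.?\s+", RegexOptions.IgnoreCase);

    // The answer's pattern: keep each word's first letter, drop the rest plus one trailing space.
    private static readonly Regex Initials = new Regex(@"(\b[a-zA-Z])[a-zA-Z]* ?");

    public static string Get(string name)
    {
        string withoutTitle = LeadingTitle.Replace(name, "");
        return Initials.Replace(withoutTitle, "$1");
    }
}
```

Without the title step, the answer's pattern turns Mrs. Jane Doe into M. JD, because the period and the space after it are never matched.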

score 9 · Accepted Answer

Here's my approach:

public static string GetInitials(string names) {
    // Extract the first character out of each block of non-whitespace
    // except name suffixes, e.g. Jr., III. The number of initials is not limited.
    return Regex.Replace(names, @"(?i)(?:^|\s|-)+([^\s-])[^\s-]*(?:(?:\s+)(?:the\s+)?(?:jr|sr|II|2nd|III|3rd|IV|4th)\.?$)?", "$1").ToUpper();
}

Handled cases:

// Mason Zhwiti                               -> MZ
// mason zhwiti                               -> MZ
// Mason G Zhwiti                             -> MGZ
// Mason G. Zhwiti                            -> MGZ
// John Queue Public                          -> JQP
// John-Queue Public                          -> JQP
// John Q. Public, Jr.                        -> JQP
// John Q Public Jr.                          -> JQP
// John Q Public Jr                           -> JQP
// John Q Public Jraroslav                    -> JQPJ
// Thurston Howell III                        -> TH
// Thurston Howell, III                       -> TH
// Thurston Howell the III                    -> TH
// Malcolm X                                  -> MX
// A Ron                                      -> AR
// A A Ron                                    -> AAR
// Madonna                                    -> M
// Chris O'Donnell                            -> CO
// Chris O' Donnell                           -> COD
// Malcolm McDowell                           -> MM
// Éric Ígor                                  -> ÉÍ
// 행운의 복숭아                               -> 행복

Unhandled cases:

// James Henry George Michael III the second  -> JHGMIts
// Robert "Rocky" Balboa, Sr.                 -> R"B
// 1Bobby 2Tables                             -> 12 (is it a real name?)
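The last two unhandled cases come down to stray quotes and digits. A hedged sketch: first strip everything except letters, whitespace, hyphens, apostrophes, periods and commas, then apply the answer's pattern unchanged. The pre-cleaning step and the CleanedInitials name are assumptions, not part of the answer.

```csharp
using System.Text.RegularExpressions;

public static class CleanedInitials
{
    // Assumed pre-cleaning: keep letters (\p{L}), whitespace, apostrophe, period, comma and hyphen.
    private static readonly Regex Disallowed = new Regex(@"[^\p{L}\s'.,-]");

    // Pattern copied from the answer above.
    private static readonly Regex Initials = new Regex(
        @"(?i)(?:^|\s|-)+([^\s-])[^\s-]*(?:(?:\s+)(?:the\s+)?(?:jr|sr|II|2nd|III|3rd|IV|4th)\.?$)?");

    public static string Get(string names) =>
        Initials.Replace(Disallowed.Replace(names, ""), "$1").ToUpper();
}
```

This is a trade-off rather than a fix: the nickname now counts as a word (RRB rather than RB), and because digits are removed, suffixes such as 3rd are no longer recognized, so Thurston Howell 3rd gives THR.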

score 2 · Accepted Answer


How about this?

var initials = Regex.Replace( "John Deer", "[^A-Z]", "" );

answered 2012-05-30T16:25:15.473
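This keeps every ASCII capital, so it depends entirely on the name being capitalized and it drops accented capitals. A hedged Unicode-aware variant replaces [^A-Z] with \P{Lu} (any character that is not an uppercase letter). It keeps the same dependence on capitalization, as the McDowell case shows. UppercaseInitials is a hypothetical name.

```csharp
using System.Text.RegularExpressions;

public static class UppercaseInitials
{
    // The answer's version: remove everything that is not an ASCII capital.
    public static string Ascii(string name) => Regex.Replace(name, "[^A-Z]", "");

    // Assumed Unicode variant: remove everything that is not an uppercase letter in any script.
    public static string Unicode(string name) => Regex.Replace(name, @"\P{Lu}", "");
}
```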

score 2 · Accepted Answer

Here's an alternative that emphasizes keeping it simple:

    /// <summary>
    /// Get initials from the supplied names string.
    /// </summary>
    /// <param name="names">Names separated by whitespace</param>
    /// <param name="separator">Separator between initials (e.g "", "." or ". ")</param>
    /// <returns>Upper case initials (with separators in between)</returns>
    public static string GetInitials(string names, string separator)
    {
        // Extract the first character out of each block of non-whitespace
        Regex extractInitials = new Regex(@"\s*([^\s])[^\s]*\s*");
        return extractInitials.Replace(names, "$1" + separator).ToUpper();
    }

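The question is what should happen when the supplied names aren't what you expect. Personally, I think it should just return the first character of each block of text that isn't whitespace. For example: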

1Steve 2Chambers               => 12
harold mcDonald                => HM
David O'Leary                  => DO
David O' Leary                 => DOL
Ronnie "the rocket" O'Sullivan => R"RO

Some would argue for more sophisticated techniques (e.g. handling the last example better), but IMO this is really a data-cleansing problem.
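One detail of the code above: it appends separator after every initial, including the last, so a separator of ". " turns John Deer into "J. D. " with a trailing space. If the separator should only go between initials, as the doc comment says, a variant can collect the initials with Regex.Matches and join them. This is a sketch; JoinedInitials is a hypothetical name.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class JoinedInitials
{
    // First non-whitespace character of each whitespace-delimited block (same idea as the answer).
    private static readonly Regex BlockStart = new Regex(@"(?<!\S)\S");

    public static string Get(string names, string separator)
    {
        var initials = BlockStart.Matches(names).Cast<Match>().Select(m => m.Value);
        // string.Join puts the separator only between items, never after the last one.
        return string.Join(separator, initials).ToUpper();
    }
}
```

If you do want a trailing period (J. D.), append it after the join rather than relying on the regex replacement.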

score 0 · Accepted Answer

Try this:

(^| )([^ ])([^ ])* with the replacement \2 (in .NET, write the backreference as $2)

or this:

public static string ToInitials(this string str)
{
    return Regex.Replace(str, @"^(?'b'\w)\w*,\s*(?'a'\w)\w*$|^(?'a'\w)\w*\s*(?'b'\w)\w*$", "${a}${b}", RegexOptions.Singleline);
}

http://www.kewney.com/posts/software-development/using-regular-expressions-to-get-initials-from-a-string-in-c-sharp
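Extension methods have to live in a non-nested, non-generic static class, so a compilable version of the second snippet looks roughly like the sketch below (the class name NameExtensions is hypothetical). Note how narrow the pattern is: it only rewrites input of exactly two words, optionally in "Last, First" order. Anything else is returned unchanged, and a single word is split into two letters.

```csharp
using System.Text.RegularExpressions;

public static class NameExtensions
{
    // Both alternatives reuse the group names a and b:
    //   "Last, First" -> b = first letter of Last, a = first letter of First
    //   "First Last"  -> a = first letter of First, b = first letter of Last
    // The replacement always emits a then b, i.e. the first-name initial first.
    public static string ToInitials(this string str)
    {
        return Regex.Replace(str,
            @"^(?'b'\w)\w*,\s*(?'a'\w)\w*$|^(?'a'\w)\w*\s*(?'b'\w)\w*$",
            "${a}${b}",
            RegexOptions.Singleline);
    }
}
```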

score 0 · Accepted Answer

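The pattern [a-z]+[a-z]+\b matches the lowercase tail of each name (everything after its capital first letter), so replacing those matches with an empty string leaves the initials...

where name = 'Greg Henry' → 'G H', or 'James Smith' → 'J S'

Then you can split on ' ' and join on ''.

This even works for names like

'James Henry George Michael' → 'J H G M'

'James Henry George Michael III' → 'J H G M III'

If you want to avoid splitting, use [a-z]+[a-z]+\b ? (with the optional trailing space).

But a name like Jon Michael Jr. The 3rd will give JMJr.T3; the options above let you pick out 'The', 'the' and '3rd' if you want to.

If you really want to get fancy, you can use (\b[a-zA-Z])[a-zA-Z]* ? to match the parts of the name and then replace each match with its first capture group ($1).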
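A sketch of the split/join and single-pass variants described above, assuming names are capitalized (the pattern only removes runs of lowercase ASCII letters). Note that [a-z]+[a-z]+ is equivalent to [a-z]{2,}; the original spelling is kept here. LowercaseRunInitials is a hypothetical name.

```csharp
using System.Text.RegularExpressions;

public static class LowercaseRunInitials
{
    // Remove runs of two or more lowercase letters that end a word, then split on ' ' and join with "".
    public static string SplitJoin(string name)
    {
        string spaced = Regex.Replace(name, @"[a-z]+[a-z]+\b", "");
        return string.Join("", spaced.Split(' '));
    }

    // Single-pass variant: the optional trailing space is removed together with the run.
    public static string SinglePass(string name) =>
        Regex.Replace(name, @"[a-z]+[a-z]+\b ?", "");
}
```

A single lowercase letter is never removed, which is why the r in Jr. survives.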

score 0 · Accepted Answer

How about this:

        string name = "John Clark MacDonald";
        var parts = name.Split(' ');
        string initials = "";

        foreach (var part in parts)
        {
            initials += Regex.Match(part, "[A-Z]");
            Console.WriteLine(part + " --> " + Regex.Match(part,"[A-Z]"));
        }
        Console.WriteLine("Final initials: " + initials);
        Console.ReadKey();

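This allows for optional middle names and copes with names that contain more than one capital letter (such as MacDonald), as shown above.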
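A slightly tidier version of the same loop, as a sketch: it splits on any whitespace and skips empty entries, reads Match.Value explicitly (the original relies on Match.ToString() returning the matched text), and uses a StringBuilder. CapitalInitials is a hypothetical name. It keeps the approach's assumption that every word contains a capital letter; a word without one contributes nothing.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class CapitalInitials
{
    public static string Get(string name)
    {
        var initials = new StringBuilder();
        // A null separator array means "split on any whitespace".
        foreach (string part in name.Split((char[])null, StringSplitOptions.RemoveEmptyEntries))
        {
            // First ASCII capital in the word; Value is "" when there is none.
            initials.Append(Regex.Match(part, "[A-Z]").Value);
        }
        return initials.ToString();
    }
}
```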

score 0 · Accepted Answer

My solution is as follows (C# regex dialect):

^\s*(?>(?<First>\w)\w*).*?((?<Last>\w)\w*)?\s*$

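This captures the first letter of the first word in the named group First and the first letter of the last word in the named group Last, happily ignoring any words in between, and it doesn't care whether there is leading or trailing whitespace.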

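No replacement is needed and the match happens in a single pass; you can extract the letters by accessing the match groups by name, like this: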

var displayName = "Nick 'Goose' Bradshaw";
var initialsRule = new Regex(@"^\s*(?>(?<First>\w)\w*).*?((?<Last>\w)\w*)?\s*$");
var matches = initialsRule.Match(displayName);
var initials = $"{matches.Groups["First"].Value}{matches.Groups["Last"].Value}";
//initials: "NB"
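Wrapped into a reusable method, the same pattern might look like the sketch below. Compiling the Regex once, returning "" when nothing matches, and upper-casing the result are additions for illustration, not part of the answer; FirstLastInitials is a hypothetical name.

```csharp
using System.Text.RegularExpressions;

public static class FirstLastInitials
{
    // Pattern from the answer: First = first letter of the first word (inside an atomic group),
    // Last = first letter of the last word; the lazy .*? skips everything in between.
    private static readonly Regex InitialsRule =
        new Regex(@"^\s*(?>(?<First>\w)\w*).*?((?<Last>\w)\w*)?\s*$");

    public static string Get(string displayName)
    {
        Match match = InitialsRule.Match(displayName);
        if (!match.Success)
        {
            return ""; // e.g. empty input, or input that does not start with a word character
        }
        // Last is empty for single-word names such as "Madonna".
        return (match.Groups["First"].Value + match.Groups["Last"].Value).ToUpperInvariant();
    }
}
```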

score 0 · Accepted Answer

Simplest version in Kotlin (str here is presumably the name already split into words, e.g. name.split(" ")):

val initials: String = if (str.size > 1) str[0][0].toString() + str[1][0].toString() else str[0][0].toString()
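For comparison with the C# answers, the same first-two-words logic might look like this in C#. It is a sketch that does the split inside the method (the Kotlin line appears to assume a pre-split list) and returns "" for empty input; FirstTwoInitials is a hypothetical name. Like the Kotlin version, it takes the first two words, not the first and last.

```csharp
using System;

public static class FirstTwoInitials
{
    public static string Get(string name)
    {
        // Split on any whitespace, ignoring empty entries.
        string[] words = name.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        if (words.Length == 0)
        {
            return ""; // the Kotlin one-liner would throw for empty input
        }
        return words.Length > 1
            ? $"{words[0][0]}{words[1][0]}"
            : words[0][0].ToString();
    }
}
```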

score -1 · Accepted Answer

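Yes, use a regex. You can use Regex.Match and the Match.Groups property to find the matches and then extract the values you need, in this case the initials. Finding and extracting the values happen at the same time.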
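A minimal sketch of that idea, assuming you want the first letter of the first and last words: one Regex.Match call with two numbered capture groups, read back through Match.Groups. The pattern and the GroupInitials name are illustrative assumptions, not from the answer.

```csharp
using System.Text.RegularExpressions;

public static class GroupInitials
{
    // Group 1: first letter of the first word.
    // Group 2 (optional): first letter of the last word; it must be followed only by
    // non-whitespace and then optional trailing whitespace, so it lands on the final word.
    private static readonly Regex Pattern = new Regex(@"^\s*(\p{L})\S*(?:.*\s(\p{L})\S*)?\s*$");

    public static string Get(string name)
    {
        Match m = Pattern.Match(name);
        // Groups[2].Value is "" when the optional group did not participate (single-word names).
        return m.Success ? m.Groups[1].Value + m.Groups[2].Value : "";
    }
}
```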
