php - Title case a string containing one or more last names while handling names with apostrophes

Question

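I want to standardize a user-supplied string. The first letter of the name should be capitalized, and if the user entered two last names, both the first and the second should be capitalized. For example, if someone enters: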

marriedname maidenname

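this should be converted to Marriedname Maidenname, and so on if there are more than two names.

The other scenario is a name that contains an apostrophe. If someone enters: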

o'connell

this needs to be converted to O'Connell.

I am currently using:

ucfirst(strtolower($last_name));

But, as you can tell, this does not work for all scenarios.

score 45 · Accepted Answer

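This capitalizes the first letter of every word, as well as the letter immediately after an apostrophe, and makes all other letters lowercase. It should work for you: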

str_replace('\' ', '\'', ucwords(str_replace('\'', '\' ', strtolower($last_name))));
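To make the three steps of that one-liner easier to follow, here is a minimal sketch that wraps it in a function. The name nameCase is hypothetical and not part of the original answer. Like the one-liner, it would also remove a space that genuinely follows an apostrophe in the input.

```php
<?php
// Hypothetical wrapper around the one-liner above.
function nameCase(string $lastName): string
{
    // 1. Lowercase everything, then put a space after each apostrophe
    //    so that ucwords() sees the following letter as a word start.
    $spaced = str_replace("'", "' ", strtolower($lastName));

    // 2. Capitalize the first letter of every word.
    $capitalized = ucwords($spaced);

    // 3. Remove the helper spaces again.
    return str_replace("' ", "'", $capitalized);
}

echo nameCase('marriedname maidenname'), "\n";
echo nameCase("O'CONNELL"), "\n";
```

Both inputs from the question come out as asked: Marriedname Maidenname and O'Connell.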

score 29 · Accepted Answer

You can try this:

<?php echo ucwords(strtolower('Dhaka, JAMALPUR, sarishabari')) ?>

The result is: Dhaka, Jamalpur, Sarishabari
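On its own, ucwords only treats whitespace as a word boundary, so it does not capitalize the letter after an apostrophe. A possible variation, assuming a PHP version where ucwords accepts the optional second argument listing delimiter characters, is sketched below. Note that passing a custom list replaces the default whitespace delimiters, so only space, apostrophe and hyphen count here.

```php
<?php
// Treat space, apostrophe and hyphen as word boundaries.
$name = "o'connell smith-jones";
$result = ucwords(strtolower($name), " '-");

echo $result, "\n";
```

This handles ASCII names only, since strtolower and ucwords are not multibyte-aware.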

score 24 · Accepted Answer

None of these are UTF-8 friendly, so here is one that has worked flawlessly (so far):

function titleCase($string, $delimiters = array(" ", "-", ".", "'", "O'", "Mc"), $exceptions = array("and", "to", "of", "das", "dos", "I", "II", "III", "IV", "V", "VI"))
{
    /*
     * Exceptions in lower case are words you don't want converted
     * Exceptions all in upper case are any words you don't want converted to title case
     *   but should be converted to upper case, e.g.:
     *   king henry viii or king henry Viii should be King Henry VIII
     */
    $string = mb_convert_case($string, MB_CASE_TITLE, "UTF-8");
    foreach ($delimiters as $dlnr => $delimiter) {
        $words = explode($delimiter, $string);
        $newwords = array();
        foreach ($words as $wordnr => $word) {
            if (in_array(mb_strtoupper($word, "UTF-8"), $exceptions)) {
                // check exceptions list for any words that should be in upper case
                $word = mb_strtoupper($word, "UTF-8");
            } elseif (in_array(mb_strtolower($word, "UTF-8"), $exceptions)) {
                // check exceptions list for any words that should be in lower case
                $word = mb_strtolower($word, "UTF-8");
            } elseif (!in_array($word, $exceptions)) {
                // capitalize the first letter (ucfirst is byte-based, so only ASCII letters are affected)
                $word = ucfirst($word);
            }
            array_push($newwords, $word);
        }
        $string = join($delimiter, $newwords);
    }
    return $string;
}

Usage:

$s = 'SÃO JOÃO DOS SANTOS';
$v = titleCase($s); // 'São João dos Santos'
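For comparison, a much shorter multibyte-aware sketch is shown below. It is not the author's function: mbNameCase is a hypothetical name, it assumes the mbstring extension and PCRE with Unicode support, and it has no exceptions list, so a particle such as "dos" would be capitalized as well.

```php
<?php
// Hypothetical helper: lowercase everything, then uppercase any letter
// that starts the string or follows whitespace, an apostrophe or a hyphen.
function mbNameCase(string $name): string
{
    $lower = mb_strtolower($name, 'UTF-8');

    return preg_replace_callback(
        "/(^|[\\s'\\-])(\\p{L})/u",
        function (array $m): string {
            // $m[1] is the boundary (possibly empty), $m[2] the letter.
            return $m[1] . mb_strtoupper($m[2], 'UTF-8');
        },
        $lower
    );
}

echo mbNameCase('SÃO JOÃO'), "\n";
echo mbNameCase("o'connell"), "\n";
```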

score 4 · Accepted Answer

Use this built-in function:

ucwords('string');

Answered 2013-03-14T10:33:40.640

score 2 · Accepted Answer

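I don't believe there is one good answer that covers all scenarios. The PHP.net forum for ucwords has a fair amount of discussion, but none of it seems to solve every case. I would suggest standardizing by either storing the name in uppercase or leaving the user's input alone.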

score 1 · Accepted Answer

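This is a simpler and more direct answer to the main question. The function below follows the conventions of PHP's own mb_* functions. In case PHP adds a function with this name in the future, it first checks whether the function already exists. I use this as a watertight solution for any language in my WordPress installs.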

$str = mb_ucfirst($str, 'UTF-8', true);

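This capitalizes the first letter and lowercases all the others, as the question asked. If the third argument is false (the default), the rest of the string is left untouched.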

// Extends PHP
if (!function_exists('mb_ucfirst')) {

function mb_ucfirst($str, $encoding = "UTF-8", $lower_str_end = false) {
    $first_letter = mb_strtoupper(mb_substr($str, 0, 1, $encoding), $encoding);
    $str_end = "";
    if ($lower_str_end) {
        $str_end = mb_strtolower(mb_substr($str, 1, mb_strlen($str, $encoding), $encoding), $encoding);
    } else {
        $str_end = mb_substr($str, 1, mb_strlen($str, $encoding), $encoding);
    }
    $str = $first_letter . $str_end;
    return $str;
}

}
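The function above handles a single word, so a name with two surnames needs it applied per word. The sketch below shows one way to do that. It uses the hypothetical names ucfirstUtf8 and nameCaseWords rather than mb_ucfirst, on the assumption that a different name avoids any clash with a native mb_ucfirst (PHP 8.4 added one). It splits on single spaces only and does not handle apostrophes.

```php
<?php
// Hypothetical multibyte-safe ucfirst; mirrors the logic of the answer above.
function ucfirstUtf8(string $str, bool $lowerRest = false): string
{
    $first = mb_strtoupper(mb_substr($str, 0, 1, 'UTF-8'), 'UTF-8');
    $rest = mb_substr($str, 1, null, 'UTF-8');

    return $first . ($lowerRest ? mb_strtolower($rest, 'UTF-8') : $rest);
}

// Apply it to every space-separated word of a name.
function nameCaseWords(string $name): string
{
    $words = array_map(function (string $word): string {
        return ucfirstUtf8($word, true);
    }, explode(' ', $name));

    return implode(' ', $words);
}

echo nameCaseWords('élodie DUPONT'), "\n";
```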

score 1 · Accepted Answer

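You can use preg_replace_callback to run a PHP function on every match. (The e modifier of preg_replace used to do this, but it was deprecated in PHP 5.5 and removed in PHP 7.0.)

function processReplacement($matches)
{
  // $matches[1] is the boundary, $matches[2] is the letter to capitalize
  return $matches[1] . strtoupper($matches[2]);
}

$name = "bob o'conner";
$name = preg_replace_callback("/(^|[^a-zA-Z])([a-z])/", 'processReplacement', $name);

var_dump($name); // output "Bob O'Conner"

The regex pattern could probably be improved, but this is what it does:

Group 1 ($matches[1]) is either the start of the string or any non-alphabetic character.
Group 2 ($matches[2]) is any lowercase alphabetic character.

Both are then replaced with the result of the simple processReplacement() function.

If you have PHP 5.3 or later, it is probably worth making processReplacement() an anonymous function.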
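The anonymous-function variant suggested above could look like the sketch below. It uses preg_replace_callback, because the e modifier is not available in PHP 7 and later. The variable name $capitalise is hypothetical. Letters that are already uppercase in the input are left as they are, because only [a-z] is matched.

```php
<?php
// Hypothetical closure: uppercase any ASCII lowercase letter that starts
// the string or follows a non-letter (space, apostrophe, hyphen, ...).
$capitalise = function (string $name): string {
    return preg_replace_callback(
        '/(^|[^a-zA-Z])([a-z])/',
        function (array $m): string {
            return $m[1] . strtoupper($m[2]);
        },
        $name
    );
};

echo $capitalise("bob o'conner"), "\n";
echo $capitalise('mary-jane smith'), "\n";
```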

score 1 · Accepted Answer

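Here is my highly over-engineered, but pretty all-encompassing, solution to the capitalisation of Latin names in PHP. It will solve all your capitalisation problems. All of them.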

/**
 * Over-engineered solution to most capitalisation issues.
 * 
 * @author https://stackoverflow.com/users/429071/dearsina
 * @version 1.0
 */ 
class str {
    /**
     * Words or abbreviations that should always be all uppercase
     */
    const ALL_UPPERCASE = [
        "UK",
        "VAT",
    ];

    /**
     * Words or abbreviations that should always be all lowercase
     */
    const ALL_LOWERCASE = [
        "and",
        "as",
        "by",
        "in",
        "of",
        "or",
        "to",
    ];

    /**
     * Honorifics that only contain consonants.
     *
     */
    const CONSONANT_ONLY_HONORIFICS = [
        # English
        "Mr",
        "Mrs",
        "Ms",
        "Dr",
        "Br",
        "Sr",
        "Fr",
        "Pr",
        "St",

        # Afrikaans
        "Mnr",
    ];

    /**
     * Surname prefixes that should be lowercase,
     * unless not following another word (firstname).
     */
    const SURNAME_PREFIXES = [
        "de la",
        "de las",
        "van de",
        "van der",
        "vit de",
        "von",
        "van",
        "del",
        "der",
    ];

    /**
     * Capitalises every (appropriate) word in a given string.
     *
     * @param string|null $string
     *
     * @return string|null
     */
    public static function capitalise(?string $string): ?string
    {
        if(!$string){
            return $string;
        }

        # Strip away multi-spaces
        $string = preg_replace("/\s{2,}/", " ", $string);

        # Ensure there is always a space after a comma
        $string = preg_replace("/,([^\s])/", ", $1", $string);

        # A word is anything separated by spaces, dashes or full stops
        $string = preg_replace_callback("/([^\s\-\.]+)/", function($matches){
            # Make the word lowercase
            $word = mb_strtolower($matches[1]);

            # If the word needs to be all lowercase
            if(in_array($word, self::ALL_LOWERCASE)){
                return strtolower($word);
            }

            # If the word needs to be all uppercase
            if(in_array(mb_strtoupper($word), self::ALL_UPPERCASE)){
                return strtoupper($word);
            }

            # Create a version without diacritics
            $transliterator = \Transliterator::createFromRules(':: Any-Latin; :: Latin-ASCII; :: NFD; :: [:Nonspacing Mark:] Remove; :: Lower(); :: NFC;', \Transliterator::FORWARD);
            $ascii_word = $transliterator->transliterate($word);


            # If the word contains non-alpha characters (numbers, &, etc), with exceptions (comma, '), assume it's an abbreviation
            if(preg_match("/[^a-z,']/i", $ascii_word)){
                return strtoupper($word);
            }

            # If the word doesn't contain any vowels, assume it's an abbreviation
            if(!preg_match("/[aeiouy]/i", $ascii_word)){
                # Unless the word is an honorific
                if(!in_array(ucfirst($word), self::CONSONANT_ONLY_HONORIFICS)){
                    return strtoupper($word);
                }
            }

            # If the word contains two of the same vowel and is 3 characters or fewer, assume it's an abbreviation
            if(strlen($word) <= 3 && preg_match("/([aeiouy])\1/", $word)){
                return strtoupper($word);
            }

            # Ensure O'Connor, L'Oreal, etc, are double capitalised, with exceptions (d')
            if(preg_match("/\b([a-z]')(\w+)\b/i", $word, $match)){
                # Some prefixes (like d') are not capitalised
                if(in_array($match[1], ["d'"])){
                    return $match[1] . ucfirst($match[2]);
                }

                # Otherwise, everything is capitalised
                return strtoupper($match[1]) . ucfirst($match[2]);
            }

            # Otherwise, return the word with the first letter (only) capitalised
            return ucfirst($word);
            //The most common outcome
        }, $string);

        # Cater for the Mc prefix
        $pattern = "/(Mc)([b-df-hj-np-tv-z])/";
        //Mc followed by a consonant
        $string = preg_replace_callback($pattern, function($matches){
            return "Mc" . ucfirst($matches[2]);
        }, $string);

        # Cater for Roman numerals (need to be in all caps)
        $pattern = "/\b((?<![MDCLXVI])(?=[MDCLXVI])M{0,3}(?:C[MD]|D?C{0,3})(?:X[CL]|L?X{0,3})(?:I[XV]|V?I{0,3}))\b/i";
        $string = preg_replace_callback($pattern, function($matches){
            return strtoupper($matches[1]);
        }, $string);

        # Cater for surname prefixes (must be after the Roman numerals)
        $pattern = "/\b (".implode("|", self::SURNAME_PREFIXES).") \b/i";
        //A surname prefix, bookended by words
        $string = preg_replace_callback($pattern, function($matches){
            return strtolower(" {$matches[1]} ");
        }, $string);

        # Cater for ordinal numbers
        $pattern = "/\b(\d+(?:st|nd|rd|th))\b/i";
        //A number suffixed with an ordinal
        $string = preg_replace_callback($pattern, function($matches){
            return strtolower($matches[1]);
        }, $string);

        # And we're done
        return $string;
    }
}

Have a play with it.
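To illustrate one of the rules in isolation, here is a minimal sketch of the Mc prefix step, pulled out into a hypothetical standalone function fixMcPrefix. As in the class, the pattern only fires when "Mc" is followed by a consonant, so a name such as Mcintyre is left unchanged.

```php
<?php
// Hypothetical standalone version of the "Mc" rule from the class above.
function fixMcPrefix(string $name): string
{
    // "Mc" followed by a lowercase consonant: capitalize that consonant.
    return preg_replace_callback(
        '/(Mc)([b-df-hj-np-tv-z])/',
        function (array $m): string {
            return 'Mc' . strtoupper($m[2]);
        },
        $name
    );
}

echo fixMcPrefix('Ronald Mcdonald'), "\n";
echo fixMcPrefix('Mcintyre'), "\n";
```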

score 0 · Accepted Answer

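Convert to title case first, then find the first apostrophe and uppercase the NEXT character. A robust version needs more checks, such as making sure there is a character after the apostrophe, and this code only handles one apostrophe, so it will not fully handle a name such as "Mary O'Callaghan O'Connell".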

$str = mb_convert_case($str, MB_CASE_TITLE, "UTF-8");
$pos = strpos($str, "'");
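// strpos() returns false when nothing is found, so compare strictly;
// also make sure a character actually follows the apostrophe
if ($pos !== false && isset($str[$pos + 1]))
{
     $str[$pos+1] = strtoupper($str[$pos+1]);
}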
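To cover names with more than one apostrophe, a regular expression can replace the single strpos lookup. The following is a sketch under assumptions: the function name capitaliseAfterApostrophes is hypothetical, and the mbstring extension and PCRE Unicode support are assumed. It will also capitalize the letter in possessives such as "mary's".

```php
<?php
// Hypothetical helper: title-case the string, then uppercase the letter
// after every apostrophe, not just the first one.
function capitaliseAfterApostrophes(string $str): string
{
    $str = mb_convert_case($str, MB_CASE_TITLE, 'UTF-8');

    return preg_replace_callback(
        "/'(\\p{L})/u",
        function (array $m): string {
            return "'" . mb_strtoupper($m[1], 'UTF-8');
        },
        $str
    );
}

echo capitaliseAfterApostrophes("mary o'callaghan o'connell"), "\n";
```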

score 0 · Accepted Answer

If you are using WordPress, use:

function archive_title() {
    $title = '<h1>' . ucwords( single_tag_title( '', false ) ) . '</h1>';
    // return the markup so the caller can echo it
    return $title;
}
