
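I have a character vector of lowercase strings. I'd like to change them to title case, meaning the first letter of every word is capitalized. I've managed to do it with a double loop, but I'm hoping there's a more efficient and elegant way, perhaps a one-liner with gsub and a regex.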

Here's some sample data, along with the double loop that works, followed by the other things I tried that didn't work.

strings = c("first phrase", "another phrase to convert",
            "and here's another one", "last-one")

# For each string in the strings vector, find the start position of each
#  run of lower-case letters that begins at a word boundary
matches = gregexpr("\\b[a-z]+", strings) 

# For each string in the strings vector, convert the first letter 
#  of each word to upper case
for (i in 1:length(strings)) {

  # Extract the position of each regex match for the string in row i
  #  of the strings vector.
  match.positions = matches[[i]][1:length(matches[[i]])] 

  # Convert the letter in each match position to upper case
  for (j in 1:length(match.positions)) {

    substr(strings[i], match.positions[j], match.positions[j]) = 
      toupper(substr(strings[i], match.positions[j], match.positions[j]))
  }
}

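This works, but it seems unduly complicated. I only resorted to it after the more straightforward approaches failed. Here are some of the things I tried, along with the output: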

# Google search suggested \\U might work, but evidently not in R
gsub("(\\b[a-z]+)", "\\U\\1" ,strings)
[1] "Ufirst Uphrase"                "Uanother Uphrase Uto Uconvert"
[3] "Uand Uhere'Us Uanother Uone"   "Ulast-Uone"                   

# I tried this on a lark, but to no avail
gsub("(\\b[a-z]+)", toupper("\\1"), strings)
[1] "first phrase"              "another phrase to convert"
[3] "and here's another one"    "last-one"  

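The regex captures the correct positions in each string, as shown by the call to gregexpr, but the replacement string is clearly not working as intended.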

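If you can't already tell, I'm relatively new to regex and would appreciate help getting the replacement to work properly. I'd also like to learn how to structure the regex so that it avoids capturing a letter after an apostrophe, since I don't want to change the case of those letters.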


6 Answers


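The main problem is that you're missing perl=TRUE (and your regex is slightly wrong, although that may be a result of flailing around while trying to fix the first problem).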
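A minimal sketch of why both attempts in the question fail, using only base R. The variable names x, eager and whole_word are introduced here for illustration and are not part of the original question or answer.

```r
x <- "first phrase"

# toupper() is evaluated before gsub() ever sees the replacement, and
# upper-casing the two characters "backslash, 1" changes nothing.
eager <- toupper("\\1")
identical(eager, "\\1")
## [1] TRUE

# With perl = TRUE, \\U is treated as a case-conversion operator. Because the
# question's pattern captures the whole word, the whole word is upper-cased.
whole_word <- gsub("(\\b[a-z]+)", "\\U\\1", x, perl = TRUE)
whole_word
## [1] "FIRST PHRASE"
```

So the fix has two parts: pass perl = TRUE, and make sure the group that \\U applies to captures only the first letter of each word.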

Using [:lower:] instead of [a-z] is slightly safer in case your code ends up being run in some weird (sorry, Estonians) locale where z is not the last letter of the alphabet ...

re_from <- "\\b([[:lower:]])([[:lower:]]+)"
strings <- c("first phrase", "another phrase to convert",
             "and here's another one", "last-one")
gsub(re_from, "\\U\\1\\L\\2" ,strings, perl=TRUE)
## [1] "First Phrase"              "Another Phrase To Convert"
## [3] "And Here's Another One"    "Last-One"    

You may prefer to use \\E (stop capitalization) rather than \\L (start lowercase), depending on what rules you want to follow, e.g.:

string2 <- "using AIC for model selection"
gsub(re_from, "\\U\\1\\E\\2" ,string2, perl=TRUE)
## [1] "Using AIC For Model Selection"
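With re_from as written, the second group only matches lower-case letters, so \\L and \\E produce the same result. The sketch below shows where the two differ, under the assumption of a broader pattern (re_any, a hypothetical name not taken from the answer) whose second group accepts any letter. Note that this broader pattern would also capitalize a letter after an apostrophe, so it is only meant to illustrate the \\L versus \\E distinction.

```r
# Assumed pattern (not from the answer): the second group accepts any letters,
# so it can contain capitals that \\L would then lower-case.
re_any <- "\\b([[:alpha:]])([[:alpha:]]*)"
s <- "using AIC for model selection"

# \\L lower-cases the rest of each word, so the acronym is flattened
with_L <- gsub(re_any, "\\U\\1\\L\\2", s, perl = TRUE)
with_L
## [1] "Using Aic For Model Selection"

# \\E only ends the upper-casing, so the rest of each word is left untouched
with_E <- gsub(re_any, "\\U\\1\\E\\2", s, perl = TRUE)
with_E
## [1] "Using AIC For Model Selection"
```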
answered 2013-04-03T00:19:19.023

Without using regex, the help page for tolower has two example functions that will do this.

The more robust version is

capwords <- function(s, strict = FALSE) {
    cap <- function(s) paste(toupper(substring(s, 1, 1)),
                  {s <- substring(s, 2); if(strict) tolower(s) else s},
                             sep = "", collapse = " " )
    sapply(strsplit(s, split = " "), cap, USE.NAMES = !is.null(names(s)))
}
capwords(c("using AIC for model selection"))
## ->  [1] "Using AIC For Model Selection"
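As a rough illustration of what the strict argument changes, and of one limitation, here is the same function restated so the block runs on its own. The variable names loose, strict_res and hyphen are introduced here for illustration only.

```r
# Restated from the tolower help page so this block is self-contained
capwords <- function(s, strict = FALSE) {
    cap <- function(s) paste(toupper(substring(s, 1, 1)),
                  {s <- substring(s, 2); if(strict) tolower(s) else s},
                             sep = "", collapse = " " )
    sapply(strsplit(s, split = " "), cap, USE.NAMES = !is.null(names(s)))
}

# Default: only the first letter of each word is touched
loose <- capwords("using AIC for model selection")
loose
## [1] "Using AIC For Model Selection"

# strict = TRUE also lower-cases the remainder of each word
strict_res <- capwords("using AIC for model selection", strict = TRUE)
strict_res
## [1] "Using Aic For Model Selection"

# Words are split on single spaces only, so a hyphenated word counts as one word
hyphen <- capwords("last-one")
hyphen
## [1] "Last-one"
```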

To get your regex approach to (almost) work, you need to set perl = TRUE

gsub("(\\b[a-z]{1})", "\\U\\1" ,strings, perl=TRUE)


[1] "First Phrase"              "Another Phrase To Convert"
[3] "And Here'S Another One"    "Last-One"  

But you will probably need to deal with apostrophes slightly better

sapply(lapply(strsplit(strings, ' '), gsub, pattern = '^([[:alnum:]]{1})', replacement = '\\U\\1', perl = TRUE), paste, collapse = ' ')
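An alternative sketch, not the approach used in this answer: a single gsub call with a negative lookbehind, which only upper-cases a lower-case letter when it is not preceded by a letter or an apostrophe. The pattern name re_first and the extra test string "they're here" are assumptions added for illustration.

```r
strings <- c("first phrase", "another phrase to convert",
             "and here's another one", "last-one", "they're here")

# (?<![[:alpha:]']) : the previous character is neither a letter nor an apostrophe
# ([[:lower:]])     : the lower-case letter to upper-case
re_first <- "(?<![[:alpha:]'])([[:lower:]])"

titled <- gsub(re_first, "\\U\\1", strings, perl = TRUE)
titled
## [1] "First Phrase"              "Another Phrase To Convert"
## [3] "And Here's Another One"    "Last-One"
## [5] "They're Here"
```

Unlike the split-on-spaces version, this capitalizes the letter after a hyphen. One known limitation of the sketch is that a word that opens with a single quote would be left in lower case.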

A quick search on SO found https://stackoverflow.com/a/6365349/1385941

answered 2013-04-03T00:15:40.313

There are already excellent answers here. Here's one that uses a convenience function from the reports package:

strings <- c("first phrase", "another phrase to convert",
    "and here's another one", "last-one")

CA(strings)

## > CA(strings)
## [1] "First Phrase"              "Another Phrase To Convert"
## [3] "And Here's Another One"    "Last-one"       

Though it doesn't capitalize "one" (in "last-one"), as doing so didn't make sense for my purposes.

Update: I maintain the qdapRegex package, which has the TC (title case) function that does true title case:

TC(strings)

## [[1]]
## [1] "First Phrase"
## 
## [[2]]
## [1] "Another Phrase to Convert"
## 
## [[3]]
## [1] "And Here's Another One"
## 
## [[4]]
## [1] "Last-One"
answered 2013-04-03T00:27:16.447

Just for fun, I'll throw one more into the mix:

topropper(strings)
[1] "First Phrase"              "Another Phrase To Convert" "And Here's Another One"   
[4] "Last-one"  

topropper <- function(x) {
  # Makes Proper Capitalization out of a string or collection of strings. 
  sapply(x, function(strn)
   { s <- strsplit(strn, "\\s")[[1]]
       paste0(toupper(substring(s, 1,1)), 
             tolower(substring(s, 2)),
             collapse=" ")}, USE.NAMES=FALSE)
}
answered 2013-04-03T04:00:27.997

Here is another one-liner, based on the stringr package:

str_to_title(strings, locale = "en")

where strings is your vector of strings.


answered 2020-10-26T20:11:56.467

The best way to convert any case to any other case is to use the snakecase package in R.

Simply use the package:

library(snakecase)
strings = c("first phrase", "another phrase to convert",
        "and here's another one", "last-one")

to_title_case(strings)

## [1] "First Phrase"              "Another Phrase to Convert" 
## [3] "And Here s Another One"    "Last One" 

Keep coding!

answered 2021-08-13T06:33:28.540