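I have a vector of character strings that are all lower case. I'd like to convert them to title case, meaning that the first letter of every word is capitalized. I've managed to do it with a double loop, but I'm hoping there's a more efficient and elegant way, perhaps a one-liner using gsub and a regex.
Here's some sample data, along with the double loop that works, followed by other things I tried that didn't work.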
strings = c("first phrase", "another phrase to convert",
"and here's another one", "last-one")
# For each string in the strings vector, find the start position of each
# run of lower-case letters that begins at a word boundary
matches = gregexpr("\\b[a-z]+", strings)
# For each string in the strings vector, convert the first letter
# of each word to upper case
for (i in 1:length(strings)) {
  # Extract the position of each regex match for the string in row i
  # of the strings vector.
  match.positions = matches[[i]][1:length(matches[[i]])]
  # Convert the letter in each match position to upper case
  for (j in 1:length(match.positions)) {
    substr(strings[i], match.positions[j], match.positions[j]) =
      toupper(substr(strings[i], match.positions[j], match.positions[j]))
  }
}
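This works, but it seems inordinately complicated. I resorted to it only after my attempts at more straightforward approaches failed. Here are some of the things I tried, along with the output: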
# Google search suggested \\U might work, but evidently not in R
gsub("(\\b[a-z]+)", "\\U\\1" ,strings)
[1] "Ufirst Uphrase" "Uanother Uphrase Uto Uconvert"
[3] "Uand Uhere'Us Uanother Uone" "Ulast-Uone"
# I tried this on a lark, but to no avail
gsub("(\\b[a-z]+)", toupper("\\1"), strings)
[1] "first phrase" "another phrase to convert"
[3] "and here's another one" "last-one"
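The regex captures the correct positions in each string, as shown by the call to gregexpr, but the replacement string is clearly not working as intended.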
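For reference, here is a minimal sketch of one way the replacement might be made to work. It relies on two points from the gsub documentation: the `\\U` escape is interpreted only when `perl = TRUE`, and it upper-cases everything that follows it in the replacement, so the capture group is narrowed here to the first letter alone. The variable name `title_case` is just an illustrative choice. The `toupper("\\1")` attempt has no effect because `toupper` runs on the literal string `"\\1"` before gsub ever sees it.

```r
strings <- c("first phrase", "another phrase to convert",
             "and here's another one", "last-one")

# Capture only the first letter after each word boundary; with perl = TRUE,
# \\U converts the captured letter to upper case in the replacement.
title_case <- gsub("\\b([a-z])", "\\U\\1", strings, perl = TRUE)

print(title_case)
# [1] "First Phrase"              "Another Phrase To Convert"
# [3] "And Here'S Another One"    "Last-One"
```

Note that this version still capitalizes the letter after the apostrophe, because a word boundary also occurs between `'` and `s`.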
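If you can't already tell, I'm relatively new to regexes and would appreciate help with getting the replacement to work properly. I'd also like to learn how to structure the regex so that it doesn't capture a letter following an apostrophe, since I don't want to change the case of those letters.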
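One possible way to skip letters that follow an apostrophe is a negative lookbehind, sketched below. It assumes `perl = TRUE` (lookbehind and the `\\U` case escape are both Perl-regex features in R) and assumes that a letter after a hyphen should still be capitalized; the name `title_case_apos` is hypothetical.

```r
strings <- c("first phrase", "another phrase to convert",
             "and here's another one", "last-one")

# (?<!') rejects any position immediately preceded by an apostrophe, so the
# "s" in "here's" is left alone; \\b([a-z]) then captures the first letter
# of each remaining word, and \\U upper-cases it.
title_case_apos <- gsub("(?<!')\\b([a-z])", "\\U\\1", strings, perl = TRUE)

print(title_case_apos)
# [1] "First Phrase"              "Another Phrase To Convert"
# [3] "And Here's Another One"    "Last-One"
```

If hyphenated words should keep a lower-case second part, the pattern would need a different anchor, for example matching only at the start of the string or after whitespace.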