
I want to use sed or something similar to read a text file and change every all-uppercase phrase to lowercase wrapped in \textsc{ * * }.

For example:

THIS SENTENCE IS ALL CAPS except not really

should become

\textsc{this sentence is all caps} except not really

whereas

This Sentence Has Many Caps

should be left as

This Sentence Has Many Caps  

With the pattern s/\(.[A-Z]*\)/textsc{\L\1}/, only the first word of the string is changed.

Can anyone point me to the right approach?

Update: the regex pattern should also handle apostrophes

I'll BUY YOU A DRINK

Most solutions split off the letter I before the apostrophe, like this: \textsc{i}'ll \textsc{buy you a} \textsc{drink}


3 Answers

$ cat file
THIS SENTENCE IS ALL CAPS except not really
This Sentence Has Many Caps
THIS SENTENCE Has Many Caps

$ awk -f tst.awk file
\textsc{this sentence is all caps} except not really
This Sentence Has Many Caps
\textsc{this sentence} Has Many Caps

$ cat tst.awk
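{
   # Keep replacing the leftmost run of all-caps words until none is left.
   # {2,} requires at least two capitals per word, so a lone A or I ends a run.
   while ( match( $0, /([[:upper:]]{2,}[[:space:]]*)+/) ) {
      rstart  = RSTART
      rlength = RLENGTH

      # The run may end in whitespace; leave that outside the braces.
      if ( match( substr($0,RSTART,RLENGTH), /[[:space:]]+$/) ) {
         rlength = rlength - RLENGTH
      }

      # Rebuild the line: text before, lowercased run in \textsc{}, text after.
      $0 = substr($0,1,rstart-1) \
           "\\textsc{" tolower(substr($0,rstart,rlength)) "}" \
           substr($0,rstart+rlength)
   }

   print
}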
answered 2013-02-07T13:38:42.173

This looks like it should work for you.

echo "THIS sentence IS ALL CAPS Except not really BUT THIS IS" | \
  sed -re "s/\b(([A-Z]+ [A-Z]+)+)\b/\\\textsc{\L\1}/g"

This produces the phrase:

THIS sentence \textsc{is all caps} Except not really \textsc{but this is}

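The /g flag makes the substitution global (not just the first match). The \b anchors mean the phrase must begin and end on a word boundary (not in the middle of a word). The three backslashes before textsc are escapes of escapes: inside double quotes the shell reduces \\\textsc to \\textsc, and sed then reads \\ as one literal backslash, which gives the final \textsc. The ([A-Z]+ [A-Z]+)+ part captures an all-caps phrase. I first tried adding a space to the character class, as in [A-Z ], but that left a space before the closing brace, as in \textsc{this sentence }. So I put the space between the words instead to build up a phrase. Note that \b and \L are GNU sed extensions.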
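To make the trailing-space problem concrete, here is a small sketch that runs both variants on the question's sample line. The function names are made up for illustration, the patterns are simplified versions rather than the exact ones above, and GNU sed is assumed (for -r, \b and \L).

```shell
#!/bin/sh
# Illustrative only: function names are hypothetical, GNU sed is assumed.
line='THIS SENTENCE IS ALL CAPS except not really'

# Space inside the character class: the match runs on through the
# trailing space, so the space ends up inside the braces.
with_space_in_class() { sed -r 's/\b[A-Z][A-Z ]*\b/\\textsc{\L&}/g'; }

# Space only between words: the match has to end on a letter.
with_space_between() { sed -r 's/\b[A-Z]+( +[A-Z]+)*\b/\\textsc{\L&}/g'; }

printf '%s\n' "$line" | with_space_in_class
# \textsc{this sentence is all caps }except not really

printf '%s\n' "$line" | with_space_between
# \textsc{this sentence is all caps} except not really
```

In the first variant the word boundary between the space and "except" lets the match swallow the space; in the second, a space can only be matched when another capitalized word follows it.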

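Note that this leaves isolated uppercase words alone. I assume that is intended, since the question asks about "phrases". If you need to replace those as well, try this: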

echo "THIS sentence IS ALL CAPS Except not really BUT THIS IS" | \
  sed -re "s/\b((([A-Z]+ [A-Z]+)+)|[A-Z]+)\b/\\\textsc{\L\1}/g"

This results in

\textsc{this} sentence \textsc{is all caps} Except not really \textsc{but this is}
answered 2013-02-07T07:37:58.460

This might work for you (GNU sed):

sed -r 's/\b[A-Z]+\b( *\b[A-Z]+\b)*/\\textsc{\L&}/g' file
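For the apostrophe case in the question's update, the command above would still turn the I of I'll into \textsc{i}, because \b sees a word boundary before the apostrophe. One possible workaround, sketched below, is to treat the apostrophe as part of a word and to spell out the surrounding context instead of using \b. The helper name textsc_caps and the variables w and edge are hypothetical, GNU sed is assumed, and the rule "an apostrophe belongs to the word" is my reading of the update rather than something the question states.

```shell
#!/bin/sh
# Hypothetical helper, not taken from the answers. Requires GNU sed (-E, \L, \E).
textsc_caps() {
  w="[A-Z][A-Z']*"        # one caps word; may contain apostrophes (DON'T)
  edge="[^[:alnum:]']"    # a character that cannot be part of a word
  # Groups: 1 = left context, 2 = the caps run, 4 = right context.
  # The loop (:a ... ta) repeats until no run is left, which also catches a
  # run whose left context was consumed as the previous run's right context.
  sed -E ":a
s/(^|$edge)($w( +$w)*)(\$|$edge)/\\1\\\\textsc{\\L\\2\\E}\\4/
ta"
}

printf '%s\n' "I'll BUY YOU A DRINK" | textsc_caps
# I'll \textsc{buy you a drink}

printf '%s\n' "This Sentence Has Many Caps" | textsc_caps
# This Sentence Has Many Caps
```

Like the command above, this also wraps a single capitalized word, so a standalone I (as in "I am") would become \textsc{i}; whether that is wanted depends on the text.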
answered 2013-02-07T17:16:53.477