
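I am looking for a quick Bash script to convert British/New Zealand spelling to American spelling in a TeX document (for collaborating with US academics and submitting to US journals). It is a formal mathematical biology paper with very little regional terminology or grammar: prior work is given as formulae rather than quotations.

For example,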

Generalise->Generalize

Colour->Color

Centre->Center

There must be a sed- or awk-based script that substitutes most of the common spelling differences.

See the related TeX forum question for more details.

https://tex.stackexchange.com/questions/312138/converting-uk-to-us-spellings

NB: I currently compile with PDFLaTeX in kile on Ubuntu 16.04 or Elementary OS 0.3 Freya, but I could use another TeX compiler or package if there is a built-in fix elsewhere.

Thanks for your help.

2 Answers

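I think you need to keep a list of replacements and look each word up in it. You will have to extend your dictionary file before it can translate a text file effectively.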

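sourceFile=$1
dict=$2

# Each dictionary line holds the word to find, then its replacement.
while read -r word updatedWord _
    do
     # skip blank lines and entries that change nothing
     if [ -z "$updatedWord" ] || [ "$word" = "$updatedWord" ]; then continue; fi

     # \b (GNU sed) limits the match to whole words, avoiding results like "centreed"
     sed -i "s/\b$word\b/$updatedWord/g" "$sourceFile" 2>/dev/null

   done < "$dict"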

Run the script above like this:

./scriptName source.txt dictionary.txt 

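Here is a sample dictionary I used. The word in the first column is replaced by the word in the second, so swap the columns to convert British spelling to American: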

>cat dict
characterize characterise
prioritize prioritise
specialize specialise
analyze analyse
catalyze catalyse
size size
exercise exercise
behavior behaviour
color colour
favor favour
contour contour
center centre
fiber fibre
liter litre
parameter parameter
ameba amoeba
anesthesia anaesthesia
diarrhea diarrhoea
esophagus oesophagus
leukemia leukaemia
cesium caesium
defense defence
practice  practice
license  licence
defensive defensive
advice  advice
aging ageing
acknowledgment acknowledgement
judgment judgement
analog analogue
dialog dialogue
fulfill fulfil
enroll enrol
skillful skilful
labeled labelled
signaling signalling
propelled propelled
revealing revealing

Execution result:

cat source.txt
color of this fiber is great and we should analyze it.

./scriptName source.txt dict

cat source.txt
colour of this fibre is great and we should analyse it.
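
The script above runs sed once per dictionary line. A variant that builds a single sed script from the dictionary and applies it in one pass may be easier to check. The following is only a minimal sketch, not the answerer's method: it assumes GNU sed (for `\b`), a dictionary in the same two-column US-then-UK layout as above, and a hypothetical function name `uk2us`. It swaps the columns so the replacement runs from British to American spelling, as the question asks.

```shell
# uk2us DICT FILE: print FILE with British spellings replaced by American ones.
# Sketch only: assumes GNU sed and a two-column dictionary (US word, UK word).
uk2us() {
    uk2us_rules=$(mktemp) || return 1

    # Emit one rule per dictionary line, e.g. s/\bcolour\b/color/g.
    # Lines without exactly two fields, or with identical words, are skipped.
    awk 'NF == 2 && $1 != $2 { printf "s/\\b%s\\b/%s/g\n", $2, $1 }' "$1" > "$uk2us_rules"

    # Write the result to stdout so the original file is left untouched.
    sed -f "$uk2us_rules" "$2"
    rm -f "$uk2us_rules"
}

# Example call (file names are hypothetical):
#   uk2us dict paper.tex > paper_us.tex
```

Because the rules match whole lowercase words only, a capitalized "Colour" or a derived form such as "colourful" is left unchanged unless it has its own dictionary line.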
answered 2016-06-01T08:58:56.237

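Here is my awk solution, which I prefer to sed for this task. The program leaves LaTeX commands alone (words that begin with "\") and preserves a capital first letter in the words it replaces. Arguments of LaTeX commands, like ordinary text, are substituted from the dictionary file. If the optional third argument [rev] is given, the program substitutes in the reverse direction using the same dictionary file. Any non-alphabetic character acts as a word delimiter, which is necessary in LaTeX source files. The program writes its output to the screen (stdout), so you need to redirect it to a file (>output_f). (I assume the inputencoding of your LaTeX source is 1 byte per character.)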

> cat dic.sh
#!/bin/bash
# Replace whole words in a LaTeX file using a two-column dictionary; LaTeX command names are left alone.
(($#<2))&& { echo "Usage $0 dictionary_file latex_file [rev]"; exit 1; }
((d= $#==3 ? 0:1))   # d=1: replace column 1 by column 2; d=0 (third argument given): the reverse
awk -v d=$d '
 BEGIN {cm=fx=0; fn="";}
 fn!=FILENAME {fx++; fn=FILENAME;}                        #count input files: 1 = dictionary, 2 = LaTeX source
 fx==1 {if(!NF)next; if(d)a[$1]=$2; else a[$2]=$1; next;} #read dict or rev dict file into an associative array
 fx==2 { for(i=1; i<=length($0); i++)
            {c=substr($0,i,1);                            #read characters from a given line of LaTeX source
             if(cm){printf("%s",c); if(c~/[^A-Za-z0-9\\]/)cm=0;}  #inside a LaTeX command name: copy unchanged until it ends
             else if(c~/[A-Za-z]/)w=w c; else{pr(); printf("%s",c); if(c=="\\")cm=1;} #collect letters into a word, or flush the word
            }
         pr(); printf("\n");                              #handle the last collected word in the line
       }
function pr(  s){   # print the collected word or its dictionary substitution, restoring a capital first letter
   if(!length(w))return;
   s=tolower(w);                                          #dictionary keys are expected in lowercase
   if(!(s in a))printf("%s",w);
   else printf("%s", s==w ? a[s] : toupper(substr(a[s],1,1)) substr(a[s],2));
   w="";}
' "$1" "$2"

Dictionary file:

> cat dictionary
apple      lemon
raspberry  cherry
pear       banana

Input LaTeX source:

> cat src.txt
Apple123pear,apple "pear".
\Apple123pear{raspberry}{pear}[apple].

Raspberry12Apple,pear.

Execution result:

> ./dic.sh 
Usage ./dic.sh dictionary_file latex_file [rev]

> ./dic.sh dictionary src.txt >out1.txt; cat out1.txt
Lemon123banana,lemon "banana".
\Apple123pear{cherry}{banana}[lemon].

Cherry12Lemon,banana.

> ./dic.sh dictionary out1.txt >out2.txt rev; cat out2.txt
Apple123pear,apple "pear".
\Apple123pear{raspberry}{pear}[apple].

Raspberry12Apple,pear.

> diff src.txt out2.txt   # they are identical
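
To see how the first-letter handling in pr() behaves, the lookup can be tried in isolation. This is only an illustrative sketch: the names `fix_case_demo` and `fix` and the one-entry dictionary are made up for the example, and it splits the input on whitespace rather than scanning character by character as dic.sh does.

```shell
# Sketch of the case rule used in pr(): look the word up in lowercase and,
# if the original was not entirely lowercase, capitalize the replacement.
fix_case_demo() {
    awk '
    function fix(w,   s) {
        s = tolower(w)
        if (!(s in a)) return w          # unknown word: leave as is
        if (s == w)    return a[s]       # lowercase word: plain replacement
        return toupper(substr(a[s], 1, 1)) substr(a[s], 2)
    }
    BEGIN { a["colour"] = "color" }      # hypothetical one-entry dictionary
    { for (i = 1; i <= NF; i++) printf "%s%s", fix($i), (i < NF ? " " : "\n") }
    '
}

echo 'colour Colour COLOUR flavour' | fix_case_demo
# prints: color Color Color flavour
```

An all-uppercase word such as COLOUR comes out as Color, so text set in capitals may need a manual check after conversion.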
answered 2016-06-08T14:22:15.057