regex - Using a regular expression to find (and replace) the n-th column in a LaTeX table

Question

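I have a string that is a LaTeX table. I'm trying to find the n-th column (let's say the third column) and wrap everything inside it in, say, \emph{} without matching the delimiting dollar signs.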

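I'm looking for the first &...& group, which is the second column. Then I find the next &...&, which is the second group and, not coincidentally, the third column of the table.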

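My dummy example works, but it is a bit different, because it has some text between the two &...& groups. I will sort out one minor issue at a later stage - I need to use back and forward references to place the & outside of the \emph{} call.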

xy <-  "This is &more or less& a match and here is &another one&.\nSecond line with &occurrance 1& and &occurrance 2&"
gsub("(&.*?&)|(.*?&)(.*)(&.*?&)", "\\1\\2\\3\\\\emph{\\4}", xy, perl = TRUE)
[1] "This is &more or less& a match and here is \\emph{&another one&}.\nSecond line with &occurrance 1& and \\emph{&occurrance 2&}"

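When I take it up a notch to a real data set with a LaTeX table (pow!), things are a bit different. There are no characters between two &...& groups, which means a single & is shared by two adjacent columns. With this in mind, I removed the (.*). No matter what I try, I can't get it to work. Any tips?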

library(xtable)
data(tli)
tli.table <- xtable(tli[1:5,])
x <- print.xtable(tli.table, print.results = FALSE, include.rownames = FALSE)

cat(x)
% latex table generated in R 2.15.1 by xtable 1.7-0 package
% Thu Jul 26 14:13:39 2012
\begin{table}[ht]
\begin{center}
\begin{tabular}{rlllr}
  \hline
grade & sex & disadvg & ethnicty & tlimth \\ 
  \hline
  6 & M & YES & HISPANIC &  43 \\ 
    7 & M & NO & BLACK &  88 \\ 
    5 & F & YES & HISPANIC &  34 \\ 
    3 & M & YES & HISPANIC &  65 \\ 
    8 & M & YES & WHITE &  75 \\ 
   \hline
\end{tabular}
\end{center}
\end{table}

gsub("(&.*?&)(&.*?&)", "\\1\\\\emph{\\2}", x, perl = TRUE)
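
A quick check, offered only as a sketch, suggests why this attempt finds nothing: the pattern asks for one &...& group immediately followed by another &, that is, two adjacent ampersands, which a table row never contains. The variable `row` is a stand-in for one line of the table above, and the second pattern is an illustrative variation (not a full solution) in which the shared & is consumed only once.

```r
# One row of the table, as a plain R string (stand-in for a line of x).
row <- "  6 & M & YES & HISPANIC &  43 \\\\ "

# The attempted pattern requires "&...&" directly followed by "&".
no_match <- grepl("(&.*?&)(&.*?&)", row, perl = TRUE)
print(no_match)

# Variation: match one "&...&" group, then the next cell up to
# (but not including) the following "&", so the delimiter is shared.
shared <- sub("(&[^&]*&)([^&]*)", "\\1\\\\emph{\\2}", row, perl = TRUE)
cat(shared, "\n")
```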

score 4 · Accepted Answer

Assuming the 1st column is n <- 1 (and not n <- 0), the regex you should use to replace the n-th column is:

(?m)^(?=[^&\n\r]*&)((?:[^&]*&){n-1})\\s*([^&]*?)\\s*(&|\\\\)
                                ↑
                                └ replace this n-1 with real number

The replacement string must then be \\1\\\\emph{\\2}\\3.
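
The pattern can be read piece by piece. The sketch below assembles the same expression from commented fragments. It writes `\\n\\r` so that the regex engine, rather than R, interprets the escapes, which should be equivalent. The one-row input is an illustrative assumption, not part of the original data.

```r
n <- 3  # target column, counting from 1

regex <- paste0(
  "(?m)^",                      # multiline mode: ^ matches at each line start
  "(?=[^&\\n\\r]*&)",           # lookahead: skip lines that contain no &
  "((?:[^&]*&){", n - 1, "})",  # group 1: the n-1 cells before the target
  "\\s*([^&]*?)\\s*",           # group 2: target cell, surrounding spaces excluded
  "(&|\\\\)"                    # group 3: what ends the cell, & or a backslash
)

# Illustrative single row; the last cell is ended by the row terminator \\.
out <- gsub(regex, "\\1\\\\emph{\\2}\\3", "a & b & c \\\\", perl = TRUE)
cat(out, "\n")
```

Because the whitespace around the cell is matched outside group 2, it is dropped from the result, which is harmless in LaTeX.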

So your replacement code would be:

input <- "% latex table generated in R 2.15.1 by xtable 1.7-0 package\n% Thu Jul 26 17:49:09 2012\n\\begin{table}[ht]\n\\begin{center}\n\\begin{tabular}{rlllr}\n  \\hline\ngrade & sex & disadvg & ethnicty & tlimth \\\\ \n  \\hline\n  6 & M & YES & HISPANIC &  43 \\\\ \n    7 & M & NO & BLACK &  88 \\\\ \n    5 & F & YES & HISPANIC &  34 \\\\ \n    3 & M & YES & HISPANIC &  65 \\\\ \n    8 & M & YES & WHITE &  75 \\\\ \n   \\hline\n\\end{tabular}\n\\end{center}\n\\end{table}\n"

n <- 1
regex <- paste(c('(?m)^(?=[^&\n\r]*&)((?:[^&]*&){', n-1, '})\\s*([^&]*?)\\s*(&|\\\\)'), collapse='')
cat(gsub(regex, "\\1\\\\emph{\\2}\\3", input, perl = TRUE))
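
If the column number changes often, the same steps could be wrapped in a small function. This is a minimal sketch: `emph_column` is a hypothetical helper name, not something xtable provides, and it assumes the table cells themselves contain no & characters.

```r
# Hypothetical helper: wrap the n-th column (counting from 1) in \emph{}.
emph_column <- function(tex, n) {
  stopifnot(n >= 1)
  regex <- paste0("(?m)^(?=[^&\n\r]*&)((?:[^&]*&){", n - 1,
                  "})\\s*([^&]*?)\\s*(&|\\\\)")
  gsub(regex, "\\1\\\\emph{\\2}\\3", tex, perl = TRUE)
}

# Illustrative single row taken from the table above.
row <- "  6 & M & YES & HISPANIC &  43 \\\\ \n"
third <- emph_column(row, 3)  # a middle column, ended by &
last  <- emph_column(row, 5)  # the last column, ended by the row terminator
cat(third)
cat(last)
```

The `(&|\\\\)` alternative at the end of the pattern is what lets the last column work, since that cell is followed by `\\` rather than by another &.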

score 2 · Answer

Another approach is to wrap your column in \emph{} before calling xtable:

data(tli)
tli[, 4] <- paste0("\\\\emph{", tli[, 4], "}")

Then the rest of your script stays as you had it:

tli.table <- xtable(tli[1:5,])
x <- print.xtable(tli.table, print.results = FALSE, include.rownames = FALSE)
cat(x)

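This produces the following. Note that xtable's default sanitizing has escaped the backslashes and braces here, so the \emph{} markup comes out as literal text unless sanitizing is disabled (for example through the sanitize.text.function argument of print.xtable):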

% latex table generated in R 2.15.0 by xtable 1.7-0 package
% Thu Jul 26 16:08:58 2012
\begin{table}[ht]
\begin{center}
\begin{tabular}{rlllr}
  \hline
grade & sex & disadvg & ethnicty & tlimth \\ 
  \hline
  6 & M & YES & $\backslash$$\backslash$emph\{HISPANIC\} &  43 \\ 
    7 & M & NO & $\backslash$$\backslash$emph\{BLACK\} &  88 \\ 
    5 & F & YES & $\backslash$$\backslash$emph\{HISPANIC\} &  34 \\ 
    3 & M & YES & $\backslash$$\backslash$emph\{HISPANIC\} &  65 \\ 
    8 & M & YES & $\backslash$$\backslash$emph\{WHITE\} &  75 \\ 
   \hline
\end{tabular}
\end{center}
\end{table}
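
In the output above the cells read `$\backslash$$\backslash$emph\{...\}`, which LaTeX would typeset as literal text. A hedged variant of the same idea: the sketch below assumes a single backslash is put into the data and that sanitizing is switched off via the `sanitize.text.function` argument of `print.xtable`, so the markup should reach the LaTeX output unescaped. `tli5` is an illustrative variable name.

```r
library(xtable)
data(tli)

tli5 <- tli[1:5, ]
# "\\emph{" in R source is the string \emph{ (one backslash).
tli5[, 4] <- paste0("\\emph{", tli5[, 4], "}")

# identity leaves every cell untouched instead of escaping LaTeX specials.
x <- print(xtable(tli5), print.results = FALSE, include.rownames = FALSE,
           sanitize.text.function = identity)
cat(x)
```

With sanitizing disabled, special characters in every other cell (such as `%`, `_` or `&`) are no longer escaped either, so this only suits data known to be free of them.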
