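Sometimes I use R to parse text from PDFs to pull quotes for articles I'm writing (I use LaTeX). One thing I'd like to do is convert left and right quotation marks into LaTeX-style left and right quotes.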

In LaTeX, "dog" becomes ``dog'' (so two ` for the left quote and two ' for the right).

Here is an example of what I have and what I'd like to get:

#currently
x <- c('I like "proper" cooking.', 'I heard him say, "I want some too" and "nice".')

[1] "I like \"proper\" cooking."   "I heard him say, \"I want some too\" and \"nice\"."

#desired outcome
[1] "I like ``proper'' cooking."   "I heard him say, ``I want some too'' and ``nice''."

Edit: I thought I'd share how I actually use this. Using ttmaccer's solution (works on a Windows machine):

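g <- function(){
    require(qdap)                  # qdap provides clean(), paste2() and mgsub()
    x <- readClipboard()           # Windows only: copied PDF text, one element per line
    x <- clean(paste2(x, " "))     # join the lines with spaces, then tidy the result
    # drop "- " (rejoins hyphenated words) and map curly quotes to LaTeX quotes
    zz <- mgsub(c("- ", "“", "”"), c("", "``", "''"), x)
    # pair up the remaining straight quotes: "..." -> ``...''
    zz <- gsub("\"([^\"].*?)\"","``\\1''", zz)
    writeClipboard(noquote(zz), format = 1)   # copy the result back to the clipboard
}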
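For anyone without Windows or qdap, here is a rough base-R sketch of the same clean-up that can be tried without the clipboard. `latex_quotes()` is a hypothetical name, and treating `paste2()` plus `clean()` as "join the lines and squeeze whitespace" is an approximation; qdap's functions may do more.

```r
# Hypothetical base-R stand-in for the body of g(): no clipboard, no qdap.
# Assumption: paste2(x, " ") + clean() ~ "join lines with a space and
# squeeze extra whitespace"; qdap's exact behaviour may differ.
latex_quotes <- function(lines) {
  # Join the copied lines into one string and squeeze runs of whitespace
  x <- paste(lines, collapse = " ")
  x <- gsub("\\s+", " ", trimws(x))
  # Rejoin words hyphenated across a line break ("cook- ing" -> "cooking")
  x <- gsub("- ", "", x, fixed = TRUE)
  # Curly quotes map one-to-one onto LaTeX quotes
  x <- gsub("\u201c", "``", x, fixed = TRUE)
  x <- gsub("\u201d", "''", x, fixed = TRUE)
  # Straight quotes: each "..." pair becomes ``...''
  gsub("\"([^\"]*)\"", "``\\1''", x)
}

latex_quotes(c("I like \"proper\" cook-", "ing."))
# [1] "I like ``proper'' cooking."
```

Using `[^\"]*` instead of `[^\"].*?` keeps each match inside a single pair of quotes without relying on a lazy quantifier; unlike the original pattern, it also turns an empty `""` into ``''. Like g(), it collapses its input into one string, so pass it one passage at a time.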

Note: qdap can be downloaded here.

2 Answers

A naive solution would be:

> gsub("\"([^\"].*?)\"","``\\1''",x)

[1] "I like ``proper'' cooking."                        
[2] "I heard him say, ``I want some too'' and ``nice''."

But I'm not sure how you would handle something like "some \"text\" with one \"".
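One cheap safeguard, assuming the worry is unbalanced quotes: flag any string with an odd number of straight double quotes and fix those by hand, since no regex can tell which quote in such a string is the stray one. `odd_quotes()` is a hypothetical helper, not part of base R or qdap.

```r
# Hypothetical check: TRUE where a string's straight double quotes can't pair up
odd_quotes <- function(x) {
  # Count the '"' characters in each string (no match -> character(0) -> 0)
  n <- lengths(regmatches(x, gregexpr("\"", x, fixed = TRUE)))
  n %% 2 == 1
}

y <- c('I like "proper" cooking.', 'some "text" with one "')
odd_quotes(y)
# [1] FALSE  TRUE
```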

answered 2012-08-14T02:01:39.980

A two-stage solution:

Stage 1: Use "((?:[^\\"]|\\.)*)" to match double-quoted strings.
Stage 2: Within group 1 from stage 1, use \\"([^\\"]*)\\" to replace the escaped \" quotes.
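The two stages can be sketched in R with `gregexpr()` and `regmatches<-`, assuming the text contains literal backslash-escaped quotes (\") inside straight-quoted strings. The answer does not say what the inner quotes should become, so turning them into ``...'' as well is an assumption, and `two_stage()` is a hypothetical name.

```r
two_stage <- function(x) {
  # Stage 1: a "..." string whose body may contain backslash escapes like \"
  # (regex: "((?:[^\\"]|\\.)*)" )
  outer <- "\"((?:[^\\\\\"]|\\\\.)*)\""
  # Stage 2: an escaped pair \"...\" inside that body
  inner <- "\\\\\"([^\\\\\"]*)\\\\\""
  m <- gregexpr(outer, x, perl = TRUE)
  regmatches(x, m) <- lapply(regmatches(x, m), function(s) {
    # Group 1 of stage 1 = the match without its outer quotes
    body <- substr(s, 2, nchar(s) - 1)
    body <- gsub(inner, "``\\1''", body, perl = TRUE)
    # sprintf() keeps zero-length input zero-length (strings with no quotes)
    sprintf("``%s''", body)
  })
  x
}

y <- 'He wrote "see \\"this\\" now" and "ok".'
two_stage(y)
# [1] "He wrote ``see ``this'' now'' and ``ok''."
```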

answered 2012-08-14T01:57:25.030