r - Convert CSV file contents to Markdown

Question

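Background

The goal is to read a CSV file and write its contents as a Markdown table.

The application uses the R engine Renjin, which does not support knitr, kable, or pandoc.

Problem

The write.table command has an eol option but no corresponding sol (start-of-line) option. So for the following: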

f <- read.csv('planning.csv')
write.table(
   format(f, digits=2), "",
   sep="|", row.names=F, col.names=F, quote=F, eol="|\n")

The output looks like this:

Geothermal|1250.0|Electricity|0.0|
Houses|  13.7|Shelter|4.2|
Compostor|   1.2|Recycling|0.2|

However, each line should be prefixed with a |, like this:

|Geothermal|1250.0|Electricity|0.0|
|Houses|  13.7|Shelter|4.2|
|Compostor|   1.2|Recycling|0.2|

It should be possible to do something like this (note the extra pipe in eol):

write.table(
       format(f, digits=2), "",
       sep="|", row.names=F, col.names=F, quote=F, eol="|\n|")

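then capture everything as a string, prepend the leading pipe, and finally trim the extraneous trailing pipe. That is, fix up output resembling the following: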

Geothermal|1250.0|Electricity|0.0|
|Houses|  13.7|Shelter|4.2|
|Compostor|   1.2|Recycling|0.2|
|Fire Station|  -9.6|Protection|0.5|
|Roads|   0.0|Transport|0.9|
|

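Such string manipulation doesn't seem very R-like, though.

Question

What is the most efficient way to convert a CSV file to Markdown without depending on third-party libraries?

The Markdown flavor in question looks like this: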

|Header|Header|Header|
|---|---|---|
|Data|Data|Data|
|Data|Data|Data|

Tips on how to write only the header data and the header separator are also welcome.

score 3 · Accepted Answer

You can write your own version of kable if you like; it's mostly just paste.

x <- read.csv(system.file('misc', 'exDIF.csv', package = 'utils'))

md_table <- function(df){
    paste0('|', paste(names(df), collapse = '|'), '|\n|', 
           paste(rep('---', length(df)), collapse = '|'), '|\n|', 
           paste(Reduce(function(x, y){paste(x, y, sep = '|')}, df), collapse = '|\n|'), '|')
}

cat(md_table(x))
#> |Var1|Var2|
#> |---|---|
#> |2.7|A|
#> |3.14|B|
#> |10|A|
#> |-7|A|

cat(md_table(head(mtcars)))
#> |mpg|cyl|disp|hp|drat|wt|qsec|vs|am|gear|carb|
#> |---|---|---|---|---|---|---|---|---|---|---|
#> |21|6|160|110|3.9|2.62|16.46|0|1|4|4|
#> |21|6|160|110|3.9|2.875|17.02|0|1|4|4|
#> |22.8|4|108|93|3.85|2.32|18.61|1|1|4|1|
#> |21.4|6|258|110|3.08|3.215|19.44|1|0|3|1|
#> |18.7|8|360|175|3.15|3.44|17.02|0|0|3|2|
#> |18.1|6|225|105|2.76|3.46|20.22|1|0|3|1|

If you like, you can rewrite the second line to handle alignment based on column type.
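
As a rough sketch of that suggestion, the variant below picks the alignment marker per column. The name md_table_aligned and the rule "text columns left, everything else right" are assumptions made for illustration, not part of the original answer. Unlike md_table, it returns one string per line instead of a single string.

```r
# Hypothetical variant of md_table(); the name and alignment rule are illustrative.
md_table_aligned <- function(df) {
  # Left-align character and factor columns, right-align everything else.
  is_text <- vapply(df, function(col) is.character(col) || is.factor(col), logical(1))
  align <- ifelse(is_text, ':--', '--:')
  header <- paste(names(df), collapse = '|')
  # Paste the columns together element-wise to build one string per row.
  rows <- do.call(paste, c(unname(as.list(df)), sep = '|'))
  # Wrap the header, alignment row, and data rows in leading/trailing pipes.
  paste0('|', c(header, paste(align, collapse = '|'), rows), '|')
}

df <- data.frame(name = c('a', 'b'), value = c(1.5, 2), stringsAsFactors = FALSE)
out <- md_table_aligned(df)
writeLines(out)
#> |name|value|
#> |:--|--:|
#> |a|1.5|
#> |b|2|
```

Returning a character vector keeps the result easy to pass to writeLines or to subset (for example, the first two elements are the header and the alignment row).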

score 3 · Accepted Answer

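Since you want to put this into markdown, I think it's safe to say the table size is manageable, so performance is not a factor. (Edit #3: I had some small bugs related to the presence of row names, so to simplify things I'm removing them from the sample data entirely.)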

mtcars$rowname <- rownames(mtcars)
rownames(mtcars) <- NULL
mtcars <- mtcars[,c(ncol(mtcars), 1:(ncol(mtcars)-1))]
head(mtcars)
#             rowname  mpg cyl disp  hp drat    wt  qsec vs am gear carb
# 1         Mazda RX4 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
# 2     Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
# 3        Datsun 710 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
# 4    Hornet 4 Drive 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
# 5 Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
# 6           Valiant 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Now the work:

dashes <- paste(rep("---", ncol(mtcars)), collapse = "|")
txt <- capture.output(
  write.table(mtcars, stdout(), quote = FALSE, sep = "|", row.names = FALSE)
)
txt2 <- sprintf("|%s|", c(txt[1], dashes, txt[-1]))
head(txt2)
# [1] "|rowname|mpg|cyl|disp|hp|drat|wt|qsec|vs|am|gear|carb|"  
# [2] "|---|---|---|---|---|---|---|---|---|---|---|---|"       
# [3] "|Mazda RX4|21|6|160|110|3.9|2.62|16.46|0|1|4|4|"         
# [4] "|Mazda RX4 Wag|21|6|160|110|3.9|2.875|17.02|0|1|4|4|"    
# [5] "|Datsun 710|22.8|4|108|93|3.85|2.32|18.61|1|1|4|1|"      
# [6] "|Hornet 4 Drive|21.4|6|258|110|3.08|3.215|19.44|1|0|3|1|"
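
The question also asked how to write only the header and the header separator. With the approach above, those are simply the first two lines, so they can be built without touching the data rows. A minimal sketch, assuming a small made-up data frame (the column names Name and Cost are hypothetical, not from the question's CSV):

```r
# Made-up sample data; any data frame works here.
df <- data.frame(Name = c("Geothermal", "Houses"), Cost = c(1250, 13.7))

# Build only the header line and the separator line.
header <- paste(names(df), collapse = "|")
dashes <- paste(rep("---", ncol(df)), collapse = "|")
header_lines <- sprintf("|%s|", c(header, dashes))

writeLines(header_lines)
#> |Name|Cost|
#> |---|---|
```

If the full table has already been built, taking its first two elements gives the same result.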

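If alignment matters to you, you can check for character columns (and perhaps other types; I'll leave that to you). This uses the alignment row of the markdown table format: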

(ischar <- vapply(mtcars, is.character, logical(1)))
# rowname     mpg     cyl    disp      hp    drat      wt    qsec      vs      am    gear    carb 
#    TRUE   FALSE   FALSE   FALSE   FALSE   FALSE   FALSE   FALSE   FALSE   FALSE   FALSE   FALSE 
dashes <- paste(ifelse(ischar, ":--", "--:"), collapse = "|")
txt <- capture.output(write.table(mtcars, stdout(), quote = FALSE, sep = "|", row.names = FALSE))
txt2 <- sprintf("|%s|", c(txt[1], dashes, txt[-1]))
head(txt2)
# [1] "|rowname|mpg|cyl|disp|hp|drat|wt|qsec|vs|am|gear|carb|"  
# [2] "|:--|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|"       
# [3] "|Mazda RX4|21|6|160|110|3.9|2.62|16.46|0|1|4|4|"         
# [4] "|Mazda RX4 Wag|21|6|160|110|3.9|2.875|17.02|0|1|4|4|"    
# [5] "|Datsun 710|22.8|4|108|93|3.85|2.32|18.61|1|1|4|1|"      
# [6] "|Hornet 4 Drive|21.4|6|258|110|3.08|3.215|19.44|1|0|3|1|"

When you're finally ready to save, use cat(txt2, file = "sometable.md", sep = "\n") (or writeLines).
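
A small sketch of the saving step, assuming a three-line stand-in for txt2 and a temporary file as the output location (both are placeholders). Note that cat separates its arguments with a space by default, so it needs sep = "\n" to put each table row on its own line; writeLines does that without extra arguments.

```r
# Stand-in for txt2; the real vector comes from the steps above.
md_lines <- c("|a|b|", "|---|---|", "|1|2|")

# Hypothetical output location; replace with a real path such as "sometable.md".
path <- tempfile(fileext = ".md")

# writeLines writes one element per line and ends the file with a newline.
writeLines(md_lines, path)

# Reading the file back returns the same lines.
round_trip <- readLines(path)
print(round_trip)
```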

Edit #1: note that the other suggested answers (including mine above) do not handle pipe symbols in the content:

mtcars$mpg[1] <- "2|1.0"
ischar <- vapply(mtcars, is.character, logical(1))
dashes <- paste(ifelse(ischar, ":--", "--:"), collapse = "|")
txt <- capture.output(write.table(mtcars, stdout(), quote = FALSE, sep = "|", row.names = FALSE))
txt2 <- sprintf("|%s|", c(txt[1], dashes, txt[-1]))
head(txt2, n = 3)
# [1] "|rowname|mpg|cyl|disp|hp|drat|wt|qsec|vs|am|gear|carb|"
# [2] "|:--|:--|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|"     
# [3] "|Mazda RX4|2|1.0|6|160|110|3.9|2.62|16.46|0|1|4|4|"    
###                ^ this is the problem

You can escape it manually on all character (or, additionally, factor) columns:

ischar <- vapply(mtcars, is.character, logical(1))
mtcars[ischar] <- lapply(mtcars[ischar], function(x) gsub("\\|", "&#124;", x))
dashes <- paste(ifelse(ischar, ":--", "--:"), collapse = "|")
txt <- capture.output(write.table(mtcars, stdout(), quote = FALSE, sep = "|", row.names = FALSE))
txt2 <- sprintf("|%s|", c(txt[1], dashes, txt[-1]))
head(txt2, n = 3)
# [1] "|rowname|mpg|cyl|disp|hp|drat|wt|qsec|vs|am|gear|carb|" 
# [2] "|:--|:--|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|"      
# [3] "|Mazda RX4|2&#124;1.0|6|160|110|3.9|2.62|16.46|0|1|4|4|"
###                ^^^^^^ this is the pipe, interpreted correctly in markdown

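This doesn't work when the pipe is inside a code block, although a workaround is suggested here: https://stackoverflow.com/a/17320389/3358272

At this point, as @alistaire suggested, you are to some extent re-implementing knitr::kable. For that matter, just grab knitr/R/table.R and use kable_markdown, which does the pipe escaping for you. It takes a character matrix, not a data.frame, so call kable_markdown(as.matrix(mtcars)). You can't grab just that single function, because it also uses several helper functions in that file. You can certainly trim some of the functionality, including kable itself, which requires functions from other files.

Edit #2: since you said renjin does not support the *apply functions (comments suggest this is incorrect, but I'll continue for the sake of argument), here is a for-loop implementation that includes alignment and |-escaping: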

mtcars$mpg[1] <- "2|1.0" # just a reminder that it's here
dashes <- rep("--:", length(mtcars))
for (i in seq_along(mtcars)) {
  if (is.character(mtcars[[i]]) || is.factor(mtcars[[i]])) {
    mtcars[[i]] <- gsub("\\|", "&#124;", mtcars[[i]])
    dashes[i] <- ":--"
  }
}
txt <- capture.output(write.table(mtcars, stdout(), quote = FALSE, sep = "|", row.names = FALSE))
txt2 <- sprintf("|%s|", c(txt[1], paste(dashes, collapse = "|"), txt[-1]))
head(txt2, n = 3)
# [1] "|rowname|mpg|cyl|disp|hp|drat|wt|qsec|vs|am|gear|carb|" 
# [2] "|:--|:--|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|"      
# [3] "|Mazda RX4|2&#124;1.0|6|160|110|3.9|2.62|16.46|0|1|4|4|"
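
To tie this back to the original CSV use case, the loop can be wrapped in a function and fed from read.csv. The sketch below is only an illustration: the name df_to_md and the sample CSV contents are made up, and it has not been checked against Renjin.

```r
# Hypothetical helper wrapping the loop above; the name is illustrative.
df_to_md <- function(df) {
  dashes <- rep("--:", length(df))
  for (i in seq_along(df)) {
    if (is.character(df[[i]]) || is.factor(df[[i]])) {
      # Escape literal pipes so they do not split the cell.
      df[[i]] <- gsub("|", "&#124;", as.character(df[[i]]), fixed = TRUE)
      dashes[i] <- ":--"
    }
  }
  txt <- capture.output(
    write.table(df, stdout(), quote = FALSE, sep = "|", row.names = FALSE)
  )
  sprintf("|%s|", c(txt[1], paste(dashes, collapse = "|"), txt[-1]))
}

# Made-up CSV written to a temporary file, then converted.
csv <- tempfile(fileext = ".csv")
writeLines(c("item,cost", "a|b,1.5", "c,20"), csv)
md <- df_to_md(read.csv(csv, stringsAsFactors = FALSE))
writeLines(md)
```

Column names are not escaped here, so a header containing a pipe would still break the table.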

For the record, my *apply and for-loop implementations perform virtually the same, while @alistaire's solution is more than twice as fast (using mtcars):

Unit: microseconds
              expr      min        lq      mean    median        uq      max neval
     apply_noalign  917.881  947.9665 1031.9288  971.3060 1041.5050 1999.499   100
       apply_align  945.960  975.1350 1083.2856  995.7390 1063.7500 3523.101   100
 apply_align_pipes 1110.429 1148.5360 1255.5460 1176.9815 1275.2600 1905.778   100
           forloop 1188.104 1217.0950 1309.2549 1261.2205 1342.3600 2979.010   100
         alistaire  451.830  473.7105  511.5778  496.1370  518.5645  827.443   100
   alistaire_pipes  593.687  626.6900  718.6898  652.7645  700.5360 5460.970   100

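I used his original function for alistaire and added a simple gsub for alistaire_pipes. There may be more efficient ways to do this, but (a) simple and direct is good, and (b) I think your tables are small enough that raw performance won't be a driving factor.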
