python - Converting an HTML table to a docx file with pypandoc

Question

Pandoc does not render HTML tables well in a docx document. I receive the content of a request and render it with a template file. Then I use pypandoc like this:

 response = render(
   request,
   'template.html',
   {
     "field1": f1,
     "field2": f2,
   }
 )

 import pypandoc                                                                                            
 pypandoc.convert(source=response.content, format='html', to='docx', outputfile='output.docx')

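template.html contains a table. In the docx file I get a table, but its contents are split out below it. Are there additional parameters I need to pass to fix this? Or does pandoc's conversion not handle tables well yet? Is there a working example? Or maybe there is a simpler way to do this?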
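For reference, `pypandoc.convert` is the older entry point; as far as I know, newer pypandoc releases expose `convert_text` and `convert_file` instead. The following is a minimal sketch of the same conversion with `convert_text`, not a confirmed fix for the table problem. The helper names `to_text` and `html_to_docx` are hypothetical, and the sketch assumes the response body is UTF-8 encoded.

```python
def to_text(content, encoding="utf-8"):
    """Return the HTML as str. Django's response.content is bytes."""
    if isinstance(content, bytes):
        return content.decode(encoding)
    return content


def html_to_docx(content, outputfile):
    """Convert HTML (str or bytes) to a docx file through pandoc."""
    # Imported here so that to_text() works even without pypandoc installed.
    import pypandoc

    # Arguments: source text, target format, then the source format.
    # docx is a binary format, so an output file is required.
    pypandoc.convert_text(
        to_text(content),
        "docx",
        format="html",
        outputfile=outputfile,
    )
```

With the view above, the call would be `html_to_docx(response.content, 'output.docx')`.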

Edit 1

Here is a more concise example. This is a test Python snippet:

$ cat test-table.py 
#!/usr/bin/env python
test_table = """
 <p>Table with colgroup and col</p>
 <table border="1">
   <colgroup>
     <col style="background-color: #0f0">
     <col span="2">
   </colgroup>
   <tr>
     <th>Lime</th>
     <th>Lemon</th>
     <th>Orange</th>
   </tr>
   <tr>
     <td>Green</td>
     <td>Yellow</td>
     <td>Orange</td>
   </tr>
   <tr>
     <td>Fruit</td>
     <td>Fruit</td>
     <td>Fruit</td>
   </tr>
 </table>

   """
print("[test_table]")
print(test_table)
import pypandoc
pypandoc.convert(source=test_table, format='html', to='docx', outputfile='test-table.docx')  

## Write to html
with open('test-table.html', 'w') as fh:
  fh.write(test_table)

I open the HTML file:

$ firefox test-table.html

and get the following HTML page:

This looks fine. I also get the following docx document:

$ libreoffice test-table.docx

This does not look right.

I export the docx file to a PDF file and get the following output:

$ evince test-table.pdf

Note that the image shows the whole page; there is nothing to scroll to. The data of the second column is not there at all. Any ideas?
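The test table declares its columns through `<colgroup>` and `<col>`, including one `<col span="2">`. One untested way to narrow the problem down is to remove those elements before conversion and check whether the docx output changes. The sketch below does that with regular expressions from the standard library; `strip_col_hints` is a hypothetical helper name, and a regex is only adequate here because the markup is simple and known in advance.

```python
import re

# A whole <colgroup>...</colgroup> block, including the <col> tags inside it.
_COLGROUP_RE = re.compile(
    r"<colgroup\b[^>]*>.*?</colgroup\s*>", re.IGNORECASE | re.DOTALL
)
# Any <col> tag left outside a colgroup. \b keeps this from matching <colgroup>.
_COL_RE = re.compile(r"<col\b[^>]*>", re.IGNORECASE)


def strip_col_hints(html):
    """Remove <colgroup> and <col> elements, leaving rows and cells untouched."""
    html = _COLGROUP_RE.sub("", html)
    return _COL_RE.sub("", html)
```

In the test script, `strip_col_hints(test_table)` would be passed to the conversion call in place of `test_table`. The column background color is lost this way, so this is a diagnostic step rather than a like-for-like replacement.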

Edit 2

Pandoc is installed in a conda environment:

$ type pandoc
pandoc is hashed (/home/kaligne/local/miniconda3/bin/pandoc)

The pandoc version is:

$ pandoc -v
pandoc 2.2.1
Compiled with pandoc-types 1.17.4.2, texmath 0.11, skylighting 0.7.0.2
Default user data directory: /home/kaligne/.pandoc
Copyright (C) 2006-2018 John MacFarlane
Web:  http://pandoc.org
This is free software; see the source for copying conditions.
There is no warranty, not even for merchantability or fitness
for a particular purpose.

Edit 3

I converted the docx file to txt:

$ docx2txt test-table.docx
$ cat test-table.txt 
Table with colgroup and col
Lime
Lemon
Green
Yellow
Fruit
Fruit

We can see that all the data is there. So I guess the problem is in how the information is displayed.
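The text dump above lists two entries per row, while the HTML table has three cells per row, so it may be worth checking the table structure that pandoc wrote rather than how a viewer renders it. A docx file is a zip archive whose main body lives in `word/document.xml`, and the sketch below counts the cells in every table row using only the standard library. `table_shapes` is a hypothetical helper name, and the sketch assumes rows are direct children of their table element.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements in word/document.xml.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def table_shapes(docx_path):
    """For each table in the docx, return a list with the cell count of every row."""
    with zipfile.ZipFile(docx_path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    shapes = []
    for tbl in root.iter(W + "tbl"):
        rows = tbl.findall(W + "tr")
        shapes.append([len(row.findall(W + "tc")) for row in rows])
    return shapes
```

Calling `table_shapes('test-table.docx')` returns one list per table. For the test table, three rows of three cells would mean the structure is intact and the issue is in rendering; fewer cells per row would point at the conversion itself.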
