Questions tagged [pdf2htmlex]

0 votes
0 answers
19 views

cairo - pdf2htmlEX - CairoFontEngine.cc error during ./dobuild

I'm trying to build a Docker image that uses pdf2htmlEX-0.18.7-poppler-0.81.0, but I keep getting CairoFontEngine errors. Here's my Dockerfile:

When I build the Docker image, I get the following errors:

/tmp/pdf2htmlEX-0.18.7-poppler-0.81.0/3rdparty/poppler/git/CairoFontEngine.cc: In member function 'virtual bool CairoFont::matches(Ref&, bool)':
/tmp/pdf2htmlEX-0.18.7-poppler-0.81.0/3rdparty/poppler/git/CairoFontEngine.cc:83:17: error: no match for 'operator==' (operand types are 'Ref' and 'Ref')
    return (other == ref);
            ~~~~~~^~~~~~
/tmp/pdf2htmlEX-0.18.7-poppler-0.81.0/3rdparty/poppler/git/CairoFontEngine.cc: In static member function 'static CairoFreeTypeFont* CairoFreeTypeFont::create(GfxFont*, XRef*, FT_Library, bool)':
/tmp/pdf2htmlEX-0.18.7-poppler-0.81.0/3rdparty/poppler/git/CairoFontEngine.cc:420:47: error: 'class GooString' has no member named 'c_str'
    gfxFont->getName() ? gfxFont->getName()->c_str()
                                             ^~~~~
/tmp/pdf2htmlEX-0.18.7-poppler-0.81.0/3rdparty/poppler/git/CairoFontEngine.cc:439:27: error: 'class GooString' has no member named 'c_str'
    fileNameC = fileName->c_str();
                          ^~~~~
/tmp/pdf2htmlEX-0.18.7-poppler-0.81.0/3rdparty/poppler/git/CairoFontEngine.cc:488:42: error: invalid conversion from 'const char*' to 'char*' [-fpermissive]
    ff = FoFiTrueType::load(fileNameC);
                                     ^
In file included from /tmp/pdf2htmlEX-0.18.7-poppler-0.81.0/3rdparty/poppler/git/CairoFontEngine.cc:45:0:
/usr/include/poppler/fofi/FoFiTrueType.h:53:24: note: initializing argument 1 of 'static FoFiTrueType* FoFiTrueType::load(char*, int)'
    static FoFiTrueType *load(char *fileName, int faceIndexA=0);
                         ^~~~
/tmp/pdf2htmlEX-0.18.7-poppler-0.81.0/3rdparty/poppler/git/CairoFontEngine.cc:502:40: error: invalid conversion from 'const char*' to 'char*' [-fpermissive]
    ff = FoFiTrueType::load(fileNameC);
                                     ^
In file included from /tmp/pdf2htmlEX-0.18.7-poppler-0.81.0/3rdparty/poppler/git/CairoFontEngine.cc:45:0:
/usr/include/poppler/fofi/FoFiTrueType.h:53:24: note: initializing argument 1 of 'static FoFiTrueType* FoFiTrueType::load(char*, int)'
    static FoFiTrueType *load(char *fileName, int faceIndexA=0);
                         ^~~~
/tmp/pdf2htmlEX-0.18.7-poppler-0.81.0/3rdparty/poppler/git/CairoFontEngine.cc:531:42: error: invalid conversion from 'const char*' to 'char*' [-fpermissive]
    ff1c = FoFiType1C::load(fileNameC);
                                     ^
In file included from /tmp/pdf2htmlEX-0.18.7-poppler-0.81.0/3rdparty/poppler/git/CairoFontEngine.cc:46:0:
/usr/include/poppler/fofi/FoFiType1C.h:154:22: note: initializing argument 1 of 'static FoFiType1C* FoFiType1C::load(char*)'
    static FoFiType1C *load(char *fileName);
                       ^~~~
/tmp/pdf2htmlEX-0.18.7-poppler-0.81.0/3rdparty/poppler/git/CairoFontEngine.cc:563:37: error: invalid conversion from 'const char*' to 'char*' [-fpermissive]
    ff = FoFiTrueType::load(fileNameC);
                                     ^
In file included from /tmp/pdf2htmlEX-0.18.7-poppler-0.81.0/3rdparty/poppler/git/CairoFontEngine.cc:45:0:
/usr/include/poppler/fofi/FoFiTrueType.h:53:24: note: initializing argument 1 of 'static FoFiTrueType* FoFiTrueType::load(char*, int)'
    static FoFiTrueType *load(char *fileName, int faceIndexA=0);
                         ^~~~
/tmp/pdf2htmlEX-0.18.7-poppler-0.81.0/3rdparty/poppler/git/CairoFontEngine.cc: In function 'cairo_status_t _render_type3_glyph(cairo_scaled_font_t*, long unsigned int, cairo_t*, cairo_text_extents_t*)':
/tmp/pdf2htmlEX-0.18.7-poppler-0.81.0/3rdparty/poppler/git/CairoFontEngine.cc:700:37: error: no matching function for call to 'Dict::getVal(long unsigned int&)'
    charProc = charProcs->getVal(glyph);
                                      ^
In file included from /usr/include/poppler/Object.h:314:0,
                 from /usr/include/poppler/GfxFont.h:41,
                 from /tmp/pdf2htmlEX-0.18.7-poppler-0.81.0/3rdparty/poppler/git/CairoFontEngine.h:38,
                 from /tmp/pdf2htmlEX-0.18.7-poppler-0.81.0/3rdparty/poppler/git/CairoFontEngine.cc:42:
/usr/include/poppler/Dict.h:85:11: note: candidate: Object* Dict::getVal(int, Object*)
    Object *getVal(int i, Object *obj);
            ^~~~~~
/usr/include/poppler/Dict.h:85:11: note: candidate expects 2 arguments, 1 provided
/tmp/pdf2htmlEX-0.18.7-poppler-0.81.0/3rdparty/poppler/git/CairoFontEngine.cc: In member function 'virtual bool CairoType3Font::matches(Ref&, bool)':
/tmp/pdf2htmlEX-0.18.7-poppler-0.81.0/3rdparty/poppler/git/CairoFontEngine.cc:786:17: error: no match for 'operator==' (operand types are 'Ref' and 'Ref')
    return (other == ref && printing == printingA);
            ~~~~~~^~~~~~
CMakeFiles/pdf2htmlEX.dir/build.make:62: recipe for target 'CMakeFiles/pdf2htmlEX.dir/3rdparty/poppler/git/CairoFontEngine.cc.o' failed
make[2]: *** [CMakeFiles/pdf2htmlEX.dir/3rdparty/poppler/git/CairoFontEngine.cc.o] Error 1
CMakeFiles/Makefile2:355: recipe for target 'CMakeFiles/pdf2htmlEX.dir/all' failed
make[1]: *** [CMakeFiles/pdf2htmlEX.dir/all] Error 2
Makefile:138: recipe for target 'all' failed
make: *** [all] Error 2

Please advise how I can resolve this.

0 votes
0 answers
22 views

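pdf2htmlex - pdf2htmlEX error during conversion - CMap is not valid and got dropped for font

I'm using this version: https://github.com/pdf2htmlEX/pdf2htmlEX/releases/tag/v0.18.8.rc1

with this Debian package: pdf2htmlEX-0.18.8.rc1-master-20200630-Ubuntu-focal-x86_64.deb

When I run the conversion, I get a bunch of these errors: Working: 97/100ToUnicode CMap is not valid and got dropped for font: b7

This results in empty files without any text.

I'm running it through Docker; here's my Dockerfile:

Please advise how I can resolve this.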

0 votes
0 answers
116 views

node.js - Recommended alternative to pdf2htmlEX

I've been using pdf2htmlEX for a while now, and after several upgrades I've decided to look for an alternative.

Current tool

https://github.com/pdf2htmlEX/pdf2htmlEX

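I think it's worth mentioning that I'm running on Node and spawning pdf2htmlEX as a child process.

Some of the issues we've run into with this tool are:

  • Some PDF fonts are missing and [] appears instead, which forces me to use an image of that page as a fallback.
  • Newer PDF files fail to convert, with errors coming from pdftotext, a poppler tool that pdf2htmlEX uses internally.
  • Text sometimes includes extra characters that get copied in copy-paste use cases.

I did some research online but couldn't determine which tool would give me the same (or even better) results as pdf2htmlEX.

Please advise.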

0 votes
0 answers
46 views

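pdf - Using coordinates from XML generated by poppler to build an email template

From this PDF I generated a 72 dpi image and an XML file with zoom set to 1.

Even though the image DPI is 72, I had to adjust the DPI by trial and error before the coordinates in the XML could be converted to pixels. 90.5 seems to work well. However, this does not look like the right approach.

Command used to generate the XML: pdftohtml -xml -zoom 1 -fontfullname -s -c input.pdf output

Command used to generate the image: pdftoppm -jpeg -r 72 input.pdf output

Note: 72 dpi was used when generating the image because the image output at 72 dpi has dimensions similar to those of the PDF and the XML output.

This conversion is essential because it allows the HTML to be built. I know that poppler can generate HTML itself; however, since the generated HTML needs to be email-compatible, the XML is being used to build the HTML from scratch.

How can the conversion of coordinates from the XML to the PDF be done more reliably?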
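
One possible way to avoid tuning the DPI by hand (a sketch, not part of the original post) is to derive the scale factor from the data itself: each page element in the pdftohtml XML carries width and height attributes, so dividing the rendered image size by those values gives a per-axis scale for the text boxes on that page. The function name text_boxes_in_pixels, the sample XML, and the image sizes below are illustrative assumptions; the image dimensions would come from the pdftoppm output, and the sketch assumes every page is rendered at the same image size.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample mimicking the shape of `pdftohtml -xml` output:
# <page> elements with width/height, <text> elements with top/left/width/height.
SAMPLE_XML = """<pdf2xml>
<page number="1" position="absolute" top="0" left="0" height="792" width="612">
<text top="100" left="50" width="200" height="12" font="0">Hello</text>
</page>
</pdf2xml>"""


def text_boxes_in_pixels(xml_string, image_width, image_height):
    """Scale text boxes from XML page units to pixels of a rendered image.

    The scale is computed per page as image size / page size taken from the
    XML, so no DPI value has to be guessed.
    """
    root = ET.fromstring(xml_string)
    boxes = []
    for page in root.iter("page"):
        sx = image_width / float(page.get("width"))
        sy = image_height / float(page.get("height"))
        for text in page.iter("text"):
            boxes.append({
                "page": int(page.get("number")),
                "left": float(text.get("left")) * sx,
                "top": float(text.get("top")) * sy,
                "width": float(text.get("width")) * sx,
                "height": float(text.get("height")) * sy,
                # itertext() also collects text nested inside <b>/<i>/<a> children
                "content": "".join(text.itertext()),
            })
    return boxes


if __name__ == "__main__":
    # Assumed image size: twice the page size declared in the sample XML.
    for box in text_boxes_in_pixels(SAMPLE_XML, 1224, 1584):
        print(box)
```

If the computed scale differs between the x and y axes by more than rounding error, that would suggest the image and the XML were produced with different crop boxes or rotations, which is worth checking before building the HTML.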

0 votes
0 answers
5 views

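pdf2htmlex - pdf2htmlEX - Conversion with the fallback option turned on does not work

I'm using pdf2htmlEX-0.18.8.rc1-master-20200630-Ubuntu-focal-x86_64.deb, and I tried running the tool with the fallback option turned on, but every time it results in blank pages.

I tried various parameter configurations, but I get the same result every time.

Please advise which parameters to use to run the tool with the fallback option turned on.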