c# - PDFTron: Converting pixels to font size

Question

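I have some PDF text that has been run through OCR. The OCR returns the bounding box of each word to me. I am able to draw the bounding boxes (wordRect) on the PDF, and everything seems to be correct.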

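But when I set my font size to the height of these bounding boxes, everything goes wrong. The text appears much smaller than it should be and does not match the box height.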

I am missing some conversion. How can I make sure the text is as high as the bounding boxes?

pdftron.PDF.Font font = pdftron.PDF.Font.Create(convertedPdf.GetSDFDoc(), pdftron.PDF.Font.StandardType1Font.e_helvetica);
for (int j = 0; j < ocrStream.pr_WoordList.Count; j++)
{
    wordRect = (Rectangle) ocrStream.pr_Rectangles[j];

    Element textBegin = elementBuilder.CreateTextBegin();
    gStateTextRun = textBegin.GetGState();
    gStateTextRun.SetTextRenderMode(GState.TextRenderingMode.e_stroke_text);
    elementWriter.WriteElement(textBegin);

    fontSize = wordRect.Height;
    double descent;

    if (hasColorImg)
    {
        descent = (-1 * font.GetDescent() / 1000d) * fontSize;
        textRun = elementBuilder.CreateTextRun((string)ocrStream.pr_WoordList[j], font, fontSize);

        // translate the word to its correct position on the pdf

        // the bottom line of the wordrectangle is the baseline for the font, that's why we need the descender
        textRun.SetTextMatrix(1, 0, 0, 1, wordRect.Left, wordRect.Bottom + descent);

score 0 · Accepted Answer

How can I make sure the text is as high as the bounding boxes?

font_size is just a scaling factor. In most cases one unit of it maps to 1/72 inch (pt), but not always.

The transformation chain is: GlyphSpace -> TextSpace -> UserSpace (UserSpace is essentially page space, with units of 1/72 inch).

The glyphs in the font are defined in GlyphSpace, and a font matrix maps them to TextSpace. Typically, 1000 units in GlyphSpace map to 1 unit in TextSpace, but not always.

Then the text matrix (element.SetTextMatrix), the font size (the variable under discussion here), and some additional parameters map TextSpace coordinates to UserSpace.
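As a rough worked form of that chain, here is a sketch under simplifying assumptions that are not stated in the answer: the font uses the typical 1000 glyph units per text unit, the text matrix has no rotation or skew, its vertical scale is s_y (the fourth argument of SetTextMatrix, which is 1 in the question's code), and the page's current transformation matrix is the identity. The symbols h_glyph (height of the glyph extent in GlyphSpace units) and h_user (resulting height in UserSpace units) are introduced here for illustration only.

```latex
h_{\text{user}} = \frac{h_{\text{glyph}}}{1000} \cdot \text{font\_size} \cdot s_y
\qquad\Longrightarrow\qquad
\text{font\_size} = \frac{1000 \, h_{\text{user}}}{h_{\text{glyph}} \, s_y}
```

With font_size set equal to the box height and s_y = 1, the rendered height is h_glyph / 1000 times the box height. For any glyph shorter than 1000 units this is smaller than the box, which is consistent with the text looking too small in the question.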

Finally, the exact rendered height also depends on the individual glyphs.
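One possible way to apply this, sketched below, is to pick the font size so that the font's ascender-to-descender span fills the bounding box, rather than using the box height directly. This is only an illustration and not the method from the linked forum post. The class and method names (FontFit, FontSizeForBoxHeight, BaselineOffset) are hypothetical, the 1000 units-per-em default follows the "typical" case described above, and the code assumes an unscaled text matrix like the (1, 0, 0, 1, x, y) one in the question. With PDFNet, the descent would come from font.GetDescent() as in the question; the ascent would presumably come from a matching ascent accessor, which is an assumption to check against the SDK documentation.

```csharp
using System;

// Hypothetical helper, not part of PDFNet: pure arithmetic on font metrics.
public static class FontFit
{
    // Returns the font size at which the span from descender to ascender
    // equals boxHeight (in user-space units).
    // ascent and descent are in glyph units; descent is normally negative.
    public static double FontSizeForBoxHeight(
        double boxHeight, double ascent, double descent, double unitsPerEm = 1000.0)
    {
        double span = ascent - descent; // e.g. 718 - (-207) = 925
        if (span <= 0)
        {
            throw new ArgumentException("ascent must be greater than descent");
        }
        return boxHeight * unitsPerEm / span;
    }

    // Returns the distance from the bottom of the box up to the baseline,
    // i.e. the value to add to the box bottom when positioning the text.
    public static double BaselineOffset(
        double fontSize, double descent, double unitsPerEm = 1000.0)
    {
        return -descent / unitsPerEm * fontSize;
    }
}
```

For example, using the Helvetica AFM metrics (ascender 718, descender -207), a box 18.5 units high gives FontFit.FontSizeForBoxHeight(18.5, 718, -207) = 20, and the baseline then sits roughly 4.14 units above the box bottom. Because OCR boxes usually hug the actual letters in each word, a word without ascenders or descenders (such as "name") may still not fill its box exactly; fitting against the real glyph extents, as in the forum post referenced in this answer, is the more precise route.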

This forum post shows how to go from glyph data to UserSpace. See ProcessElements: https://groups.google.com/d/msg/pdfnet-sdk/eOATUHGFyqU/6tsUF0BHukkJ
