javascript - 是否有支持文本选择的简约 PDF.js 示例？

Question

我正在尝试PDF.js。

我的问题是Hello World 演示不支持文本选择。它将在没有文本层的情况下在画布中绘制所有内容。官方的PDF.js 演示确实支持文本选择，但代码过于复杂。我想知道是否有人有一个带有文本层的简约演示。

score 31 · Accepted Answer

~~我已将该示例提交到 Mozilla 的 pdf.js 存储库，并且它在该examples目录下可用。~~

我提交给 pdf.js 的原始示例不再存在，但我相信这个示例展示了文本选择。他们已经清理和重组了 pdf.js，因此文本选择逻辑被封装在文本层中，可以使用工厂创建。

具体来说，PDFJS.DefaultTextLayerFactory负责设置基本的文本选择内容。

以下示例已过时；仅出于历史原因将其留在这里。

我已经在这个问题上苦苦挣扎了 2-3 天，但我终于弄明白了。这是一个小提琴，向您展示如何在启用文本选择的情况下加载 PDF。

弄清楚这一点的困难在于文本选择逻辑与查看器代码（viewer.js, viewer.html, viewer.css）交织在一起。我必须提取相关代码和 CSS 才能使其正常工作（文件中引用了该 JavaScript 文件；您也可以在此处查看）。最终结果是一个最小的演示，应该证明是有帮助的。为了正确实现选择，其中的 CSSviewer.css也非常重要，因为它为最终创建的 s 设置 CSS 样式，div然后用于使文本选择正常工作。

繁重的工作由TextLayerBuilder对象完成，它实际上处理了 selectiondiv的创建。您可以从内部看到对该对象的调用viewer.js。

无论如何，这是包含 CSS 的代码。请记住，您仍然需要该pdf.js文件。我的小提琴有一个链接，指向我从 Mozilla 的 GitHub 存储库为pdf.js. 我不想直接链接到 repo 的版本，因为他们一直在开发它，它可能会被破坏。

所以事不宜迟：

HTML：

<html>
    <head>
        <title>Minimal pdf.js text-selection demo</title>
    </head>

    <body>
        <div id="pdfContainer" class = "pdf-content">
        </div>
    </body>
</html>

CSS：

.pdf-content {
    border: 1px solid #000000;
}

/* CSS classes used by TextLayerBuilder to style the text layer divs */

/* This stuff is important! Otherwise when you select the text, the text in the divs will show up! */
::selection { background:rgba(0,0,255,0.3); }
::-moz-selection { background:rgba(0,0,255,0.3); }

.textLayer {
    position: absolute;
    left: 0;
    top: 0;
    right: 0;
    bottom: 0;
    color: #000;
    font-family: sans-serif;
    overflow: hidden;
}

.textLayer > div {
    color: transparent;
    position: absolute;
    line-height: 1;
    white-space: pre;
    cursor: text;
}

.textLayer .highlight {
    margin: -1px;
    padding: 1px;

    background-color: rgba(180, 0, 170, 0.2);
    border-radius: 4px;
}

.textLayer .highlight.begin {
    border-radius: 4px 0px 0px 4px;
}

.textLayer .highlight.end {
    border-radius: 0px 4px 4px 0px;
}

.textLayer .highlight.middle {
    border-radius: 0px;
}

.textLayer .highlight.selected {
    background-color: rgba(0, 100, 0, 0.2);
}

JavaScript：

//Minimal PDF rendering and text-selection example using pdf.js by Vivin Suresh Paliath (http://vivin.net)
//This fiddle uses a built version of pdf.js that contains all modules that it requires.
//
//For demonstration purposes, the PDF data is not going to be obtained from an outside source. I will be
//storing it in a variable. Mozilla's viewer does support PDF uploads but I haven't really gone through
//that code. There are other ways to upload PDF data. For instance, I have a Spring app that accepts a
//PDF for upload and then communicates the binary data back to the page as base64. I then convert this
//into a Uint8Array manually. I will be demonstrating the same technique here. What matters most here is
//how we render the PDF with text-selection enabled. The source of the PDF is not important; just assume
//that we have the data as base64.
//
//The problem with understanding text selection was that the text selection code has heavily intertwined
//with viewer.html and viewer.js. I have extracted the parts I need out of viewer.js into a separate file
//which contains the bare minimum required to implement text selection. The key component is TextLayerBuilder,
//which is the object that handles the creation of text-selection divs. I have added this code as an external
//resource.
//
//This demo uses a PDF that only has one page. You can render other pages if you wish, but the focus here is
//just to show you how you can render a PDF with text selection. Hence the code only loads up one page.
//
//The CSS used here is also very important since it sets up the CSS for the text layer divs overlays that
//you actually end up selecting. 
//
//For reference, the actual PDF document that is rendered is available at:
//http://vivin.net/pub/pdfjs/TestDocument.pdf

var pdfBase64 = "..."; //should contain base64 representing the PDF

var scale = 1; //Set this to whatever you want. This is basically the "zoom" factor for the PDF.

/**
 * Converts a base64 string into a Uint8Array
 */
function base64ToUint8Array(base64) {
    var raw = atob(base64); //This is a native function that decodes a base64-encoded string.
    var uint8Array = new Uint8Array(new ArrayBuffer(raw.length));
    for(var i = 0; i < raw.length; i++) {
        uint8Array[i] = raw.charCodeAt(i);
    }

    return uint8Array;
}

function loadPdf(pdfData) {
    PDFJS.disableWorker = true; //Not using web workers. Not disabling results in an error. This line is
                                //missing in the example code for rendering a pdf.

    var pdf = PDFJS.getDocument(pdfData);
    pdf.then(renderPdf);                               
}

function renderPdf(pdf) {
    pdf.getPage(1).then(renderPage);
}

function renderPage(page) {
    var viewport = page.getViewport(scale);
    var $canvas = jQuery("<canvas></canvas>");

    //Set the canvas height and width to the height and width of the viewport
    var canvas = $canvas.get(0);
    var context = canvas.getContext("2d");
    canvas.height = viewport.height;
    canvas.width = viewport.width;

    //Append the canvas to the pdf container div
    jQuery("#pdfContainer").append($canvas);

    //The following few lines of code set up scaling on the context if we are on a HiDPI display
    var outputScale = getOutputScale();
    if (outputScale.scaled) {
        var cssScale = 'scale(' + (1 / outputScale.sx) + ', ' +
            (1 / outputScale.sy) + ')';
        CustomStyle.setProp('transform', canvas, cssScale);
        CustomStyle.setProp('transformOrigin', canvas, '0% 0%');

        if ($textLayerDiv.get(0)) {
            CustomStyle.setProp('transform', $textLayerDiv.get(0), cssScale);
            CustomStyle.setProp('transformOrigin', $textLayerDiv.get(0), '0% 0%');
        }
    }

    context._scaleX = outputScale.sx;
    context._scaleY = outputScale.sy;
    if (outputScale.scaled) {
        context.scale(outputScale.sx, outputScale.sy);
    }     

    var canvasOffset = $canvas.offset();
    var $textLayerDiv = jQuery("<div />")
        .addClass("textLayer")
        .css("height", viewport.height + "px")
        .css("width", viewport.width + "px")
        .offset({
            top: canvasOffset.top,
            left: canvasOffset.left
        });

    jQuery("#pdfContainer").append($textLayerDiv);

    page.getTextContent().then(function(textContent) {
        var textLayer = new TextLayerBuilder($textLayerDiv.get(0), 0); //The second zero is an index identifying
                                                                       //the page. It is set to page.number - 1.
        textLayer.setTextContent(textContent);

        var renderContext = {
            canvasContext: context,
            viewport: viewport,
            textLayer: textLayer
        };

        page.render(renderContext);
    });
}

var pdfData = base64ToUint8Array(pdfBase64);
loadPdf(pdfData);

score 4 · Accepted Answer

因为这是一个旧问题和旧的公认答案，要使其与最新的 PDF.JS 版本一起使用，您可以使用此解决方案

http://www.ryzhak.com/converting-pdf-file-to-html-canvas-with-text-selection-using-pdf-js

这是他们使用的代码：包括来自 PDF.js 代码的以下 CSS 和脚本

<link rel="stylesheet" href="pdf.js/web/text_layer_builder.css" />
<script src="pdf.js/web/ui_utils.js"></script>
<script src="pdf.js/web/text_layer_builder.js"></script>

使用此代码加载 PDF：

PDFJS.getDocument("oasis.pdf").then(function(pdf){
    var page_num = 1;
    pdf.getPage(page_num).then(function(page){
        var scale = 1.5;
        var viewport = page.getViewport(scale);
        var canvas = $('#the-canvas')[0];
        var context = canvas.getContext('2d');
        canvas.height = viewport.height;
        canvas.width = viewport.width;

        var canvasOffset = $(canvas).offset();
        var $textLayerDiv = $('#text-layer').css({
            height : viewport.height+'px',
            width : viewport.width+'px',
            top : canvasOffset.top,
            left : canvasOffset.left
        });

        page.render({
            canvasContext : context,
            viewport : viewport
        });

        page.getTextContent().then(function(textContent){
           console.log( textContent );
            var textLayer = new TextLayerBuilder({
                textLayerDiv : $textLayerDiv.get(0),
                pageIndex : page_num - 1,
                viewport : viewport
            });

            textLayer.setTextContent(textContent);
            textLayer.render();
        });
    });
});

score 1 · Accepted Answer

如果您想通过文本选择在不同页面中呈现 pdf 文档的所有页面，您可以使用

pdf查看器
画布和渲染器解析文本并将其附加到画布上，使其看起来像文本选择。

但在实际场景中，如果您要使用画布进行放大/缩小等处理，那么此画布操作将严重降低您的浏览器性能。请检查以下网址，

http://learnnewhere.unaux.com/pdfViewer/viewer.html

您可以从这里获取完整代码 https://github.com/learnnewhere/simpleChatApp/tree/master/pdfViewer

javascript - 是否有支持文本选择的简约 PDF.js 示例？

3 回答 3

Related

Reference