python - Python script to remove blank pages using pyPDF

Question

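I am trying to write a couple of Python scripts using pyPDF to split each PDF page into six separate pages, order them correctly (the document is usually printed front and back, so every other page needs its sub-pages ordered differently), and remove the blank pages this leaves at the end of the output document.

I wrote the following script to cut up the PDF pages and reorder them. Each page is split into two columns, and each column into three pages. I am not very experienced with Python, so please forgive anything I have done wrong.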

#!/usr/bin/env python
import copy, sys
from pyPdf import PdfFileWriter, PdfFileReader
input = PdfFileReader(sys.stdin)
output = PdfFileWriter()

for i in range(0,input.getNumPages(),2):
    p = input.getPage(i)
    q = copy.copy(p)
    r = copy.copy(p)
    s = copy.copy(p)
    t = copy.copy(p)
    u = copy.copy(p)
    (x, y) = p.mediaBox.lowerLeft
    (w, h) = p.mediaBox.upperRight

    p.mediaBox.lowerLeft = (x, 2 * h / 3)
    p.mediaBox.upperRight = (w / 2, h)

    q.mediaBox.lowerLeft = (w / 2, 2 * h / 3)
    q.mediaBox.upperRight = (w, h)

    r.mediaBox.lowerLeft = (x, h / 3)
    r.mediaBox.upperRight = (w / 2, 2 * h / 3)

    s.mediaBox.lowerLeft = (w / 2, h / 3)
    s.mediaBox.upperRight = (w, 2 * h / 3)

    t.mediaBox.lowerLeft = (x, y)
    t.mediaBox.upperRight = (w / 2, h / 3)

    u.mediaBox.lowerLeft = (w / 2, y)
    u.mediaBox.upperRight = (w, h / 3)

    a = input.getPage(i+1)
    b = copy.copy(a)
    c = copy.copy(a)
    d = copy.copy(a)
    e = copy.copy(a)
    f = copy.copy(a)
    (x, y) = a.mediaBox.lowerLeft
    (w, h) = a.mediaBox.upperRight

    a.mediaBox.lowerLeft = (x, 2 * h / 3)
    a.mediaBox.upperRight = (w / 2, h)

    b.mediaBox.lowerLeft = (w / 2, 2 * h / 3)
    b.mediaBox.upperRight = (w, h)

    c.mediaBox.lowerLeft = (x, h / 3)
    c.mediaBox.upperRight = (w / 2, 2 * h / 3)

    d.mediaBox.lowerLeft = (w / 2, h / 3)
    d.mediaBox.upperRight = (w, 2 * h / 3)

    e.mediaBox.lowerLeft = (x, y)
    e.mediaBox.upperRight = (w / 2, h / 3)

    f.mediaBox.lowerLeft = (w / 2, y)
    f.mediaBox.upperRight = (w, h / 3)

    output.addPage(p)
    output.addPage(b)
    output.addPage(q)
    output.addPage(a)
    output.addPage(r)
    output.addPage(d)
    output.addPage(s)
    output.addPage(c)
    output.addPage(t)
    output.addPage(f)
    output.addPage(u)
    output.addPage(e)

output.write(sys.stdout)
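
As a hedged sketch, the twelve box assignments in the script above could be generated by a single helper. `six_boxes` is a hypothetical name and not part of pyPDF; it works on plain coordinate tuples. Unlike the script, it offsets every box by the lower-left corner, on the assumption that a media box may not start at (0, 0).

```python
def six_boxes(x0, y0, x1, y1):
    """Split the rectangle (x0, y0)-(x1, y1) into 2 columns x 3 rows.

    Returns (lower_left, upper_right) pairs in the order top-left,
    top-right, middle-left, middle-right, bottom-left, bottom-right,
    which matches p, q, r, s, t, u in the script.
    """
    col_w = (x1 - x0) / 2.0   # width of one column
    row_h = (y1 - y0) / 3.0   # height of one row
    boxes = []
    for row in (2, 1, 0):     # PDF y grows upward, so row 2 is the top
        for col in (0, 1):    # left column first
            lower_left = (x0 + col * col_w, y0 + row * row_h)
            upper_right = (x0 + (col + 1) * col_w, y0 + (row + 1) * row_h)
            boxes.append((lower_left, upper_right))
    return boxes
```

Each returned pair could then be assigned to `mediaBox.lowerLeft` and `mediaBox.upperRight` of one copy of the page.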

I then use the following script to remove the blank pages.

#!/usr/bin/env python
import copy, sys
from pyPdf import PdfFileWriter, PdfFileReader
input = PdfFileReader(sys.stdin)
output = PdfFileWriter()

for i in range(0,input.getNumPages()):
    p = input.getPage(i)

    text = p.extractText()

    if (len(text) > 10):
        output.addPage(p)

output.write(sys.stdout)

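The problem seems to be that even though the pages are visibly cropped, the text-drawing commands are still there. None of these pages are scanned, so if they are blank, they are truly blank. Does anyone have ideas on what I could do differently, or on a completely different approach to removing the blank pages? I would really appreciate any help.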

score 5 · Accepted Answer

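PdfFileReader has a method, getPage(self, pageNumber), which returns a PageObject. That object in turn has a method, getContents, which returns None if the page is blank. So, with your pdf object, iterate up to getNumPages(), test each page with if getPage(i).getContents():, and gather the hits into a list of page numbers to be output.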
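
A minimal sketch of that loop, assuming the legacy pyPdf API used in the question (`getNumPages`, `getPage`, `getContents`). The names `nonblank_page_numbers` and `remove_blank_pages` are hypothetical helpers, not part of pyPdf. Note that this check only catches pages with no content stream at all; a page that was merely cropped still carries the original page's stream.

```python
def nonblank_page_numbers(reader):
    """Return indices of pages whose getContents() is not None or empty."""
    return [i for i in range(reader.getNumPages())
            if reader.getPage(i).getContents()]


def remove_blank_pages(in_stream, out_stream):
    """Copy every page that has contents from in_stream to out_stream."""
    # Imported here so the helper above can be used without pyPdf installed.
    from pyPdf import PdfFileReader, PdfFileWriter
    reader = PdfFileReader(in_stream)
    writer = PdfFileWriter()
    for i in nonblank_page_numbers(reader):
        writer.addPage(reader.getPage(i))
    writer.write(out_stream)
```

In the question's setup this would be called as `remove_blank_pages(sys.stdin, sys.stdout)`.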
