I'm learning to program with the book "Automate the Boring Stuff with Python", but I've hit a roadblock in chapter 13: "Merge multiple PDFs, but omit the title page from all but the first PDF".

The book does this by looping over the pages of each PDF. While looking into the PyPDF2 module, however, I found that the 'pages' option of merge() looked like a cleaner solution. I'm having difficulty getting it to work, though.

Please don't judge yet whether it is Pythonic; I still haven't learned classes ;-) After the book, I plan to start on classes, objects, decorators, *args and **kwargs.

I need help with the last line of my snippet.

My code:

    for fn_PdfObjects in range(len(l_fn_PdfObjects)):
        if fn_PdfObjects != 0:
            break
        else:
            ## watermark the first sheet
            addWatermark(l_fn_PdfObjects[fn_PdfObjects])
            watermarkedPage = PyPDF2.PdfFileReader(open('watermarkedCover.pdf', 'rb'))
            # the 'position =' is the page in the destination PDF it will receive
            tempMergerFile.merge(position=fn_PdfObjects, fileobj=watermarkedPage)
            tempMergerFile.merge(position=fn_PdfObjects+1, fileobj=l_fn_PdfObjects[fn_PdfObjects], pages='0:')

Looking at the module documentation, I found this (src: https://pythonhosted.org/PyPDF2/PdfFileMerger.html):

merge(position, fileobj, bookmark=None, pages=None, import_bookmarks=True)

pages – can be a Page Range or a (start, stop[, step]) tuple to merge only the specified range of pages from the source document into the output document.
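As far as I can tell from merger.py, a tuple is simply unpacked into range(), so merge() copies the source pages with 0-based indices range(start, stop[, step]). A minimal stdlib-only sketch of that selection; select_pages is a hypothetical helper, not part of PyPDF2, and it does not open any PDF:

```python
def select_pages(pages, num_pages):
    """Return the 0-based page indices a (start, stop[, step]) tuple selects.

    Hypothetical helper: mirrors how merge() feeds the tuple to range(*pages).
    """
    if pages is None:
        # No 'pages' argument means the whole document, like (0, num_pages)
        pages = (0, num_pages)
    return list(range(*pages))


# A 5-page PDF whose first page (index 0) is a cover sheet
print(select_pages((0, 5), 5))     # every page
print(select_pages((1, 5), 5))     # skip the cover sheet
print(select_pages((0, 5, 2), 5))  # every other page
```

So skipping a cover sheet means starting at 1, not 0, and the stop value is the page count of that particular PDF.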

I also found this about page ranges in pagerange.py, but whatever I try, I can't get it to work (src: https://github.com/mstamy2/PyPDF2/blob/master/PyPDF2/pagerange.py):

    class PageRange(object):
        """
        A slice-like representation of a range of page indices,
            i.e. page numbers, only starting at zero.
        The syntax is like what you would put between brackets [ ].
        The slice is one of the few Python types that can't be subclassed,
        but this class converts to and from slices, and allows similar use.
          o  PageRange(str) parses a string representing a page range.
          o  PageRange(slice) directly "imports" a slice.
          o  to_slice() gives the equivalent slice.
          o  str() and repr() allow printing.
          o  indices(n) is like slice.indices(n).
        """

        def __init__(self, arg):
            """
            Initialize with either a slice -- giving the equivalent page range,
            or a PageRange object -- making a copy,
            or a string like
                "int", "[int]:[int]" or "[int]:[int]:[int]",
                where the brackets indicate optional ints.
            {page_range_help}
            Note the difference between this notation and arguments to slice():
                slice(3) means the first three pages;
                PageRange("3") means the range of only the fourth page.
                However PageRange(slice(3)) means the first three pages.
            """
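To see the difference the docstring describes, here is a rough stdlib-only imitation of how a PageRange string becomes a slice. parse_page_range is a hypothetical name; the real class uses a regular expression and its error handling differs, so treat this as a sketch of the semantics only:

```python
def parse_page_range(text):
    """Hypothetical stand-in for PageRange(str): turn "3", "1:", "::2"... into a slice."""
    parts = text.split(":")
    if len(parts) > 3:
        raise ValueError("not a page range: %r" % text)
    if len(parts) == 1:
        # A bare int selects exactly one page: "3" is the fourth page only
        start = int(parts[0])
        return slice(start, start + 1 if start != -1 else None)
    # "[start]:[stop][:[step]]": every part optional, just like slice syntax
    return slice(*[int(p) if p else None for p in parts])


pages = list(range(10))                # stand-in for 10 page indices
print(pages[parse_page_range("3")])    # only the fourth page
print(pages[slice(3)])                 # the first three pages
print(pages[parse_page_range("1:")])   # everything except the cover sheet
```

In other words, the string is read like what you would put between list brackets, except that a lone number picks one page.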

The error I receive is: TypeError: "pages" must be a tuple of (start, stop[, step])

    Traceback (most recent call last):
      File "combining_select_pages_from_many_pdfs.py", line 112, in <module>
        main()
      File "combining_select_pages_from_many_pdfs.py", line 104, in main
        newPdfFile = mergePdfFiles(l_PdfObjects)
      File "combining_select_pages_from_many_pdfs.py", line 63, in mergePdfFiles
        tempMergerFile.merge(position=fn_PdfObjects+1, fileobj=l_fn_PdfObjects[fn_PdfObjects],pages=[0])
      File "/home/sybie/.local/lib/python3.5/site-packages/PyPDF2/merger.py", line 143, in merge
        raise TypeError('"pages" must be a tuple of (start, stop[, step])')

The only related code I could find is this check in merger.py:

    # Find the range of pages to merge.
    if pages == None:
        pages = (0, pdfr.getNumPages())
    elif isinstance(pages, PageRange):
        pages = pages.indices(pdfr.getNumPages())
    elif not isinstance(pages, tuple):
        raise TypeError('"pages" must be a tuple of (start, stop[, step])')

src: https://github.com/mstamy2/PyPDF2/blob/master/PyPDF2/merger.py#L137
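Read literally, that check accepts only None, a PageRange object, or a real tuple. A plain string such as '0:' or a list such as [0] is neither, so both fall through to the TypeError. The stdlib-only sketch below mirrors the check, with a built-in slice standing in for PageRange (the docstring says indices(n) is like slice.indices(n)); normalize_pages is a hypothetical name:

```python
def normalize_pages(pages, num_pages):
    """Mirror of merger.py's check, using slice as a stand-in for PageRange."""
    if pages is None:
        return (0, num_pages)
    elif isinstance(pages, slice):
        # PageRange.indices(n) is documented as behaving like slice.indices(n)
        return pages.indices(num_pages)
    elif not isinstance(pages, tuple):
        raise TypeError('"pages" must be a tuple of (start, stop[, step])')
    return pages


for attempt in ['0:', [0], (1, 5), slice(1, None)]:
    try:
        print(repr(attempt), '->', normalize_pages(attempt, 5))
    except TypeError as exc:
        print(repr(attempt), '-> TypeError:', exc)
```

The string has to be wrapped in a PageRange (or replaced by a tuple) before merge() will use it.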

Thanks in advance for all the help!

I resolved the issue by doing this:

    pages=(1, l_fn_PdfObjects[fn_PdfObjects].numPages)

In other words, I passed a tuple instead. If anyone could still tell me how PageRange works, I would appreciate it!
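Applied to the whole chapter 13 task, the tuple approach boils down to: the first PDF keeps all of its pages, (0, numPages), and every later PDF starts at 1 to drop its cover sheet. A stdlib-only sketch of that planning step; page_plan and the page counts are hypothetical, and in real code each count would come from a reader's numPages and each tuple would be passed as merger.append(reader, pages=...):

```python
def page_plan(page_counts):
    """Return one (start, stop) tuple per PDF: keep the first PDF whole,
    drop page 0 (the cover sheet) from every later PDF."""
    plan = []
    for index, num_pages in enumerate(page_counts):
        start = 0 if index == 0 else 1
        plan.append((start, num_pages))
    return plan


# Hypothetical page counts for three input PDFs
print(page_plan([4, 3, 6]))
```

If I read merger.py correctly, pages=PageRange('1:') (from PyPDF2 import PageRange) should select the same pages as (1, numPages) for the later files, because merge() converts it with indices(numPages); the advantage is that you don't need to know the page count up front.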

answered 2016-04-18T07:32:43.863

It seems that you have to use the parse_filename_page_ranges function. It would roughly look like this:

    import traceback
    from sys import stderr, exit
    from PyPDF2 import PdfFileMerger, parse_filename_page_ranges

    # records_pdf, inv_pdf and destinationfile are file names you supply
    args = [records_pdf, '0:1', inv_pdf, records_pdf, '1:']
    # pass the list itself; args.fn_pgrgs only exists for argparse results
    filename_page_ranges = parse_filename_page_ranges(args)

    merger = PdfFileMerger()
    in_fs = dict()
    try:
        for (filename, page_range) in filename_page_ranges:
            # open each input file once, even if it is listed several times
            if filename not in in_fs:
                in_fs[filename] = open(filename, "rb")
            merger.append(in_fs[filename], pages=page_range)
    except Exception:
        print(traceback.format_exc(), file=stderr)
        print("Error while reading " + filename, file=stderr)
        exit(1)

    with open(destinationfile, "wb") as output:
        merger.write(output)
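To see what parse_filename_page_ranges hands to that loop: as I understand it, each argument that looks like a page range applies to the most recent file name, and a file name with no range after it means all pages. Below is a stdlib-only imitation; group_filename_ranges is hypothetical, the real function returns PageRange objects, and here the range strings are kept as-is with ":" standing for "all pages":

```python
import re

# Approximation of a page-range token: "N" or "[start]:[stop][:[step]]"
_RANGE_RE = re.compile(r"-?\d+|-?\d*:-?\d*(?::-?\d*)?")


def group_filename_ranges(args):
    """Pair each file name with the page ranges that follow it."""
    pairs = []
    filename = None
    saw_range = False
    for arg in args:
        if _RANGE_RE.fullmatch(arg):
            if filename is None:
                raise ValueError("page range %r given before any file name" % arg)
            pairs.append((filename, arg))
            saw_range = True
        else:
            # A new file name: the previous file gets "all pages" if it had no range
            if filename is not None and not saw_range:
                pairs.append((filename, ":"))
            filename = arg
            saw_range = False
    if filename is not None and not saw_range:
        pairs.append((filename, ":"))
    return pairs


print(group_filename_ranges(["records.pdf", "0:1", "inv.pdf", "records.pdf", "1:"]))
```

For the list above this gives the cover of records.pdf, then all of inv.pdf, then records.pdf from its second page on.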
answered 2017-11-29T04:10:00.073