convert(): add option to get engine document (e.g. pikepdf.Pdf) #203

Open
mara0004 wants to merge 7 commits from mara0004/img2pdf:return_engine_doc into main
Showing only changes of commit c1c77a7453 - Show all commits

View file

@ -1075,7 +1075,7 @@ class pdfdoc(object):
self.tostream(stream)
return stream.getvalue()
def tostream(self, outputstream=None, return_engine_doc=False):
def todoc(self):
Review

You split tostream() into finalize() and tostream() but then why does the new tostream() not call finalize()?

You split `tostream()` into `finalize()` and `tostream()` but then why does the new `tostream()` not call `finalize()`?
Review

Because I though the embedder of convert_to_docobject() should not have to invoke finalize().
Instead, finalize() technically belongs into convert_to_docobject() itself, after all image pages have been added. So tostream() cannot also finalize() as that would result in a double call.

Because I though the embedder of `convert_to_docobject()` should not have to invoke `finalize()`. Instead, `finalize()` technically belongs into `convert_to_docobject()` itself, after all image pages have been added. So `tostream()` cannot also `finalize()` as that would result in a double call.
if self.engine == Engine.pikepdf:
PdfArray = pikepdf.Array
PdfDict = pikepdf.Dictionary
@ -1267,12 +1267,11 @@ class pdfdoc(object):
self.writer.addobj(metadata)
self.writer.addobj(iccstream)
if return_engine_doc:
return self.writer, self.output_version
return self.writer, self.output_version
# now write out the PDF
if outputstream is None:
raise TypeError("pdfdoc.tostream() requires outputstream unless return_engine_doc is True.")
def tostream(self, outputstream):
# write out the PDF
self.todoc() # finalize self.writer
if self.engine == Engine.pikepdf:
kwargs = {}
if pikepdf.__version__ >= "6.2.0":
@ -1281,6 +1280,7 @@ class pdfdoc(object):
outputstream, min_version=self.output_version, linearize=True, **kwargs
)
elif self.engine == Engine.pdfrw:
from pdfrw import PdfName, PdfArray
self.writer.trailer.Info = self.writer.docinfo
# setting the version attribute of the pdfrw PdfWriter object will
# influence the behaviour of the write() function
@ -2803,11 +2803,11 @@ def convert(*images, **kwargs):
)
if kwargs["outputstream"]:
pdf.tostream(outputstream=kwargs["outputstream"])
pdf.tostream(kwargs["outputstream"])
return
if kwargs["return_engine_doc"]:
Review

Please do not change the signature of the convert() function. This is necessary to preserve API stability. You have to extract "outputstream" from kwargs.

Please do not change the signature of the convert() function. This is necessary to preserve API stability. You have to extract "outputstream" from kwargs.
Review

It's not clear to me why this is supposed to break the API -- can you explain?
AFAICS, kwargs is internal, and no callee expects kwargs["outputstream"], right?

It's not clear to me why this is supposed to break the API -- can you explain? AFAICS, `kwargs` is internal, and no callee expects `kwargs["outputstream"]`, right?
Review

Relatedly, it looks like the _default_kwargs strategy silently ignores nonexistent parameters, which is problematic. (And of course it breaks IDE completion.)
I see you're using kwargs to unify access of {crop,bleed,trim,art}border, which is fine, but why can't any others params be in the signature directly?

Relatedly, it looks like the `_default_kwargs` strategy silently ignores nonexistent parameters, which is problematic. (And of course it breaks IDE completion.) ~~I see you're using kwargs to unify access of `{crop,bleed,trim,art}border`, which is fine, but why can't any others params be in the signature directly?~~
Review

It's not clear to me why this is supposed to break the API -- can you explain?

Okay, self-answering the question: a SO search yielded that specifying optional params after a *capture raises a SyntaxError on Python 2. Never having written code for Python 2, I did not know that. I'll change to extract from kwargs as you said, then.

The issue with invalid kwargs assumably being ignored without error still stands, though.

> It's not clear to me why this is supposed to break the API -- can you explain? Okay, self-answering the question: a SO search yielded that specifying optional params after a `*capture` raises a `SyntaxError` on Python 2. Never having written code for Python 2, I did not know that. I'll change to extract from kwargs as you said, then. The issue with invalid kwargs assumably being ignored without error still stands, though.
return pdf.tostream(return_engine_doc=True)
return pdf.todoc()
return pdf.tostring()