convert()
: add option to get engine document (e.g. pikepdf.Pdf)
#203
No reviewers
Labels
No labels
bug
duplicate
enhancement
forwarded
help wanted
invalid
question
wontfix
No milestone
No project
No assignees
2 participants
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference: josch/img2pdf#203
Loading…
Reference in a new issue
No description provided.
Delete branch "mara0004/img2pdf:return_engine_doc"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
This allows the caller to get the engine document handle (e.g.
pikepdf.Pdf
) instead of saving to a file or memory buffer, which avoids overhead if you intend to continue processing the document with that engine anyway (e.g. add pages from other PDFs, or do N-up composition with the image pages).This is achieved by adding a new optionreturn_engine_doc
to theconvert()
API.Beside the document handle, this also returns the minimum output version, which may be needed for saving once post-processing has been done.
Related to #97
I'm not happy with the design of adding another function argument which then, if set, changes the output type of the function. I think a better pattern would be to factor out the relevant code into a new function which can then be called by
convert
or by the 3rd party utility which needs the underlying engine object.Fair enough, that might be cleaner.
To confirm, you only want me to factor out a new function from
pdfdoc.tostream()
but keep the option + if-check inconvert()
, right?Or do you also want me to factor out a
convert_to_enginedoc()
fromconvert()
?tostream()
8aca925ee38aca925ee3
toc1c77a7453
I think I updated the code accordingly. Kindly let me know if you are OK with it now, or if there's anything else to change.
(In case you decide to merge any of my 4 PRs, please --squash the commits into one. Thanks!)
What i had in mind was: take the convert function and put all the code that generates the
pdf
variable and put it in its own function. Let the convert function call that function. Now your code can call this new function and can do with thepdf
variable whatever it wants.Aah, that's what I was wondering in Comment 2, whether you want the factoring out only for
pdfdoc.tostream()
or also forconvert()
itself.I had presumed you meant only
pdfdoc.tostream()
, but this clarifies you actually want it forconvert()
. So I'll change that.OK, updated. Please take a look at the new patch set. Thanks :)
Thank you for your work and for your patience with me. I'm going to be on a business trip next week (I'm switching jobs end of september/beginning of october) so my next feedback might take a while but please feel free to ping me if you like.
@ -1076,3 +1076,3 @@
return stream.getvalue()
def tostream(self, outputstream):
def finalize(self):
You split
tostream()
intofinalize()
andtostream()
but then why does the newtostream()
not callfinalize()
?Because I though the embedder of
convert_to_docobject()
should not have to invokefinalize()
.Instead,
finalize()
technically belongs intoconvert_to_docobject()
itself, after all image pages have been added. Sotostream()
cannot alsofinalize()
as that would result in a double call.@ -2803,0 +2806,4 @@
# Input images can be given as file like objects (they must implement read()),
# as a binary string representing the image content or as filenames to the
# images.
def convert(*images, outputstream=None, **kwargs):
Please do not change the signature of the convert() function. This is necessary to preserve API stability. You have to extract "outputstream" from kwargs.
It's not clear to me why this is supposed to break the API -- can you explain?
AFAICS,
kwargs
is internal, and no callee expectskwargs["outputstream"]
, right?Relatedly, it looks like the
_default_kwargs
strategy silently ignores nonexistent parameters, which is problematic. (And of course it breaks IDE completion.)I see you're using kwargs to unify access of{crop,bleed,trim,art}border
, which is fine, but why can't any others params be in the signature directly?Okay, self-answering the question: a SO search yielded that specifying optional params after a
*capture
raises aSyntaxError
on Python 2. Never having written code for Python 2, I did not know that. I'll change to extract from kwargs as you said, then.The issue with invalid kwargs assumably being ignored without error still stands, though.
View command line instructions
Checkout
From your project repository, check out a new branch and test the changes.Merge
Merge the changes and update on Forgejo.