Limit on the input file size? #139

Closed
opened 2 years ago by gismatthew · 7 comments

I am trying to assemble 1000 PNG images into a PDF. The total file size is around 550MB. The script quits every time at `convert`. I limited the input to the first 100 images and tried again, and it worked. So my question is: what is the known limit on the input, either in terms of the number of images or the total file size? Once I know these parameters, I can work on some measures, e.g. (1) split into multiple PDFs or (2) reduce the images' resolution.
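Measure (1) can be sketched roughly as follows. This is only an illustration under assumptions: the helper names `chunked` and `convert_in_batches`, the `part_001.pdf` naming scheme and the default batch size are hypothetical, and 100 is simply the count reported to work above, not a known limit. The only img2pdf call used is `img2pdf.convert` with a list of file names, as in this thread.

```python
from pathlib import Path


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def convert_in_batches(image_paths, out_dir, batch_size=100):
    """Write one PDF per batch of images and return the list of PDF paths."""
    import img2pdf  # imported lazily so that `chunked` works without img2pdf

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    batches = chunked(list(image_paths), batch_size)
    for index, batch in enumerate(batches, start=1):
        target = out_dir / f"part_{index:03d}.pdf"  # hypothetical naming scheme
        with open(target, "wb") as handle:
            handle.write(img2pdf.convert([str(path) for path in batch]))
        written.append(target)
    return written
```

If one particular batch fails while the others succeed, that would also narrow the problem down to a handful of input files.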

josch commented 2 years ago
Owner

When you say "quit" -- what is the error message you get? Without more details I'm afraid I cannot know what is wrong.

There is certainly no artificial limit in the img2pdf code that aborts if more than X input files are given. If that were the case, you would've gotten an appropriate error message.

So what is the error message you get?


I agree we would need to see the traceback to find out what the issue is.
Concerning the first part of your question: There is a limit on the maximum input image size in pixels, imposed by the Python Imaging Library to avoid decompression bomb attacks. You can find out what that limit is by running

```bash
python3 -c "import PIL.Image; print(PIL.Image.MAX_IMAGE_PIXELS)"
```

It is possible to disable this safeguard by passing `--pillow-limit-break`.

Poster

Thanks for looking into this issue! I wrote a Python script to do the conversion. I also tried running it line by line; `convert()` causes Python to quit back to the terminal:

```python
# run line by line (Python 3.7.8, on Windows 10)
# now at `convert`:
d = img2pdf.convert(imageFiles)
# Python exits here without any message, back at the prompt:
# (venv) C:\my\workspace>
```

Setting `Image.MAX_IMAGE_PIXELS = None` does not resolve the issue.

I tried using img2pdf.exe, and it worked! I get a valid pdf but the pages are not in the correct order.

Since the executable file is not flexible enough to sort the images (pages of my scanned book), I still count on you guys to figure out why python cannot handle it. Let me know if you need more info to troubleshoot the issue, though I know that I'm not being helpful at the moment.

Thanks very much!
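On the page order: if the order of the pages is encoded as numbers in the file names, a natural sort in Python before calling `convert()` may be enough. The sketch below assumes exactly that; the file names and the helper name `natural_key` are hypothetical.

```python
import re
from pathlib import Path


def natural_key(path):
    """Split a file name into text and integer parts for sorting.

    re.split with a capturing group alternates text and digit runs, so the
    parts at odd positions are always the numbers.
    """
    parts = re.split(r"(\d+)", Path(path).name)
    return [int(part) if index % 2 else part.lower()
            for index, part in enumerate(parts)]


# Hypothetical file names; a plain sorted() would put page10 before page2.
image_files = sorted(["page10.png", "page2.png", "page1.png"], key=natural_key)
print(image_files)  # ['page1.png', 'page2.png', 'page10.png']
```

The sorted list can then be passed to `img2pdf.convert(image_files)` as before.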

josch commented 2 years ago
Owner

Do I see this correctly, that when running `img2pdf.convert()` you get no output and are just thrown out of your Python interpreter? I've never seen that. I have no idea why you do not get any sort of error message.

It's even weirder that it works with `img2pdf.exe` with the same input.

Currently I don't see how I can further debug the problem because I cannot reproduce it.


Concur, I've never seen any such behaviour with img2pdf.
What you could still try is to set the logger to debug mode and check whether there is any output that tells you roughly where the library stops.

```python
import logging
import img2pdf

# Send all debug messages of the img2pdf logger to stderr
lib_logger = logging.getLogger('img2pdf')
lib_logger.addHandler(logging.StreamHandler())
lib_logger.setLevel(logging.DEBUG)

# ... your code here
```

If you are willing to share your conversion script, this might help us reproduce the issue. Then we could also check whether there are any other problems in your code that might lead to the condition you are experiencing.
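Another thing that might produce a traceback when the interpreter exits silently is the standard library's `faulthandler` module. This is a sketch under an assumption that is not confirmed anywhere in this thread: that the silent exit is a fatal error in native code (for example a segmentation fault) rather than an ordinary Python exception. The helper names `install_crash_handler` and `run_guarded` are hypothetical.

```python
import faulthandler
import sys


def install_crash_handler():
    """Dump the Python tracebacks of all threads to stderr on a fatal error."""
    faulthandler.enable(file=sys.stderr, all_threads=True)


def run_guarded(func, *args, **kwargs):
    """Call func and print any ordinary Python exception before re-raising it."""
    try:
        return func(*args, **kwargs)
    except MemoryError:
        print("MemoryError: the process ran out of memory", file=sys.stderr)
        raise
    except Exception as exc:
        print(f"{type(exc).__name__}: {exc}", file=sys.stderr)
        raise


# Possible usage in the conversion script (imageFiles as in the post above):
# install_crash_handler()
# d = run_guarded(img2pdf.convert, imageFiles)
```

Running the script with `python -X faulthandler` enables the same handler without changing the code.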

josch commented 2 years ago
Owner

Since there was no further input on this bug for half a year, I'm going to close this. Reopen if anything new comes up.

josch closed this issue 2 years ago
Reference: josch/img2pdf#139