Problem converting a very large number of images into a single PDF #3
By josch on 2015-03-15T09:41:46.023Z
Created by: sahwar
This is an awesome program, but I'm having problems converting a very large number of images into a single PDF file. The images are about 350 in number, 2 of them are color JPGs (the front cover and the back cover), while all the other images are monochrome (black-and-white) or greyscale PNGs.
The filesize of each image is >=500 KB to >=2 MB.
I think that the problem could lie in the fact that the dimensions of the images are very large (8,600x6,071 px for each of the PNGs and 2,616x3,753 px for the 2 color JPGs), or in the fact that the 2 color JPGs are smaller in terms of dimensions than the PNGs. (Maybe img2pdf needs all input images to be of the same dimensions? I was left with the impression that this isn't the case.)
However, I tried to convert ONLY the PNGs to a PDF and the error persisted. I then tried to convert just 1 of the color JPGs and img2pdf worked flawlessly... Weird.
Here's the error that I get:
Here's a hyperlink to the command that I use for the conversion: the paste in PasteBin.com.
Other things that could be causing this are: maybe the fact that I haven't specified the DPI (-d), the -x and -y, or maybe just the naming of the files is causing trouble...
It would be nice if img2pdf's README.md were to include a mention of the theoretical limit of the number of input images that can be made into a single PDF and the limit of their filesize and image dimensions. Assuming that such limits exist, of course.
P.S. I should note that my PC is powerful enough so the problem is probably not related to insufficient hardware capabilities.
Two off-topic suggestions:
It would have been nice if img2pdf had the ability to resize (all or some of) the images to common paper sizes (like A4, etc.) for the pages of the output PDF file. The ability to make all (or some) of the pages in portrait or landscape PDF page mode is also good to have. I don't know how difficult it is to implement things like these, but they would definitely be very useful. Maybe ImageMagick could be used for this. Something similar is available in GhostScript.
Thanks in advance for your generous help!
Imported comments:
By josch on 2015-03-06 06:49:48 UTC
Hi,
Thanks for your bug report! Unfortunately I am unable to reproduce your error here.
But the problem seems to be that the PIL version you have does not offer the close() attribute. Can you tell me which version of PIL you have installed on your system?
Could you also copypaste your feature request to a separate bug report?
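For readers hitting the same exception: a missing close() attribute on an image object can be guarded against in calling code. The following is only a minimal sketch of that idea, not the fix that was actually applied to img2pdf (the thread does not show it). The helper close_image and the two stand-in classes are hypothetical names used for illustration; no real PIL objects are involved.

```python
def close_image(im):
    """Call im.close() if the object provides it.

    Returns True when close() was called, False when the object
    has no callable close attribute (as with older PIL versions).
    """
    close = getattr(im, "close", None)
    if callable(close):
        close()
        return True
    return False


class OldImage:
    """Hypothetical stand-in for an image object without close()."""


class NewImage:
    """Hypothetical stand-in for an image object that has close()."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


old, new = OldImage(), NewImage()
print(close_image(old), close_image(new), new.closed)
```

The same pattern works on a real image object, since it only relies on attribute lookup.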
By sahwar on 2015-03-06 13:47:42 UTC
@josch
First of all, thanks for the quick reply!
Secondly, if you mean the Python Imaging Library (Pillow fork) for 'PIL', then the information
for my system is as follows:
Thirdly, tell me if you need to know the version of any other related packages if that is needed
to figure out this problem (though you're probably on the right track with your guess).
My system is Linux Mint 17 KDE (32-bit), I'm actually using a LiveDVD version and
don't have it installed to the HDD of my computer, but that shouldn't be a problem.
Also, this is the vanilla Linux Mint 17 KDE (32-bit) with just a couple of packages
manually added (like the requirements/dependencies of img2pdf).
P.S. I'll now paste the feature request to a separate bug report, thanks for pointing that out.
By josch on 2015-03-06 14:15:12 UTC
Thanks, if you use Linux Mint 17, then it's easy for me to create a chroot environment of that and do the test myself :)
By josch on 2015-03-07 02:03:55 UTC
The exception should be gone now. @sahwar: Can you confirm?
By sahwar on 2015-03-07 15:37:59 UTC
@josch
Well, the exception really is gone and after waiting 5-10 minutes for the output PDF to be completed (because the input files are huge and are many in terms of their total number), I did get an output PDF and it seems to open fine with Okular.
However, I find it strange that 341 input files (with a total filesize of 176.5 MiB) gave a 303.7 MiB output PDF. Is the extra filesize due to the PDF overhead?
There's something else odd, though it isn't directly related to img2pdf, and that is the fact that KDE's Dolphin reports the filesize of the output PDF as 0 B while Okular (KDE's PDF viewer) reports the correct filesize. Weird.
And there's an additional oddity which is related to the output PDF: the first input image's page is scaled to be big while the last input image's page is shown normally (even though the last input image's dimensions are larger than those of the first input image). You can see the difference here (I've edited out the contents of the pages with KolourPaint because they contain private information): http://imgur.com/a/NjhOQ#0. I haven't tested the PDF with a PDF viewer other than Okular, so I don't know if that's a problem with Okular itself or if img2pdf scaled the first page incorrectly. I thought that img2pdf puts the input images as pages without scaling them (and does so losslessly) and that this is the default behavior. Correct me if I'm wrong about that 'no scaling by default' thing.
By josch on 2015-03-07 17:22:53 UTC
You can get some progress on it by processing your input images in batches or individually and then joining the individual PDF files using pdftk or other tools.
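The batch approach above can be sketched as follows. This is only an illustration under stated assumptions: the file names, the batch size of 50 and the part/output names are made up, and the script only builds the command lines (img2pdf with -o for each batch, then pdftk with cat output to join the parts) without running them.

```python
def batches(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_commands(images, size, final="all.pdf"):
    """Return one img2pdf command per batch plus a final pdftk join command.

    The part file names (part000.pdf, ...) and `final` are hypothetical.
    """
    commands = []
    parts = []
    for n, chunk in enumerate(batches(images, size)):
        part = "part%03d.pdf" % n
        parts.append(part)
        commands.append(["img2pdf", "-o", part] + chunk)
    # pdftk joins the per-batch PDFs in the order given.
    commands.append(["pdftk"] + parts + ["cat", "output", final])
    return commands


# Hypothetical input: 341 page images, as in the report above.
images = ["page%03d.png" % i for i in range(341)]
for cmd in build_commands(images, 50):
    print(" ".join(cmd[:4]), "...")
```

Each list could be passed to subprocess.run once the names match your actual files.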
Your output is very much bigger than the input because your input seems to be PNG images (if the file extension in the command you pasted in one of your earlier messages is an indication of the file type). To losslessly put your PNGs into the PDF, they will be unpacked into their raw data and then gzip compressed. This compression is quite a bit less efficient than what the PNG format would be able to achieve on the same image data. This is why your file size explodes.
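One plausible reason for the difference described above is that PNG filters each row before compressing it, while plain zlib compression of the raw pixels does not. The sketch below illustrates only that general effect on synthetic data; it does not reproduce what img2pdf does internally, and the image (a random walk of grey values), its size and the helper names are assumptions made for the example.

```python
import random
import zlib

WIDTH, HEIGHT = 256, 256


def synthetic_rows(width, height, seed=0):
    """Build rows of 8-bit grey values that change by at most 1 per pixel."""
    rng = random.Random(seed)
    value = 128
    rows = []
    for _ in range(height):
        row = bytearray()
        for _ in range(width):
            value = (value + rng.choice((-1, 0, 1))) % 256
            row.append(value)
        rows.append(bytes(row))
    return rows


def sub_filter(row):
    """PNG-style 'Sub' filter for 1 byte per pixel.

    Keeps the first byte and stores every other byte as the
    difference to its left neighbour, modulo 256.
    """
    out = bytearray([row[0]])
    for i in range(1, len(row)):
        out.append((row[i] - row[i - 1]) % 256)
    return bytes(out)


rows = synthetic_rows(WIDTH, HEIGHT)
raw = b"".join(rows)
filtered = b"".join(sub_filter(r) for r in rows)

raw_size = len(zlib.compress(raw, 9))
filtered_size = len(zlib.compress(filtered, 9))
print("uncompressed:", len(raw), "bytes")
print("zlib on raw pixels:", raw_size, "bytes")
print("zlib after Sub filter:", filtered_size, "bytes")
```

On this kind of smooth data the filtered stream compresses noticeably better, because it consists almost entirely of the three byte values 0, 1 and 255.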
I have no idea why Dolphin reports the wrong file size. You should report a bug about that in the dolphin bug tracker.
About the scaling issue, please find some images that produce the same problem and which you are able to share. Then open a bug report with that issue, explaining in detail what input file you used, what the output is and what you expected to happen instead.
I understood that this bugreport was about the Python exception you experienced. Since this is solved, I'll close this bug now.
By sahwar on 2015-03-07 17:42:11 UTC
Yeah, I know, but I don't mind waiting a bit longer to get the output PDF in one go.
You're right, almost all input images that I used in this case are PNG images.
I see, I didn't realize that. It would be great if this is mentioned in the README.md as it really matters to know that the PDF output would be way bigger in filesize than the input images' filesize if you use PNGs as input. Maybe you should consider adding this explanation there since I now saw that you did say that, but it would be nice to explicitly say it in a simpler way for us non-programmers.
I closed Dolphin and reopened it and this bug is now gone so I won't bother reporting it for just this one case since Dolphin does display the filesize correctly after reloading the folder where the output PDF was...
OK, thanks, I'll do that soon.
Yes, this bug is closed. Thanks for the fix that resolved the issue!
P.S. It's a pleasure to work with you, @josch, cheers!
By josch on 2015-03-08 06:41:55 UTC
I say multiple times in the README that everything that is not JPEG will be zip encoded:
or
or
So what information is missing?
I expect people to stumble across img2pdf after they have used any of the million other tools that can convert images to PDF like imagemagick, gimp, photoshop, ghostscript, latex, libre office, microsoft office. And then they noticed that either the resulting pdf files are huge in size or they are small but have lost quality.
You will then see that the README mainly talks about JPEG images as these are the kind that can be embedded into PDF without increase of file size or loss of quality.
If your goal is lossless embedding of images into PDF, then img2pdf is exactly for you. If it isn't, then just use any of the existing image-to-PDF converters, which will do a lossy conversion of your input images to JPEG and thus save you tons of space.
By sahwar on 2015-03-08 15:49:36 UTC
Yes, that's my goal and I do know that this is what img2pdf does. Sorry if I didn't express myself clearly enough.