Problems converting jpeg with embedded thumbnail #93
By m-holger on 2021-04-06T09:54:34.769Z
When converting this 5.7MB JPEG, img2pdf creates a 20.5MB PDF and produces the following diagnostics:
Originally reported at https://github.com/pdfarranger/pdfarranger/issues/457
By josch on 2021-04-06T11:04:01.281Z
Awesome, thank you for this test case! This is indeed a type of image I have never seen before. PIL identifies it as MPO: https://pillow.readthedocs.io/en/stable/handbook/image-file-formats.html#mpo
img2pdf needs to add support for that kind of image.
By josch on 2021-04-12T22:01:25.351Z
Status changed to closed by commit d29c596fe7
By m-holger on 2021-04-13T08:52:20.936Z
@josch Thanks, that works.
One thought. Would it make sense to only extract the main image? Or at least exclude any preview images?
I assume this will go into 0.4.1. Any idea when it will be released?
By josch on 2021-04-13T09:23:02.156Z
As far as PIL is concerned, MPO files are more similar to a GIF animation than to a JPEG, and in the case of a GIF animation, img2pdf turns each frame into its own PDF page. This makes sense, because img2pdf tries to be lossless, so by default it will not omit pixel data contained in the image.
Note that an MPO image is not the same as a JPEG with a thumbnail in its EXIF data. For JPEG images with thumbnails, img2pdf already does not add an extra page for the thumbnail. Now some camera manufacturers apparently abuse the MPO format for thumbnails. If you don't like that, complain to the camera manufacturer that they should use EXIF thumbnails as intended.
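To make the distinction above concrete, here is a minimal sketch in Python (standard library only) that walks the header segments of the first JPEG stream in a file. An EXIF thumbnail lives inside the APP1 "Exif" segment of that single stream, whereas an MPO announces additional, separate JPEG streams through an APP2 segment with the "MPF" identifier. As far as I understand Pillow's JPEG plugin, this MPF segment (together with an image count above one) is what makes it report the file as MPO; the sketch below only checks that the segment exists. The function names are made up for this example.

```python
import struct


def header_segments(data: bytes):
    """Yield (marker, payload) for each marker segment of the first JPEG stream, up to SOS."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream (missing SOI marker)")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:  # optional fill byte before a marker
            pos += 1
            continue
        if marker in (0xD9, 0xDA):  # EOI or SOS: no more header segments
            return
        # The 2-byte big-endian length counts itself but not the marker
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        yield marker, data[pos + 4:pos + 2 + length]
        pos += 2 + length


def is_mpo(data: bytes) -> bool:
    """True if an APP2 segment carries the Multi-Picture Format header."""
    return any(m == 0xE2 and p.startswith(b"MPF\x00") for m, p in header_segments(data))


def has_exif(data: bytes) -> bool:
    """True if an APP1 segment carries EXIF data (where an ordinary thumbnail would live)."""
    return any(m == 0xE1 and p.startswith(b"Exif\x00\x00") for m, p in header_segments(data))
```

Note that has_exif only says that EXIF data is present, not that it contains a thumbnail; finding the thumbnail itself would require parsing the second IFD inside the EXIF block.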
Now for your specific problem, the --first-frame-only option might help. Here is the relevant part of the --help output:
MPO files are multi-frame images, so with this option you will only see the first frame (aka the main image) in the resulting PDF.
By m-holger on 2021-04-13T10:52:31.778Z
img2pdf is used as a library by PDF Arranger to import images. I don't think having the odd extra preview image imported is a problem, and in this case using the --first-frame-only option would probably cause more problems than it solves. Could you give me an idea of when you will release a version of img2pdf incorporating the fix?
As to my query regarding suppressing the preview, I have no opinion on whether this is desirable, as my knowledge of JPEGs and EXIF would easily fit on the back of a stamp. I only raised the point because it seemed to me (perhaps wrongly) that the 1616x1080 image was flagged as a (large) thumbnail in the EXIF data:
By josch on 2021-04-13T12:15:04.424Z
That is correct. It is flagged as a large thumbnail. The problem is that we need to solve the general case and not just the case of this particular image. An MPO file can contain any number of images. What if there are multiple "Large Thumbnail (full HD equivalent)" images? Do we omit them all? How do we know which thumbnail belongs to which full image? What if the user wants the image marked as a thumbnail? We could add tons more options to img2pdf to deal with all the particularities of the MPO format, but there are other tools that already do this much better. The Unix philosophy is to have tools that each do one thing and do that one thing well. If somebody has an MPO file and some special use case for it in mind, then there exist much better tools to extract exactly the one image the user wants from the MPO and then pass the result to img2pdf. If we add options exclusively for MPO-specific stuff, then we might as well ask ourselves why we do not do the same for other formats. But img2pdf is not an image manipulation tool; it just converts images to PDF. If you want something other than the unmodified image in your PDF, use an image manipulation tool before handing the image to img2pdf.
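As an illustration of doing that extraction outside img2pdf, here is a minimal sketch in Python (standard library only). It reads the MP Index IFD from the APP2 "MPF" segment of the first image, following my reading of the CIPA DC-007 Multi-Picture Format specification, and slices out every image with a given MP type code, such as 0x030000 for the primary image or 0x010002 for a "Large Thumbnail (full HD equivalent)". The names mpf_segment, mp_entries and extract are hypothetical, there is no recovery for damaged files, and removing the MPF segment from each extracted image so that readers treat it as a plain JPEG is my own assumption about what a downstream tool needs.

```python
import struct

# MP type codes (lower 24 bits of the MP Entry attribute), per CIPA DC-007
PRIMARY = 0x030000              # Baseline MP Primary Image
LARGE_THUMB_VGA = 0x010001      # Large Thumbnail (VGA equivalent)
LARGE_THUMB_FULL_HD = 0x010002  # Large Thumbnail (full HD equivalent)


def mpf_segment(data: bytes):
    """Return (start, end) of the APP2 "MPF" segment of the JPEG stream in data, or None."""
    pos = 2  # skip SOI
    while pos + 4 <= len(data) and data[pos] == 0xFF and data[pos + 1] not in (0xD9, 0xDA):
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if data[pos + 1] == 0xE2 and data[pos + 4:pos + 8] == b"MPF\x00":
            return pos, pos + 2 + length
        pos += 2 + length
    return None


def mp_entries(data: bytes):
    """Return (type_code, start, size) for each image listed in the MP Index IFD."""
    seg = mpf_segment(data) if data[:2] == b"\xff\xd8" else None
    if seg is None:
        raise ValueError("no MPF segment found: not an MPO file")
    mp_header = seg[0] + 8  # MP offsets are relative to the byte after "MPF\0"
    hdr = data[mp_header:seg[1]]
    if hdr[:2] == b"II":
        endian = "<"
    elif hdr[:2] == b"MM":
        endian = ">"
    else:
        raise ValueError("bad MP header byte order")
    (ifd,) = struct.unpack(endian + "I", hdr[4:8])
    (count,) = struct.unpack(endian + "H", hdr[ifd:ifd + 2])
    blob = None
    for i in range(count):  # 12-byte IFD entries: tag, type, count, value/offset
        tag, _type, n, value = struct.unpack(endian + "HHII", hdr[ifd + 2 + 12 * i:ifd + 14 + 12 * i])
        if tag == 0xB002:  # MP Entry: n bytes stored at offset `value`
            blob = hdr[value:value + n]
    if blob is None:
        raise ValueError("MP Index IFD has no MP Entry tag")
    entries = []
    for i in range(len(blob) // 16):  # 16 bytes per image
        attr, size, offset, _dep1, _dep2 = struct.unpack(endian + "IIIHH", blob[16 * i:16 * i + 16])
        # The first image starts at byte 0; the others are relative to the MP header
        start = 0 if i == 0 else mp_header + offset
        entries.append((attr & 0xFFFFFF, start, size))
    return entries


def extract(data: bytes, type_code: int):
    """Return the JPEG bytes of every image with the given MP type code."""
    images = []
    for code, start, size in mp_entries(data):
        if code != type_code:
            continue
        img = data[start:start + size]
        seg = mpf_segment(img)
        if seg is not None:  # drop only the MPF metadata; compressed image data is untouched
            img = img[:seg[0]] + img[seg[1]:]
        images.append(img)
    return images
```

Because this slices bytes instead of decoding and re-encoding, the JPEG data itself is not changed, which is in the spirit of img2pdf being lossless. For example, extract(data, PRIMARY)[0] for a hypothetical photo.jpg could be written to main.jpg and then converted with img2pdf main.jpg -o main.pdf.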
I plan to make a new release in the next few days.
By m-holger on 2021-04-13T12:34:23.224Z
Thanks
As for the other matter, as I said, I do not have a view on this and I was not trying to change your mind. Sorry if it came across like this.
By josch on 2021-04-13T16:21:11.283Z
Even if you were trying to change my mind, that would be a good thing, because maybe others see things that I do not, and so I'm interested in hearing the arguments others make. Please don't worry, you never came across as bothersome or impolite to me. Thank you for your bug report!