MPO JPEGs from digital camera with thumbnails inserted as multiple frames #135
I found that img2pdf does not work properly on JPEGs produced by my digital camera.
As far as Pillow is concerned, the files it produces are in MPO format (Multi-Picture Object). But they're fully valid JPEGs too, if that makes any sense at all. They start with a standard JPEG header but carry both EXIF and TIFF metadata, which in turn contains the additional thumbnails.
When img2pdf is used on this type of JPEG, it creates a PDF containing the original image first, as it should; it then appends all of the thumbnails as separate pages, which is not particularly helpful.
Pillow's `im.mpinfo` seems to contain metadata that could be used to determine that the extra frames are thumbnails and can be ignored.
The output file can be produced with `img2pdf -o output.pdf input.jpg`.
I used img2pdf 0.4.3. (img2pdf 0.4.0 also converts all of the JPEGs to PNG compression, but 0.4.3 does not have this issue.)
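A minimal sketch of how `im.mpinfo` could be used to spot thumbnail frames. It assumes the layout Pillow's MPO support currently produces: `mpinfo[0xB002]` is a list of MP entry dicts whose `["Attribute"]["MPType"]` is a readable string such as `"Large Thumbnail (VGA Equivalent)"` (the thumbnail types from the CIPA DC-007 Multi-Picture Format). The helper names `thumbnail_frames` and `describe_mpo` are hypothetical and not part of Pillow or img2pdf.

```python
from typing import Any, Dict, List

# MP Type strings that Pillow uses for the two thumbnail kinds defined by
# the Multi-Picture Format (an assumption about Pillow internals).
THUMBNAIL_TYPES = {
    "Large Thumbnail (VGA Equivalent)",
    "Large Thumbnail (Full HD Equivalent)",
}


def thumbnail_frames(mp_entries: List[Dict[str, Any]]) -> List[int]:
    """Return the indices of MP entries that are marked as thumbnails."""
    return [
        i
        for i, entry in enumerate(mp_entries)
        if entry.get("Attribute", {}).get("MPType") in THUMBNAIL_TYPES
    ]


def describe_mpo(path: str) -> None:
    """Print the MP type of every frame in a file that Pillow opens as MPO."""
    # Imported here so thumbnail_frames() can be used without Pillow installed.
    from PIL import Image

    with Image.open(path) as im:
        if im.format != "MPO":
            print(f"{path}: {im.format}, not MPO")
            return
        entries = im.mpinfo[0xB002]  # MP Entry list from the MP Index IFD
        thumbs = set(thumbnail_frames(entries))
        for i, entry in enumerate(entries):
            kind = entry["Attribute"]["MPType"]
            note = " (thumbnail)" if i in thumbs else ""
            print(f"frame {i}: {kind}{note}")
```

Calling `describe_mpo("input.jpg")` on one of the camera files should list the primary image first, followed by any frames that Pillow labels as thumbnails.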
Here are the files.
Thanks for your bug report and the test input! If I understand the problem correctly, then this is a feature and not a bug. By default, img2pdf outputs one page for each frame in a multi-frame input image. MPO files look like normal JPEG files to parsers that only understand JPEG, but they appear as multi-frame images to parsers that understand the MPO format, like PIL. Thus, img2pdf treats MPO files the same way as, for example, multi-frame GIF images.
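To illustrate why the same file can be "one JPEG" or "several frames" depending on the parser, here is a minimal stdlib sketch that walks the marker segments of the first JPEG header and looks for the APP2 segment whose identifier is `MPF\0`. A plain JPEG decoder skips unknown APPn segments and stops at the first EOI, so it only sees the primary image; an MPO-aware reader uses this segment to find the further images stored after it. The function name `has_mpf_segment` is hypothetical, and the sketch does not parse the MP Index IFD itself.

```python
import struct


def has_mpf_segment(data: bytes) -> bool:
    """Return True if the first JPEG header carries an APP2 'MPF' segment."""
    if data[:2] != b"\xff\xd8":  # SOI marker
        raise ValueError("not a JPEG stream")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before a marker, skip it
            pos += 1
            continue
        if marker == 0xDA:  # SOS: entropy-coded data follows, header is over
            break
        # Segment length is big-endian and includes its own two bytes.
        (length,) = struct.unpack(">H", data[pos + 2 : pos + 4])
        payload = data[pos + 4 : pos + 2 + length]
        if marker == 0xE2 and payload.startswith(b"MPF\x00"):  # APP2 "MPF"
            return True
        pos += 2 + length
    return False
```

For a real file, `has_mpf_segment(open("input.jpg", "rb").read())` should return `True` for the camera's MPO output and `False` for an ordinary JPEG.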
I think what you want is the `--first-frame-only` option.
Does this solve this issue? If yes, please close it. Thanks! :)
I don't think it's reasonable behavior to include both an image and its thumbnails in the final PDF. I can't think of any reason a user would want that behavior.
I understand that, in general, a multi-frame image should be unpacked into multiple pages, but if the extra images have a special function, like thumbnails, they should be discarded.
I think you are right, and I want to add another command line option called `--include-thumbnails`. By default, thumbnails will not be included because they are redundant. So with the new behaviour, MPO files will be copied into the PDF as they are and thus show up as a single page. The old behaviour can be triggered by supplying `--include-thumbnails`. One can argue that the old behaviour was kind of buggy, as it chopped up the JPEG into multiple individual JPEGs, where the first JPEG still retained the MPO information but was missing the thumbnails.
Does anybody know of another image format where Pillow represents thumbnails as additional image frames?
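The page-selection rule described here can be sketched as a small pure function. This is not img2pdf's code: `pages_for` and its parameters are hypothetical names chosen to mirror `--include-thumbnails` and `--first-frame-only`, and `fmt` is assumed to be a Pillow format name such as `"JPEG"`, `"MPO"` or `"GIF"`.

```python
from typing import List


def pages_for(
    fmt: str,
    n_frames: int,
    include_thumbnails: bool = False,
    first_frame_only: bool = False,
) -> List[int]:
    """Return the frame indices of one input image that become PDF pages."""
    if n_frames < 1:
        raise ValueError("an image has at least one frame")
    if first_frame_only:
        # --first-frame-only: only the first frame, whatever the format.
        return [0]
    if fmt == "MPO" and not include_thumbnails:
        # Proposed default: an MPO file is also a valid JPEG, so it is
        # embedded unchanged as a single page; the thumbnails stay inside
        # the JPEG data but do not become pages of their own.
        return [0]
    # Other multi-frame inputs (e.g. GIF), or MPO with --include-thumbnails:
    # one page per frame, as before.
    return list(range(n_frames))
```

Keeping the decision separate from any image decoding makes the new default and the old behaviour easy to compare side by side.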
The following diff implements the `--include-thumbnails` option and does not include thumbnails by default. Could somebody try it out and check whether it does the right thing for them?