Incorrect page size with different resolutions in x and y directions #181

Closed
opened 6 months ago by j3ff · 9 comments
j3ff commented 6 months ago

Given

```
convert rose: -density 100x200 test1.png
```

and

```
convert rose: -density 100 test2.png
```

and

```
img2pdf -o test.pdf test1.png test2.png
```

`identify -verbose` correctly shows the print height of `test1.png` to be half that of `test2.png`.

Unfortunately, both pages of `test.pdf` seem to have the same height.
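For reference, the expected page sizes can be worked out by hand. A PDF page is measured in points (1 pt = 1/72 inch), so print size is pixels divided by density. This is a rough sketch of that arithmetic, assuming the built-in `rose:` image is 70x46 pixels and that each page is sized to the image's print size. `page_size_pt` is a hypothetical helper for illustration, not an img2pdf function.

```python
def page_size_pt(width_px, height_px, dpi_x, dpi_y):
    """Return (width, height) in PDF points (1 pt = 1/72 inch)."""
    return (width_px / dpi_x * 72.0, height_px / dpi_y * 72.0)


# Assumption: ImageMagick's built-in rose: image is 70x46 pixels.
w1, h1 = page_size_pt(70, 46, 100, 200)  # test1.png, -density 100x200
w2, h2 = page_size_pt(70, 46, 100, 100)  # test2.png, -density 100

print(f"test1: {w1:.2f} x {h1:.2f} pt")
print(f"test2: {w2:.2f} x {h2:.2f} pt")
```

Under these assumptions the first page should come out half as tall as the second (16.56 pt versus 33.12 pt), consistent with the print heights described above.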

I'm not sure what the root cause might be, but I found a workaround: Use the `-units` flag when creating the images. For example:

```
convert rose: -density 100x200 -units PixelsPerInch test1.png
convert rose: -density 100 -units PixelsPerInch test2.png
img2pdf -o test.pdf test1.png test2.png
```

Then the two pages in the PDF are created with different sizes as expected.

Note that I only tested this with ImageMagick 7 on Arch Linux... Hopefully ImageMagick 6 has similar behavior, if you are using that version.
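One possible explanation, offered as an assumption rather than a confirmed root cause: PNG stores resolution in a `pHYs` chunk whose last byte is a unit specifier, where 0 means the unit is unknown (only the aspect ratio is meaningful) and 1 means pixels per metre. If ImageMagick writes 0 when `-units` is omitted, a reader has an aspect ratio but no absolute DPI. The sketch below reads that chunk using only the Python standard library. `read_phys` and `_chunk` are hypothetical helper names, and the demo bytes are hand-built stubs, not real ImageMagick output.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_phys(data):
    """Return (x_per_unit, y_per_unit, unit) from a PNG's pHYs chunk, or None."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos = 8
    while pos + 8 <= len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if ctype == b"pHYs" and length == 9:
            return struct.unpack(">IIB", body)
        if ctype in (b"IDAT", b"IEND"):
            break  # pHYs must precede the first IDAT chunk
        pos += 12 + length  # 4 length + 4 type + body + 4 CRC
    return None


def _chunk(ctype, body):
    """Build one PNG chunk (only used to construct the demo data)."""
    crc = zlib.crc32(ctype + body) & 0xFFFFFFFF
    return struct.pack(">I", len(body)) + ctype + body + struct.pack(">I", crc)


# Demo data: signature, IHDR and pHYs only (not complete, viewable PNGs).
ihdr = _chunk(b"IHDR", struct.pack(">IIBBBBB", 70, 46, 8, 2, 0, 0, 0))
no_units = PNG_SIGNATURE + ihdr + _chunk(b"pHYs", struct.pack(">IIB", 100, 200, 0))
with_units = PNG_SIGNATURE + ihdr + _chunk(b"pHYs", struct.pack(">IIB", 3937, 7874, 1))

print(read_phys(no_units))    # unit 0: aspect ratio only
print(read_phys(with_units))  # unit 1: pixels per metre
```

To inspect a real file, something like `read_phys(open("test1.png", "rb").read())` should show which unit specifier was written.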
j3ff commented 6 months ago
Poster

Thanks for the workaround.

I confirm that it also produces the expected result on ImageMagick 6.9.11.

It would be good to have a better fix, however, rather than running problematic images through imagemagick to add the units field.

j3ff commented 6 months ago
Poster

This is all part of a python rewrite of gscan2pdf. I'm using img2pdf to create the PDFs before using OCRmyPDF to add the text layer.

I'm manipulating the images internally with PIL, and am trying to use your workaround by adding `density_unit` or `jfif_unit` when saving, without success.

Do you have any suggestion on how to do this without using `subprocess` and `imagemagick`?
josch commented 6 months ago
Owner

> Do you have any suggestion on how to do this without using subprocess and imagemagick?

I see two ways forward:

1. wait for me to have enough free time so that I can fix this issue
2. provide a patch for img2pdf that fixes this

Depending on how urgent this is for you, choose either 1 or 2. :)

> This is all part of a python rewrite of gscan2pdf. I'm using img2pdf to create the PDFs before using OCRmyPDF to add the text layer.
>
> I'm manipulating the images internally with PIL, and am trying to use your workaround by adding `density_unit` or `jfif_unit` when saving, without success.
>
> Do you have any suggestion on how to do this without using `subprocess` and `imagemagick`?

It looks like PIL [adds an `aspect` field](https://github.com/python-pillow/Pillow/blob/28c173f8d4/src/PIL/PngImagePlugin.py#L520) to the image info dict (for PNG files) when units are not defined. This field appears to be [undocumented](https://pillow.readthedocs.io/en/stable/handbook/image-file-formats.html#png-opening), and grepping through the code, it appears unique to the PNG format handling. And for `save()`, the `dpi` option can be set.

So for PNG files without resolution units defined, I guess you could try querying `aspect` and setting `dpi` when saving the image.
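A minimal sketch of that idea, assuming `info` is the dict Pillow exposes as `Image.info` and that a fallback of 96 DPI (the value img2pdf is reported to use later in this thread) is acceptable. `dpi_for_save` is a hypothetical helper, and pinning the lower-density axis to the default mirrors the img2pdf patch quoted further down rather than any documented Pillow behaviour.

```python
def dpi_for_save(info, default_dpi=96.0):
    """Pick a (dpi_x, dpi_y) tuple from a Pillow-style ``info`` dict."""
    # A real DPI (PNG resolution units defined) always wins.
    dpi = info.get("dpi")
    if dpi is not None:
        return (float(dpi[0]), float(dpi[1]))

    # No units: only the x/y ratio is known, so scale the default DPI by it.
    aspect = info.get("aspect")
    if aspect and aspect[0] > 0 and aspect[1] > 0:
        ax, ay = aspect
        if ax > ay:
            return (default_dpi * ax / ay, default_dpi)
        return (default_dpi, default_dpi * ay / ax)

    # Nothing usable in the metadata: fall back to the default on both axes.
    return (default_dpi, default_dpi)


print(dpi_for_save({"aspect": (100, 200)}))
print(dpi_for_save({"dpi": (300, 300)}))
print(dpi_for_save({}))
```

With Pillow this could be used as `img.save("out.png", dpi=dpi_for_save(img.info))`, so the saved PNG carries explicit units.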
j3ff commented 5 months ago
Poster

> So for the cases of handling PNG files without resolution units defined, I guess you could try querying `aspect` and set `dpi` when saving the image.

This was a good tip. Thanks. My code with this workaround now produces the desired result.
josch closed this issue 5 months ago

@josch Testing with the latest commit with a PNG that *does* have a DPI set, img2pdf changes the DPI to 96x96. PIL sets the `dpi` field instead of `aspect` in that case.

Shouldn't img2pdf keep the DPI settings from the image if present?

Without considering any other side effects this might have, this is the change I tested that preserves the setting:

```diff
--- a/src/img2pdf.py
+++ b/src/img2pdf.py
@@ -1325,7 +1325,7 @@ def get_imgmetadata(
                 ndpi = (default_dpi * aspect[0] / aspect[1], default_dpi)
             else:
                 ndpi = (default_dpi, default_dpi * aspect[1] / aspect[0])
-        else:
+        elif ndpi is None:
             ndpi = (default_dpi, default_dpi)
         # In python3, the returned dpi value for some tiff images will
         # not be an integer but a float. To make the behaviour of
```
josch commented 5 months ago
Owner

Thank you for making me aware. It seems I had pushed too early. This is the fixed commit: fb9537d8b716ee9875cb4f7bb1806ff0d6c8a967

Thanks! Confirmed working now :)

Reference: josch/img2pdf#181