Image DPI with resolution value both in EXIF (IFD0) and JFIF #183


Open

opened 2023-11-28 05:38:34 +00:00 by leoleozhu · 2 comments

leoleozhu commented

2023-11-28 05:38:34 +00:00

Contributor

There is an image (attached).

Running exiftool on it prints:

➜  exiftool -G1 test.jpg| grep Resolution
[IFD0]          Resolution Unit                 : inches
[IFD0]          X Resolution                    : 254
[IFD0]          Y Resolution                    : 254
➜  exiftool -G1 -a test.jpg| grep Resolution
[JFIF]          Resolution Unit                 : inches
[JFIF]          X Resolution                    : 72
[JFIF]          Y Resolution                    : 72
[IFD0]          Resolution Unit                 : inches
[IFD0]          X Resolution                    : 254
[IFD0]          Y Resolution                    : 254

So, in fact, the image contains duplicate resolution information from both EXIF (IFD0) and JFIF.

When converting the JPEG with img2pdf, it uses 72 as the DPI value.

However, when opening the image with "Preview" on macOS or "Photos" on Windows, it displays 254 as the resolution.

Currently, img2pdf uses Pillow's image info to get the DPI, with the following code:
ndpi = imgdata.info.get("dpi")

Do you think it would be sensible to modify it to load the resolution from EXIF first, and then fall back to imgdata.info.get("dpi") if no resolution is found in the EXIF data?
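To make the proposal concrete, here is a minimal sketch of the lookup order I have in mind. It is not img2pdf code: `pick_dpi` is a hypothetical helper, and it assumes `exif` behaves like a mapping from IFD0 tag IDs to values (as the object returned by Pillow's `getexif()` does) and `info` is the image's `info` dict.

```python
import math

X_RESOLUTION = 0x011A     # EXIF/TIFF tag 282
Y_RESOLUTION = 0x011B     # EXIF/TIFF tag 283
RESOLUTION_UNIT = 0x0128  # EXIF/TIFF tag 296: 2 = inches, 3 = centimetres


def pick_dpi(exif, info):
    """Hypothetical helper: prefer EXIF (IFD0) resolution, else info["dpi"]."""
    fallback = info.get("dpi")
    try:
        x = float(exif.get(X_RESOLUTION))
        y = float(exif.get(Y_RESOLUTION))
    except (TypeError, ValueError, ZeroDivisionError):
        # Tag missing or not convertible to a number: use the fallback.
        return fallback
    unit = exif.get(RESOLUTION_UNIT, 2)  # EXIF defaults to inches
    if unit not in (2, 3):
        return fallback
    if not (math.isfinite(x) and math.isfinite(y) and x > 0 and y > 0):
        return fallback
    # Convert dots per centimetre to dots per inch when needed.
    scale = 2.54 if unit == 3 else 1.0
    return (x * scale, y * scale)
```

With Pillow this could be called as `pick_dpi(imgdata.getexif(), imgdata.info)`. For the attached image I would expect it to return the EXIF value of 254 rather than the JFIF value of 72, but I have only reasoned about that from the exiftool output above.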

Thanks and kind regards

Leo


test.jpg

536 KiB

leoleozhu changed title from ~~Image DPI with both resolution value in EXIF (IFD0) and JFIF~~ to Image DPI with resolution value both in EXIF (IFD0) and JFIF

2023-11-28 05:44:40 +00:00

josch commented

2023-11-28 07:20:30 +00:00

Owner

Thank you for your bug report!

I am not convinced by this argument, though.

Your situation is that your JFIF data is broken and you want to prioritize EXIF. Let's say we change that. Now what happens if somebody comes along who says that their EXIF data is broken and clearly img2pdf should choose JFIF instead? In the Pillow code it says:

    # If DPI isn't in JPEG header, fetch from EXIF
    if "dpi" not in self.info and "exif" in self.info:

So Pillow is quite clear that JFIF DPI trumps EXIF DPI. Can you make a good general argument that, for images that contain incorrect information, img2pdf should do something different from Pillow?

Because I could also argue: if your input data is broken, fix your input rather than shoehorning broken data into img2pdf and expecting img2pdf to fix it for you.
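To find out whether a given file carries this kind of contradiction in the first place, something along these lines might help. It is only a sketch: `dpi_conflict` is a hypothetical helper, not part of img2pdf or Pillow, and it assumes `info` is the `info` dict of a JPEG opened with Pillow (which, as far as I know, exposes the raw JFIF values as `jfif_unit` and `jfif_density`) and `exif` maps IFD0 tag IDs to values.

```python
def dpi_conflict(info, exif, tolerance=0.5):
    """Hypothetical helper: return (jfif_dpi, exif_dpi) if both sources
    give a horizontal DPI and the two disagree, otherwise None."""
    # JFIF unit 1 means the density is in dots per inch.
    if info.get("jfif_unit") != 1 or "jfif_density" not in info:
        return None
    x_res = exif.get(0x011A)  # EXIF XResolution
    if not x_res:
        return None
    # EXIF ResolutionUnit: 2 = inches (default), 3 = centimetres.
    scale = 2.54 if exif.get(0x0128, 2) == 3 else 1.0
    exif_dpi = float(x_res) * scale
    jfif_dpi = float(info["jfif_density"][0])
    if abs(jfif_dpi - exif_dpi) > tolerance:
        return (jfif_dpi, exif_dpi)
    return None
```

Called as `dpi_conflict(im.info, im.getexif())`, this would flag such files before conversion so that the metadata can be corrected at the source.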


leoleozhu commented

2023-11-28 09:19:16 +00:00

Author

Contributor

Thank you for your reply.

I understand your concern regarding the prioritization of EXIF data over JFIF data in img2pdf. You brought up a valid point about potential scenarios where EXIF data might be broken, and in such cases, it could be argued that img2pdf should choose JFIF instead.

Regarding your suggestion of fixing broken input data instead of trying to accommodate it within img2pdf, I agree that it is essential to address and rectify any issues with the original data whenever possible.

What stands out to me is that, when opening this image with "Preview" on macOS and "Photos" on Windows, both programs take the DPI from EXIF.

Taking this into consideration, and considering img2pdf as a versatile tool for converting various types of input (including cases where the data might be imperfect or inconsistent), I still believe it is worth discussing which source to choose for the DPI information.

I hope this provides some insight into my viewpoint on the matter. I would be interested to hear your thoughts on the potential consideration.

Thanks and kind regards

