Converted PDF cannot open in Acrobat #175

Closed
opened 7 months ago by leoleozhu · 10 comments

I ran the following command with the attached JPG file:
python3 src/img2pdf.py -v -o output.pdf 001.jpg

However, when attempting to open the resulting PDF (output.pdf) in Acrobat, an error message stating “Insufficient data for an image” is displayed.

Upon further investigation with ExifTool, I found that the image file contains only one component, but 'Color Space Data' is reported as RGB. Is it possible that this inconsistency is causing Acrobat to fail to open the PDF file?

Color Components                : 1
Color Space Data                : RGB

Thanks.
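
The two ExifTool fields above can also be read directly from the raw bytes. The following is a minimal sketch, not img2pdf code: `jpeg_component_count` and `icc_color_space` are hypothetical helper names. It assumes the standard JPEG frame header layout (component count in the SOFn segment) and the standard ICC header layout (data color space signature at bytes 16-19, `acsp` signature at bytes 36-39). Extracting the embedded profile from the JPEG's APP2 segments is left out.

```python
import struct


def jpeg_component_count(data: bytes) -> int:
    """Return the component count (Nf) from the first SOFn frame header."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream (missing SOI marker)")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before the real marker
            pos += 1
            continue
        (length,) = struct.unpack(">H", data[pos + 2 : pos + 4])
        # SOF0..SOF15, excluding DHT (C4), JPG (C8) and DAC (CC)
        if 0xC0 <= marker <= 0xCF and marker not in (0xC4, 0xC8, 0xCC):
            # After the marker: length(2) precision(1) height(2) width(2) Nf(1)
            return data[pos + 9]
        pos += 2 + length  # skip marker bytes plus the segment
    raise ValueError("no SOF marker found")


def icc_color_space(profile: bytes) -> str:
    """Return the data color space signature of an ICC profile, e.g. 'RGB' or 'GRAY'."""
    if len(profile) < 128 or profile[36:40] != b"acsp":
        raise ValueError("not an ICC profile header")
    return profile[16:20].decode("ascii").strip()
```

For the file in this report, the first function would be expected to return 1 while the second returns `RGB`, which is the mismatch described above.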

Poster

A grayscale image with an RGB color profile seems to be the problem, which appears to be similar to issue #164. This particular image was generated using an older version (2.2.6) of "SmartAlbums".

Do you think it is safe to drop the ICC profile from a grayscale JPG file? If that is not safe in general, I can also create a patch that first checks the "Software" metadata in the EXIF data before discarding the ICC profile.
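
A minimal sketch of the "Software" lookup proposed here, assuming the EXIF data is already available as a mapping from numeric tag ID to value (with Pillow, `Image.getexif()` returns such a mapping). The helper names `exif_software` and `written_by` are hypothetical; the only fixed fact used is that the Software tag has ID 0x0131 (305).

```python
from typing import Mapping, Optional

EXIF_SOFTWARE_TAG = 0x0131  # "Software" tag ID in TIFF/EXIF


def exif_software(exif: Mapping[int, object]) -> Optional[str]:
    """Return the Software tag as text, or None if the tag is absent."""
    value = exif.get(EXIF_SOFTWARE_TAG)
    if value is None:
        return None
    if isinstance(value, bytes):
        value = value.decode("ascii", errors="replace")
    # Some writers pad the string with NUL bytes or spaces
    return str(value).strip(" \x00")


def written_by(exif: Mapping[int, object], needle: str) -> bool:
    """True if the Software tag exists and contains the given substring."""
    software = exif_software(exif)
    return software is not None and needle in software
```

Returning `None` for a missing tag keeps the caller from having to special-case images without EXIF data.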

josch commented 7 months ago
Owner

Could you try img2pdf from git? This bug should be fixed there.

I just have to make a new release but the test suite still fails on Fedora...

Poster

Hi Josch,

I tried the head version and since this issue is a little different to #164, it's not fixed.

The following patch could fix it. Could you please review it? As I commented above, it only covers pictures whose EXIF Software property contains 'SmartAlbums'. If you think this case is common, or that dropping the profile is safe in general, we could remove that check.

+    # SmartAlbums old version (found 2.2.6) exports JPG with only 1 component 
+    # with an RGB ICC profile which is useless.
+    # This produces an error in Adobe Acrobat, so we ignore it with a warning.
+    if iccp is not None and (
+        (color == Colorspace["L"] and imgformat == ImageFormat.JPEG)
+    ):
+        if hasattr(imgdata, "_getexif") and imgdata._getexif() is not None:
+            for tag, value in imgdata._getexif().items():
+                if TAGS.get(tag, tag) == "Software":
+                    exifsoft = value
+        with io.BytesIO(iccp) as f:
+            prf = ImageCms.ImageCmsProfile(f)
+        if (
+            (prf.profile.model and "sRGB" in prf.profile.model)
+            and (exifsoft and 'SmartAlbums' in exifsoft)
+        ):
+            logger.warning(
+                "Ignoring RGB ICC profile in Grayscale JPG created by SmartAlbums"
+            )
+            iccp = None
+

Regards
Leo

josch commented 7 months ago
Owner

You are completely correct! The other fix was about TIFF and PNG produced by GIMP. Your problem is about JPEG produced by SmartAlbums but the fix is the same. I didn't even know that grayscale JPEGs were possible. I modified your patch a bit:

--- a/src/img2pdf.py
+++ b/src/img2pdf.py
@@ -1478,6 +1478,26 @@ def get_imgmetadata(
                 )
             logger.warning("https://gitlab.gnome.org/GNOME/gimp/-/issues/3438")
             iccp = None
+    # SmartAlbums old version (found 2.2.6) exports JPG with only 1 component
+    # with an RGB ICC profile which is useless.
+    # This produces an error in Adobe Acrobat, so we ignore it with a warning.
+    if iccp is not None and (
+        (color == Colorspace["L"] and imgformat == ImageFormat.JPEG)
+    ):
+        exifsoft = None
+        if hasattr(imgdata, "_getexif") and imgdata._getexif() is not None:
+            for tag, value in imgdata._getexif().items():
+                if TAGS.get(tag, tag) == "Software":
+                    exifsoft = value
+        with io.BytesIO(iccp) as f:
+            prf = ImageCms.ImageCmsProfile(f)
+        if (prf.profile.model and "sRGB" in prf.profile.model) and (
+            exifsoft and "SmartAlbums" in exifsoft
+        ):
+            logger.warning(
+                "Ignoring RGB ICC profile in Grayscale JPG created by SmartAlbums"
+            )
+            iccp = None
 
     logger.debug("width x height = %dpx x %dpx", imgwidthpx, imgheightpx)
 

What should I put as name and email for the Author field of that commit?
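
For readers following along, the condition in the modified patch can be restated as a standalone predicate. This is only a sketch with a hypothetical function name and plain arguments, not the code in img2pdf. It also shows why the modified version initializes `exifsoft = None`: without that line, an image lacking a Software tag would leave the variable unbound when the final condition is evaluated.

```python
from typing import Optional


def should_ignore_icc(
    is_grayscale_jpeg: bool,
    profile_model: Optional[str],
    exif_software: Optional[str],
) -> bool:
    """Mirror the patch's condition: grayscale JPEG + sRGB profile + SmartAlbums."""
    if not is_grayscale_jpeg:
        return False
    # Both strings may be missing, so test for None/empty before the substring check
    has_srgb_profile = bool(profile_model) and "sRGB" in profile_model
    from_smartalbums = bool(exif_software) and "SmartAlbums" in exif_software
    return has_srgb_profile and from_smartalbums
```

With these inputs, a grayscale JPEG that has no Software tag (`exif_software=None`) keeps its profile instead of raising an error.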

Poster

Thank you for your confirmation. Leo <leozhu8964@gmail.com> is fine. Thanks.

josch closed this issue 7 months ago
Poster

Found another case. These images were created with Photoshop.

josch commented 7 months ago
Owner

uff... why do all these image manipulation programs put an RGB profile into a grayscale image? :(

Are these also JPEGs? What is the value of the Software tag?

Poster

History Software Agent : Adobe Photoshop CS6 (Windows), Adobe Photoshop 23.0 (Windows)
Software : Adobe Photoshop 23.0 (Windows)
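
Since the Software tag here names Photoshop rather than SmartAlbums, a check keyed on the writing program would not catch these files. One possible generalization, offered as a sketch and not necessarily what ended up in img2pdf, is to compare the ICC profile's data color space with the number of image components and ignore the profile on a known mismatch. The function name and the lookup table are assumptions; the signatures `GRAY`, `RGB` and `CMYK` are the standard ICC data color space signatures.

```python
# Expected channel count per ICC data color space signature (subset only)
ICC_CHANNELS = {"GRAY": 1, "RGB": 3, "CMYK": 4}


def icc_profile_usable(icc_color_space: str, components: int) -> bool:
    """Return False only when the profile's color space is known to mismatch."""
    expected = ICC_CHANNELS.get(icc_color_space.strip())
    # Unknown color spaces are kept: this sketch only rejects known mismatches
    return expected is None or expected == components
```

For the images in this thread, `icc_profile_usable("RGB", 1)` is false, so the profile would be dropped regardless of which program wrote the file.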

Poster
Seems the attachments are not displaying. https://www.dropbox.com/sh/5xq6kllf1ormjkm/AACx6kw1vXr2-rOonngxEYPia?dl=0
josch commented 7 months ago
Owner

I merged your MR, thank you!

Reference: josch/img2pdf#175