Unnecessary conversion to RGB in case of a palette-based PNG with a transparency #158

Closed
opened 1 year ago by Ghost · 2 comments
Ghost commented 1 year ago

I have some PNGs that were originally saved as greyscale with an alpha channel. I converted them with pngquant to palette-based PNG files in which one palette colour is used for transparency, so the resulting images take up much less space.

Now if I use img2pdf 0.4.4 to convert such an image to a PDF, img2pdf turns the palette-based image into a full-fledged RGB one. An essentially 8-bit image becomes a 24-bit one, which inflates the size of the image data and of the resulting PDF file.

However, this can easily be fixed by adding a case for palette images:

```
--- img2pdf.py.orig	2023-03-17 15:09:29.986165396 +0200
+++ img2pdf.py	2023-03-17 15:10:15.446923355 +0200
@@ -1889,6 +1889,10 @@
                     newcolor = color
                     l, a = newimg.split()
                     newimg = l
+                elif color == Colorspace.P:
+                    newcolor = color
+                    a = newimg.convert(mode="RGBA").split()[-1]
+                    # newimg = newimg
                 else:
                     newcolor = Colorspace.RGBA
                     r, g, b, a = newimg.convert(mode="RGBA").split()
```
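The alpha channel that the patch obtains via `newimg.convert(mode="RGBA").split()[-1]` can also be described directly in terms of the palette. Below is a minimal pure-Python sketch. It assumes the transparency is given the way Pillow's PNG loader usually stores it in `info["transparency"]` for `P` images: an `int` index when exactly one entry is fully transparent, otherwise `bytes` with one alpha value per palette entry. Per the PNG tRNS rules, entries not covered by those bytes are opaque. `palette_alpha` is a hypothetical helper, not part of img2pdf or Pillow.

```python
from typing import Union


def palette_alpha(indices: bytes, transparency: Union[int, bytes]) -> bytes:
    """Return one 8-bit alpha value per pixel of a palette ("P") image.

    Hypothetical helper for illustration; not an img2pdf or Pillow API.
    indices      -- raw palette indices, one byte per pixel
    transparency -- int (the single fully transparent palette index) or
                    bytes (tRNS-style alpha value per palette entry)
    """
    if isinstance(transparency, int):
        # Everything opaque except the one transparent palette entry.
        table = bytearray(b"\xff" * 256)
        table[transparency] = 0
        lut = bytes(table)
    else:
        # Palette entries beyond the end of the tRNS data are fully opaque.
        alphas = bytes(transparency[:256])
        lut = alphas + b"\xff" * (256 - len(alphas))
    # bytes.translate maps every index byte through the 256-entry table.
    return indices.translate(lut)


pixels = bytes([0, 1, 2, 1, 0])
print(palette_alpha(pixels, 1))            # b'\xff\x00\xff\x00\xff'
print(palette_alpha(pixels, b"\x80\x00"))  # b'\x80\x00\xff\x00\x80'
```

The point of the patch is the same as in this sketch: the pixel data stays as 8-bit palette indices, and only the alpha channel is derived separately.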

File size comparison:

```
101K 1bit_background.png
103K 1bit_background.pdf

5,7M greyscale_with_alpha.png
1,8M palette.png
3,5M palette_converted_with_unpatched_img2pdf.pdf
1,8M palette_converted_with_patched_img2pdf.pdf
```
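A rough illustration of where the growth comes from, not a reproduction of the sizes above: expanding palette indices to RGB replaces one byte per pixel (the index) with three bytes (the palette colour), so the uncompressed sample data triples. Compressed sizes do not scale by exactly three, and the real ratio depends on the image. The palette, image dimensions, and `expand_to_rgb` helper below are all made up for illustration.

```python
import zlib


def expand_to_rgb(indices: bytes, palette: bytes) -> bytes:
    """Hypothetical helper: replace each palette index with its RGB triple."""
    out = bytearray()
    for i in indices:
        out += palette[3 * i : 3 * i + 3]
    return bytes(out)


# Made-up 4-entry greyscale palette (R, G, B per entry).
palette = bytes([0, 0, 0, 85, 85, 85, 170, 170, 170, 255, 255, 255])
# Synthetic 256x256 image whose pixels cycle through the four indices.
width = height = 256
indices = bytes((x // 16 + y // 16) % 4 for y in range(height) for x in range(width))

rgb = expand_to_rgb(indices, palette)
print("raw:", len(indices), "bytes as P,", len(rgb), "bytes as RGB")
# raw: 65536 bytes as P, 196608 bytes as RGB
print("zlib:", len(zlib.compress(indices)), "vs", len(zlib.compress(rgb)))
```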

Both resulting PDFs can be stacked onto a fixed background without problems:

```
pdftk 1bit_background.pdf stamp palette_converted_with_unpatched_img2pdf.pdf output stacked_with_unpatched_img2pdf.pdf
pdftk 1bit_background.pdf stamp palette_converted_with_patched_img2pdf.pdf output stacked_with_patched_img2pdf.pdf
```

The space savings are evident:

```
3,6M stacked_with_unpatched_img2pdf.pdf
1,9M stacked_with_patched_img2pdf.pdf
```

All of these files are attached to the issue.


You might also want to handle `PA` mode. Apparently your image is `P` but has `"transparency" in newimg.info`? I wonder why Pillow does not treat that as `PA` directly. Related question: would it also work to convert to `PA` instead of `RGBA` to extract the alpha channel?


PS: The "no palette with ICC profile" check should be done first. Otherwise, a `P` image with transparency and an ICC profile might get through.
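The ordering concern can be illustrated with a simplified, hypothetical dispatch function. This is not img2pdf's actual code: the name `pick_alpha_branch`, the returned strings, and raising `ValueError` for palette images with an ICC profile are assumptions made for illustration. The palette/ICC rejection runs before any transparency branch, so a `P` image carrying both `"transparency"` and an ICC profile cannot slip through the new palette case. Following the question about `PA`, it also treats `PA` like `P` with `"transparency"` in its info.

```python
def pick_alpha_branch(mode: str, info: dict, has_icc: bool) -> str:
    """Hypothetical sketch of check ordering; not img2pdf's real logic."""
    # 1. Reject palette images with an ICC profile *before* looking at
    #    transparency; otherwise the palette branch below would accept them.
    #    (Including PA here, and raising ValueError, are assumptions.)
    if mode in ("P", "PA") and has_icc:
        raise ValueError("palette images with an ICC profile are not supported")

    # 2. Only then decide how to obtain the alpha channel.
    if mode == "LA":
        return "split L and A"
    if mode == "PA" or (mode == "P" and "transparency" in info):
        return "keep palette indices, extract alpha separately"
    if mode == "RGBA" or "transparency" in info:
        return "convert to RGBA and split"
    return "no alpha"


print(pick_alpha_branch("P", {"transparency": 0}, has_icc=False))
```

Putting the rejection first means each later branch can assume the ICC question has already been settled, so adding a new mode case cannot accidentally bypass it.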

josch closed this issue 11 months ago
Reference: josch/img2pdf#158