Readme clarification regarding PNG re-encoding #44
By Bht on 2018-06-14T16:09:15.945Z
Hi,
I'm puzzled by this sentence, found in the README's "Bugs" section:
"For lossless conversion of formats other than JPEG or JPEG2000, zip/flate encoding is used."
I assume it should read "other than JPEG, JPEG2000 or PNG, zip/flate encoding is used", because other sentences state that PNGs are embedded without re-encoding.
Could you confirm that, please?
For context, my difficulty comes from the lack of information available on the Internet about exactly which PNG features the PDF format supports. Currently, I understand that PDF supports neither transparency (alpha channel) nor interlacing, but I have a doubt concerning the color type. Some argue that only 8-bit color (a.k.a. PNG8) is supported, but I guess that's wrong because I don't see any mention of it in the img2pdf README.
Also, I'd like to thank you warmly for this program. Simple task, simple solution. Very enjoyable and elegant. I'm amazed to find out that it's the only way to do that. Thanks a lot :)
By josch on 2018-06-16T15:03:18.637Z
I see that this paragraph you quote is indeed confusing and not well formulated. I removed it to avoid any further confusion. Thanks for pointing this out.
To explain: zip/flate encoding is also used for PNG images. The difference between flate-encoded bare RGB data and flate-encoded PNG is that the latter first applies a filter to the RGB data. This filter turns the RGB data into a form that (for the kind of raster graphics that PNG is known to compress well) is much easier to compress with flate encoding than the raw RGB data. So yes, the original paragraph is correct in saying that flate compression is used for all formats other than JPEG or JPEG2000. The magic in PNG, and the reason it compresses so well with flate compression, is the filter that can be applied beforehand (most commonly the Paeth filter).
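As a rough illustration of why filtering helps, here is a minimal Python sketch that applies the PNG Paeth filter to some pixel data and compares the flate-compressed sizes. The synthetic gradient image and the helper names (`paeth_predictor`, `paeth_filter`) are assumptions made for this example; this is not img2pdf's code.

```python
import zlib


def paeth_predictor(a, b, c):
    """PNG Paeth predictor: a = left, b = above, c = upper-left."""
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    if pb <= pc:
        return b
    return c


def paeth_filter(rows, bpp):
    """Apply PNG filter type 4 (Paeth) to every scanline.

    rows: list of raw scanlines of equal length
    bpp:  bytes per pixel (3 for 8-bit RGB)
    """
    out = bytearray()
    prev = bytes(len(rows[0]))  # the row above the first one counts as zeros
    for row in rows:
        out.append(4)  # filter-type byte that precedes each scanline
        for i, x in enumerate(row):
            a = row[i - bpp] if i >= bpp else 0
            b = prev[i]
            c = prev[i - bpp] if i >= bpp else 0
            out.append((x - paeth_predictor(a, b, c)) & 0xFF)
        prev = row
    return bytes(out)


# Synthetic 64x64 RGB gradient (an assumption, standing in for a real image).
width, height, bpp = 64, 64, 3
rows = [
    bytes(
        v
        for x in range(width)
        for v in ((x + y) % 256, (2 * x + y) % 256, (x + 3 * y) % 256)
    )
    for y in range(height)
]

raw = b"".join(rows)
filtered = paeth_filter(rows, bpp)

raw_size = len(zlib.compress(raw, 9))
filtered_size = len(zlib.compress(filtered, 9))
print("flate on raw RGB:       ", raw_size, "bytes")
print("flate on Paeth-filtered:", filtered_size, "bytes")
```

On smooth data like this gradient the filtered stream is almost constant, so flate has far less work to do. Real PNG encoders usually pick a filter per scanline rather than using Paeth everywhere.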
You are also correct that PDF supports neither transparent nor interlaced PNGs. img2pdf will refuse to use such PNG images.
I never tried any other input than PNG8. Do you have example images?
I'm happy img2pdf works for you. :)
By josch on 2018-06-16T15:03:19.287Z
Status changed to closed
By Bht on 2018-06-16T23:46:42.660Z
Well, I came for a clarification, and I'll go back enlightened! Thank you :)
Mr. Paeth's intuitions were indeed very savvy.
I'm surprised you never tried anything other than PNG8, because it's limited to 256 colors. PNG24 is probably the most common PNG type, with its limit of 16 million colors. Supporting so many colors is actually one of the reasons that pushed for the development of the PNG format, according to its Wikipedia page. I rather guess:
If you really mean "8 bits for all channels" and want an example with 24 bits, then it's astonishing but I'll refer you to this image used in the same Wikipedia page.
Anyway, I haven't run into any problem using img2pdf with PNG24 (for all channels) images. They are apparently not re-encoded (I feared that would happen with PNG24), as:
I'm so happy with the result!
By josch on 2018-06-22T12:22:33.785Z
Sorry for the late reply. I was sick over the past few days. That also gives me an easy excuse for my confusion between PNG8 and PNG24. Indeed, I guess I thought that PNG8 meant 8 bits per channel. :)
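For anyone else mixing these up: the PNG header (the IHDR chunk) states the bit depth and the color type separately. "PNG8" normally means an indexed (palette) image, while "PNG24" means truecolor RGB with 8 bits per channel. A minimal sketch for reading those fields with only the Python standard library follows; `read_ihdr` is a hypothetical helper written for this example, not part of img2pdf.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
COLOR_TYPES = {
    0: "grayscale",
    2: "truecolor (RGB)",
    3: "indexed (palette)",
    4: "grayscale + alpha",
    6: "truecolor + alpha (RGBA)",
}


def read_ihdr(data):
    """Parse the IHDR chunk, which must be the first chunk of a PNG file."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("missing or malformed IHDR chunk")
    width, height, depth, color, _comp, _filt, interlace = struct.unpack(
        ">IIBBBBB", data[16:29]
    )
    return {
        "width": width,
        "height": height,
        "bit_depth": depth,  # bits per sample (or per palette index)
        "color_type": COLOR_TYPES.get(color, "unknown"),
        "has_alpha_channel": color in (4, 6),
        "interlaced": interlace == 1,
    }
```

Usage would look like `read_ihdr(open("input.png", "rb").read())`, where `input.png` is a placeholder file name. Note that this only looks at IHDR: palette or RGB images can also carry transparency in a separate tRNS chunk, which this sketch does not check.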
If you really want to make sure that img2pdf is lossless, use a tool like pdfimages and compare the output with your input.
But since PNG chunks are copied into the PDF verbatim and without re-encoding, you should not see any differences.
By Bht on 2018-06-23T05:56:26.737Z
Hey Josch,
Actually, I thought of that. But I wasn't sure that pdfimages would reassemble the various chunks exactly the way they were before being "dismembered" by img2pdf.
If that's not the case, diff would tell me they're different files, even if it's a single harmless/meaningless bit difference.
At the end of the day, what I sought was, ideally, something fast (seeing ImageMagick take hours to re-encode heavily pngcrushed PNGs was driving me mad) that preserves quality and doesn't take up more space. Well, I found it, and these 3 points are perfectly covered!
Hope you got/are getting better :)