Reads tiffs? #88

Closed
opened 3 years ago by josch · 0 comments
josch commented 3 years ago
Owner

By John on 2020-12-18T19:21:16.345Z

I notice that (fairly recently) support for group4-encoded tiffs was added, which is great. However, other tiffs seem to be converted to png. Most of the time, that doesn't matter to me, but sometimes it does. Any chance support for other tiffs can be added?

Thanks.


By josch on 2020-12-18T21:55:34.016Z


I don't understand. What is the bug in img2pdf?


By John on 2020-12-18T22:24:15.079Z


img2pdf image.tiff -o new.pdf

results in a pdf of a different file size than the original tiff, unless the tiff is Group4 compressed (maybe under other conditions as well). If I use, say, pdfimages to get the image back out of the pdf, it's not a tiff, but a png. JPG remains the same.

Could be that I'm not entirely understanding what's going on, but why does the format change on the tiff?


By josch on 2020-12-18T22:37:41.079Z


The format changes because PDF does not understand the tiff format.

What both PDF and TIFF understand is group4 encoding, so if your tiff is compressed with ccitt group4, then the image in the pdf will use the same encoding. But all other ways of storing raster image data in tiff are not supported by pdf, so the pixel data has to be moved to a different format. The most space-saving lossless format is paeth encoding (as used in png).
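If you are not sure which compression a given tiff uses, here is a rough sketch using Pillow (image.tiff as in the command above; only the group4 case is copied into the pdf unchanged):

from PIL import Image

# tag 259 is the standard TIFF "Compression" tag; value 4 means CCITT group4 (T.6)
NAMES = {1: "uncompressed", 3: "CCITT group3", 4: "CCITT group4",
         5: "LZW", 7: "JPEG", 8: "Deflate", 32773: "PackBits"}

with Image.open("image.tiff") as im:
    value = im.tag_v2.get(259, 1)
    print(NAMES.get(value, "other (%d)" % value))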

Where is the bug?


By John on 2020-12-18T22:42:45.918Z


Thanks, that's helpful. I didn't understand about the problems pdf has with tiff.


By John on 2020-12-18T22:42:47.468Z


Status changed to closed


By josch on 2020-12-18T22:51:34.056Z


I wouldn't call it a "problem". I mean, jpeg also doesn't support ccitt group4 or the paeth filter, and we don't call that a problem either. It's just one of the properties of the pdf format. The only image encodings pdf supports are jpeg, plain uncompressed pixel data, zlib/flate (optionally with the paeth filter) and ccitt group4. What I would rather call a "problem" when it comes to what pdf supports is that it cannot store anything with an alpha channel. ;)
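For illustration, a group4 image inside a pdf is an image XObject with the /CCITTFaxDecode filter, roughly like this (the dimensions are invented for the example):

<<
/Type /XObject
/Subtype /Image
/Width 2480
/Height 3508
/BitsPerComponent 1
/ColorSpace /DeviceGray
/Filter /CCITTFaxDecode
/DecodeParms << /K -1 /Columns 2480 /Rows 3508 >>
>>

The stream of such an object can hold the group4 data from a tiff byte for byte, which is why that case needs no re-encoding.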


By John on 2020-12-19T00:15:49.740Z


Well, jpeg is a file format and pdf is...uh...something else.

I'd also say that any time it throws away data (like it always does with dpi), that's a problem.

If there's a readable document that explains what happens with images in PDF, I'd be interested. For instance, I've been using imagemagick to convert images to pdf and figured I'd switch to img2pdf, which seems noticeably faster. img2pdf pretty much produces pdfs with the same file size (format permitting), but IM will sometimes be smaller (as with some pngs) or larger (as with tiff, where it seems not to use any compression if it's not ccitt). Some png info clearly can get lost on the way back out of the pdf using pdfimages.


By josch on 2020-12-19T08:47:13.844Z


PDF is also a file format, but it's more of a container format. JPEG and PNG each contain only one kind of data. PDF and TIFF are both able to store data of many different kinds, similar to how MP4 or MKV are video containers able to store their data in many different ways (codecs).

By a document explaining what happens with images in pdf, do you mean a document about how pdf works? That's here: https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/pdf_reference_1-7.pdf
Based on that, programs producing PDF can choose how they encode their images. What img2pdf does is choose an encoding of the pixel data that is lossless while at the same time resulting in a small output file. In the case of JPEG, the whole JPEG file is just dumped into the PDF. For all other raster images, only the pixel data is retained, so for all formats other than JPEG all metadata will be lost.

The reason img2pdf is faster is that in most cases it is able to just copy the image data into the pdf without re-encoding it. Look at the README for a table that explains which inputs allow direct inclusion and in which cases some computation has to happen.
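As a small usage sketch (file names as in the command from the beginning of this thread), the same conversion through the Python API looks like this:

import img2pdf

# for a jpeg or a group4 tiff the compressed data is copied through unchanged;
# everything else is re-packed losslessly as zlib/flate
with open("new.pdf", "wb") as f:
    f.write(img2pdf.convert("image.tiff"))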

IM will be smaller, yes. But it will also change the pixel data by re-encoding.


By John on 2020-12-20T00:24:27.987Z


> IM will be smaller, yes. But it will also change the pixel data by re-encoding.

I'm not sure there's re-encoding that modifies the data in this case, or ever. If I magick a png to pdf and then extract the image with pdfimages, I get the same thing back, even though the pdf file size is smaller than the original png. Asking over at the IM boards.


By josch on 2020-12-20T08:59:29.392Z


Oooh? That would be new! Can you share the png image and the imagemagick command you used?


By John on 2020-12-20T20:39:32.177Z


I attach an example png. If I convert it to pdf via:

magick orig.png demo.pdf

the 458k png becomes a 379k PDF. Extracting the png gives me back a 452k file:

pdfimages -all demo.pdf demo

The two png files seem to be pixel-identical.
(attached: orig.png)

It's also the case that if I run pngcrush on the file first and then use IM to embed it, the resulting pdf has exactly the same file size as the pdf IM makes from the original png.


By josch on 2020-12-20T22:14:16.110Z


Thank you! The mystery is solved. :)

You managed to find a PNG (the first time I've seen one) where the paeth filter makes the zlib compression worse. The better option is actually to not apply the paeth filter at all and compress the pixel data with zlib directly. This is what the pdf does, and that's why it's smaller than your original png. You can see it in the pdf here:

<<
/Type /XObject
/Subtype /Image
/Name /Im0
/Filter [ /FlateDecode ]
/Width 650
/Height 827
/ColorSpace 10 0 R
/BitsPerComponent 8
/Length 9 0 R
>>
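If you want to double-check that this stream really is just zlib-compressed pixel data, here is a rough sketch assuming pikepdf is installed; it pulls the /Im0 stream out of demo.pdf, inflates it, and compares the length against 650 x 827 x 3 bytes of raw RGB:

import zlib
import pikepdf

with pikepdf.open("demo.pdf") as pdf:
    xobj = pdf.pages[0].images["/Im0"]
    raw = bytes(xobj.read_raw_bytes())   # the /FlateDecode stream as stored in the file
    pixels = zlib.decompress(raw)        # plain zlib, no png-style predictor
    print(len(pixels), int(xobj.Width) * int(xobj.Height) * 3)

If no predictor is involved, both numbers should be 1612650.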

The /FlateDecode filter means that the raw pixel data is compressed with zlib. Your input png image, on the other hand, uses all kinds of filters more or less randomly -- no idea which software encoded it. You can see this in the output of pngcheck:

File: orig.png (458405 bytes)     
  chunk IHDR at offset 0x0000c, length 13
    650 x 827 image, 24-bit RGB, non-interlaced
  chunk gAMA at offset 0x00025, length 4: 0.45455
  chunk cHRM at offset 0x00035, length 32
    White x = 0.3127 y = 0.329,  Red x = 0.64 y = 0.33
    Green x = 0.3 y = 0.6,  Blue x = 0.15 y = 0.06
  chunk bKGD at offset 0x00061, length 6
    red = 0x00ff, green = 0x00ff, blue = 0x00ff
  chunk pHYs at offset 0x00073, length 9: 7874x7874 pixels/meter (200 dpi)
  chunk IDAT at offset 0x00088, length 32768
    zlib: deflated, 32K window, maximum compression
    row filters (0 none, 1 sub, 2 up, 3 avg, 4 paeth):
      1 4 4 4 4 4 4 4 4 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
      2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
      2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
      2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
      2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
      2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
      2 2 2 1 1 1 1 1 3 4 2 2 2 2 2 2 2 3 1 4 4 4 4 4 4
      4 1 2 2 2 2 4 4 4 4 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
      2 1 4 1 4 4 4 4 2 2 4 4 2 2 4 2 3 1 3 1 3 2 4 3 2
      2 (226 out of 827)
  chunk IDAT at offset 0x08094, length 32768
    row filters (0 none, 1 sub, 2 up, 3 avg, 4 paeth):
      4 2 3 1 1 3 1 2 2 4 2 2 2 2 4 3 1 3 1 3 2 4 4 2 2
      4 2 3 1 3 3 3 2 4 4 2 2 4 2 (265 out of 827)
  chunk IDAT at offset 0x100a0, length 32768
    row filters (0 none, 1 sub, 2 up, 3 avg, 4 paeth):
      3 1 1 3 3 2 2 2 2 2 2 4 2 3 1 3 3 2 2 4 3 2 2 2 2
      3 1 3 1 3 2 4 4 2 4 4 (301 out of 827)
  chunk IDAT at offset 0x180ac, length 32768
    row filters (0 none, 1 sub, 2 up, 3 avg, 4 paeth):
      2 3 1 3 3 3 2 2 4 3 2 2 2 3 3 1 3 1 2 2 2 3 2 4 4
      2 3 1 1 4 3 3 4 4 2 4 4 4 3 1 4 1 1 3 4 3 3 4 4 2
      (351 out of 827)
  chunk IDAT at offset 0x200b8, length 32768
    row filters (0 none, 1 sub, 2 up, 3 avg, 4 paeth):
      3 1 1 3 4 3 4 4 3 4 4 4 4 1 1 3 1 3 4 4 4 2 4 4 4
      3 1 4 1 4 3 4 4 4 4 4 4 (388 out of 827)
  chunk IDAT at offset 0x280c4, length 32768
    row filters (0 none, 1 sub, 2 up, 3 avg, 4 paeth):
      3 1 1 1 3 3 4 4 2 4 4 4 3 1 1 4 1 4 4 4 3 4 4 4 4
      1 1 3 3 3 3 4 3 3 4 4 4 (425 out of 827)
  chunk IDAT at offset 0x300d0, length 32768
    row filters (0 none, 1 sub, 2 up, 3 avg, 4 paeth):
      3 1 4 3 4 3 3 4 3 4 4 4 3 1 1 3 3 3 4 4 3 2 4 4 2
      3 1 3 4 3 2 4 4 2 4 4 2 (462 out of 827)
  chunk IDAT at offset 0x380dc, length 32768
    row filters (0 none, 1 sub, 2 up, 3 avg, 4 paeth):
      3 1 3 3 4 4 3 4 2 4 4 4 3 3 1 3 4 2 2 4 3 4 4 2 2
      3 1 3 4 3 4 4 4 2 4 4 4 3 1 (501 out of 827)
  chunk IDAT at offset 0x400e8, length 32768
    row filters (0 none, 1 sub, 2 up, 3 avg, 4 paeth):
      1 1 3 2 3 3 3 2 2 4 3 4 4 3 1 3 3 3 2 4 2 2 4 1 1
      1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 4 2 2 2 2 2 2
      2 4 1 2 4 4 4 4 4 3 3 2 2 4 2 4 4 3 (569 out of 827)
  chunk IDAT at offset 0x480f4, length 32768
    row filters (0 none, 1 sub, 2 up, 3 avg, 4 paeth):
      1 3 4 2 2 4 2 1 4 3 1 2 4 2 3 2 1 3 3 3 2 2 4 2 4
      2 3 1 3 3 2 2 4 2 0 2 3 1 3 3 4 2 4 (612 out of 827)
  chunk IDAT at offset 0x50100, length 32768
    row filters (0 none, 1 sub, 2 up, 3 avg, 4 paeth):
      2 4 2 3 1 3 3 2 2 2 2 4 2 3 1 4 3 2 4 4 2 4 4 3 1
      3 3 2 2 4 3 4 3 3 4 3 3 2 4 3 3 4 (654 out of 827)
  chunk IDAT at offset 0x5810c, length 32768
    row filters (0 none, 1 sub, 2 up, 3 avg, 4 paeth):
      3 1 1 2 2 2 3 2 4 4 3 1 3 3 4 2 4 3 3 4 3 1 2 4 2
      2 2 3 4 2 3 1 2 2 2 2 4 3 4 (693 out of 827)
  chunk IDAT at offset 0x60118, length 32768
    row filters (0 none, 1 sub, 2 up, 3 avg, 4 paeth):
      2 3 1 3 2 2 2 2 3 4 2 3 4 2 3 2 4 2 3 1 3 3 2 4 2
      2 2 3 1 1 3 3 2 4 2 2 2 3 1 3 3 (734 out of 827)
  chunk IDAT at offset 0x68124, length 32011
    row filters (0 none, 1 sub, 2 up, 3 avg, 4 paeth):
      3 2 4 2 2 2 3 1 3 3 3 3 2 2 4 2 3 1 3 3 3 4 2 2 2
      3 1 1 3 3 2 3 2 2 2 3 1 3 3 1 3 2 1 2 2 2 2 2 2 2
      2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
      2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 (827 out of 827)
  chunk tEXt at offset 0x6fe3b, length 37, keyword: date:create
  chunk tEXt at offset 0x6fe6c, length 37, keyword: date:modify
  chunk IEND at offset 0x6fe9d, length 0
No errors detected in orig.png (22 chunks, 71.6% compression).

As you can see, filters 1, 2, 3 and 4 get used but never filter 0, which would be no filter. For this input data, the best compression is to use no filter at all. This is also what pngcrush realizes, so after running your image through pngcrush you get:

File: pngout.png (371230 bytes)
  chunk IHDR at offset 0x0000c, length 13
    650 x 827 image, 24-bit RGB, non-interlaced
  chunk gAMA at offset 0x00025, length 4: 0.45455
  chunk cHRM at offset 0x00035, length 32
    White x = 0.3127 y = 0.329,  Red x = 0.64 y = 0.33
    Green x = 0.3 y = 0.6,  Blue x = 0.15 y = 0.06
  chunk bKGD at offset 0x00061, length 6
    red = 0x00ff, green = 0x00ff, blue = 0x00ff
  chunk pHYs at offset 0x00073, length 9: 7874x7874 pixels/meter (200 dpi)
  chunk IDAT at offset 0x00088, length 370976
    zlib: deflated, 32K window, maximum compression
    row filters (0 none, 1 sub, 2 up, 3 avg, 4 paeth):
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      0 0 (827 out of 827)
  chunk tEXt at offset 0x5a9b4, length 37, keyword: date:create
  chunk tEXt at offset 0x5a9e5, length 37, keyword: date:modify
  chunk IEND at offset 0x5aa16, length 0
No errors detected in pngout.png (9 chunks, 77.0% compression).

All those zeros mean that no filter was applied. The resulting png image is then also smaller than the pdf created by imagemagick. Funnily enough, when using pdfimages -all, the newly extracted png also picks all kinds of filters in the hope that this improves compression -- it doesn't. :)
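To reproduce the effect without pngcrush, here is a rough sketch (Pillow plus the standard zlib module, file names as above). It compresses the raw pixel rows once unfiltered and once after applying the paeth predictor to every row, the way the original png does:

import zlib
from PIL import Image

def paeth(a, b, c):
    # paeth predictor from the png spec: a = left, b = above, c = upper left
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    if pb <= pc:
        return b
    return c

im = Image.open("orig.png").convert("RGB")
w, h = im.size
stride = w * 3
raw = im.tobytes()  # h rows of stride bytes each

plain = bytearray()
filtered = bytearray()
prev = bytes(stride)  # the row above the first row counts as all zeros
for y in range(h):
    row = raw[y * stride:(y + 1) * stride]
    plain += b"\x00" + row  # filter type 0: none
    filtered.append(4)      # filter type 4: paeth
    for x in range(stride):
        left = row[x - 3] if x >= 3 else 0
        up = prev[x]
        upleft = prev[x - 3] if x >= 3 else 0
        filtered.append((row[x] - paeth(left, up, upleft)) & 0xFF)
    prev = row

print("no filter:", len(zlib.compress(bytes(plain), 9)))
print("paeth:    ", len(zlib.compress(bytes(filtered), 9)))

On this particular image the unfiltered variant should come out smaller; on most photographic pngs it is the other way around.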

Thanks for this input! This is the first time I'm seeing this. :)


By John on 2020-12-21T20:09:22.861Z


To get the png, I ran the attached jpg through a basic IM command:

magick orig.jpg orig.png

Running through all the IM combinations for compression filter, level, and strategy, I can't get down to the pngcrush size. The best I can do is by setting those values, respectively, to 130, 131, 230, 231, 330, 331, 430, and 431, all of which give identically sized files: 357630 bytes. The pngcrush output was 343905 bytes.

If I understand it correctly, pngcheck shows multiple data chunks with no compression, so presumably it's the chunking that's using up the extra 14k?

(attached: orig.jpg)


By josch on 2020-12-21T21:05:12.670Z


You are probably talking about the options png:compression-filter, png:compression-level and png:compression-strategy? Those only control the zip compression. The way png achieves such good compression compared to bmp or gif is the paeth filter. That filter is not a compression itself; it just transforms the data in a way that makes it really well suited for zip compression -- in most cases, that is. You found one example where the paeth filter makes it worse. So you would have to somehow tell imagemagick not to use the paeth filter and instead hand the data to the zip compressor without filtering. I don't know if there is an option for that.


By John on 2020-12-21T21:31:52.523Z


I'm pretty sure that I covered paeth in there, and I think it's a filter value of "4". The other png compression knob is the -quality option; I tried all of those values as well. See details here: https://legacy.imagemagick.org/Usage/formats/#png_quality

I'm hoping to find out on the IM boards whether IM uses pngcrush before putting a png into a PDF.

josch closed this issue 3 years ago