use /LZWDecode filter for GIF and matching TIFF images #174
The GIF image format uses LZW compression, which is also a valid compression method for TIFF images. The PDF format provides an /LZWDecode filter that seems like a perfect match for embedding GIF and certain TIFF images without transcoding the image data.
A test with img2pdf 3.3.0 (Ubuntu 20.04) shows that img2pdf always transcodes GIF and LZW-compressed TIFF images into /FlateDecode streams.
Wouldn't it make sense to preserve LZW-encoded image data in these cases?
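To make the "perfect match" concrete, here is a minimal pure-Python sketch of the LZW variant that /LZWDecode expects with its default /EarlyChange 1: codes packed MSB-first, 9 to 12 bits wide, CLEAR = 256, EOD = 257, and the code width growing one code early. It reproduces the worked example from the PDF specification. The names lzw_encode and lzw_decode are made up for illustration; this is not img2pdf code. As far as I know, TIFF's LZW uses this same scheme, while GIF packs its codes LSB-first and starts from a variable minimum code size, so GIF data would probably need at least a repack rather than a straight copy.

```python
CLEAR, EOD = 256, 257  # fixed codes of the PDF /LZWDecode filter
FIRST_CODE = 258       # first code available for new table entries
TABLE_LIMIT = 4095     # clear the table before 13-bit codes would be needed


class _BitWriter:
    """Packs variable-width codes MSB-first, as PDF (and TIFF) LZW expects."""

    def __init__(self) -> None:
        self.buf = bytearray()
        self.acc = 0
        self.nbits = 0

    def write(self, code: int, width: int) -> None:
        self.acc = (self.acc << width) | code
        self.nbits += width
        while self.nbits >= 8:
            self.nbits -= 8
            self.buf.append((self.acc >> self.nbits) & 0xFF)
        self.acc &= (1 << self.nbits) - 1  # keep only the unflushed bits

    def getvalue(self) -> bytes:
        tail = bytes([(self.acc << (8 - self.nbits)) & 0xFF]) if self.nbits else b""
        return bytes(self.buf) + tail


def _read_bits(data: bytes, pos: int, width: int) -> int:
    value = 0
    for i in range(pos, pos + width):
        value = (value << 1) | ((data[i >> 3] >> (7 - (i & 7))) & 1)
    return value


def lzw_encode(data: bytes) -> bytes:
    """Encode data so that /LZWDecode (EarlyChange 1) can decode it."""
    out = _BitWriter()
    out.write(CLEAR, 9)
    table = {bytes([i]): i for i in range(256)}
    next_code = FIRST_CODE
    w = b""
    for byte in data:
        wc = w + bytes([byte])
        if wc in table:
            w = wc
            continue
        # EarlyChange 1: the width is the bit length of the next free code,
        # i.e. it grows one code earlier than strictly necessary.
        out.write(table[w], next_code.bit_length())
        table[wc] = next_code
        next_code += 1
        w = bytes([byte])
        if next_code == TABLE_LIMIT:
            out.write(CLEAR, next_code.bit_length())
            table = {bytes([i]): i for i in range(256)}
            next_code = FIRST_CODE
    if w:
        out.write(table[w], next_code.bit_length())
        next_code += 1  # the decoder adds one more entry after this code
    out.write(EOD, next_code.bit_length())
    return out.getvalue()


def lzw_decode(data: bytes) -> bytes:
    """Decode an /LZWDecode stream (EarlyChange 1)."""
    out = bytearray()
    table = [bytes([i]) for i in range(256)] + [b"", b""]  # slots 256/257 unused
    prev = None
    pos, total_bits = 0, len(data) * 8
    while True:
        width = min((len(table) + 1).bit_length(), 12)
        if pos + width > total_bits:
            break  # stream ended without an EOD code
        code = _read_bits(data, pos, width)
        pos += width
        if code == CLEAR:
            del table[FIRST_CODE:]
            prev = None
            continue
        if code == EOD:
            break
        if code < len(table):
            entry = table[code]
        elif code == len(table) and prev is not None:
            entry = prev + prev[:1]  # code refers to the entry being built
        else:
            raise ValueError(f"invalid LZW code {code}")
        out += entry
        if prev is not None:
            table.append(prev + entry[:1])
        prev = entry
    return bytes(out)
```

For example, `lzw_encode(b"-----A---B")` yields the bytes `80 0B 60 50 22 0C 0C 85 01`, matching the example in the PDF specification. A real implementation would of course be written for speed, not clarity.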
There's one caveat, though. The PDF/A standard explicitly forbids /LZWDecode data streams. That is, when the --pdfa option is given, image data should still be transcoded to /FlateDecode streams.
As an example, a PDF file using the /LZWDecode filter can be created using GraphicsMagick like this:
Yes, that would make sense if it is possible.
Do you have very large GIF images where re-encoding them slows you down?
One problem is that Pillow does not give me access to the compressed data, so img2pdf would need to learn how to locate and extract exactly the right bits from the input image.
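For TIFF input, one possible way around this might be to read the strip locations from the TIFF directory and copy the compressed bytes directly. The sketch below is a stdlib-only parser for the first IFD of a classic (non-Big) TIFF that returns the raw LZW strips. read_first_ifd and raw_lzw_strips are hypothetical helpers, not Pillow or img2pdf API, and the sketch ignores BigTIFF, tiled images, and the Predictor tag (which would have to be mapped to /DecodeParms). Presumably only single-strip images could be passed through unchanged as one /LZWDecode stream, since each strip is a separate LZW stream ending in its own EOD code.

```python
import struct

TAG_COMPRESSION = 259
TAG_STRIP_OFFSETS = 273
TAG_STRIP_BYTE_COUNTS = 279
COMPRESSION_LZW = 5
_FORMATS = {3: "H", 4: "I"}  # TIFF field types SHORT and LONG


def read_first_ifd(data: bytes) -> dict:
    """Return {tag: [values]} for the SHORT/LONG tags of the first IFD."""
    byte_orders = {b"II": "<", b"MM": ">"}
    if data[:2] not in byte_orders:
        raise ValueError("not a TIFF file")
    order = byte_orders[data[:2]]
    magic, ifd = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF file (BigTIFF is not handled)")
    (count,) = struct.unpack(order + "H", data[ifd:ifd + 2])
    tags = {}
    for i in range(count):
        entry = data[ifd + 2 + 12 * i:ifd + 14 + 12 * i]
        tag, field_type, n = struct.unpack(order + "HHI", entry[:8])
        fmt = _FORMATS.get(field_type)
        if fmt is None:
            continue  # other field types are not needed here
        size = struct.calcsize(order + fmt) * n
        if size <= 4:  # small values are stored inline in the entry
            raw = entry[8:8 + size]
        else:  # larger arrays live elsewhere; the entry holds their offset
            (offset,) = struct.unpack(order + "I", entry[8:12])
            raw = data[offset:offset + size]
        tags[tag] = list(struct.unpack(order + fmt * n, raw))
    return tags


def raw_lzw_strips(data: bytes) -> list:
    """Return the still-compressed LZW strips of the first image."""
    tags = read_first_ifd(data)
    if tags.get(TAG_COMPRESSION, [1])[0] != COMPRESSION_LZW:
        raise ValueError("image is not LZW-compressed")
    offsets = tags[TAG_STRIP_OFFSETS]
    counts = tags[TAG_STRIP_BYTE_COUNTS]
    return [data[o:o + c] for o, c in zip(offsets, counts)]
```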
Would you like to propose a patch?
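A patch would also have to honour the PDF/A caveat raised above: pass LZW data through normally, but fall back to /FlateDecode when --pdfa is given. A minimal sketch of that decision follows, assuming some LZW decoder is available and passed in as a function. choose_image_stream and its parameters are hypothetical names, not existing img2pdf code.

```python
import zlib
from typing import Callable, Tuple


def choose_image_stream(
    lzw_data: bytes, decode_lzw: Callable[[bytes], bytes], pdfa: bool
) -> Tuple[str, bytes]:
    """Return (filter name, stream bytes) for LZW-compressed image data.

    Without --pdfa, the compressed data is embedded unchanged. With --pdfa,
    it is decoded and recompressed with zlib, because PDF/A forbids
    /LZWDecode.
    """
    if not pdfa:
        return "/LZWDecode", lzw_data
    return "/FlateDecode", zlib.compress(decode_lzw(lzw_data))
```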
Well, I do not really have a use case for /LZWDecode streams in PDF files. In fact, I'm actually trying to avoid them, because they're forbidden in PDF/A-compliant files. I stumbled across the issue while testing which error my PDF/A validation software (veraPDF) reports when given a file using the /LZWDecode filter. Knowing that img2pdf preserves image data as far as possible, I fired up img2pdf, only to wonder why veraPDF did not raise an alarm. Only after inspecting the output did I find that img2pdf uses /FlateDecode streams.
And then, I'm neither a Python nor a C guy. Sorry, all I can contribute is an idea for an enhancement.
The libtiff-tools package contains a tool, tiff2pdf, that seems to serve a similar purpose to img2pdf, but only for TIFF images. It doesn't seem to preserve LZW compression, though. However, it is able to preserve the image data of some sub-formats without transcoding. From the man page:
Maybe the code can be used as a reference.
Thank you! I've tagged this issue as "enhancement" and "help wanted", so if somebody finds some time to implement this I'd be happy to review patches. 😄