Support CCITT Group4 input tiffs without recompression #47

Closed
opened 2021-04-25 19:58:10 +00:00 by josch · 0 comments

By Jack Cushman on 2018-08-01T14:36:09.116Z

Hi! I have a bunch of TIFF files that are already compressed with CCITT Group4. Here's a sample. Would it be possible to convert those to PDF without recompression?

From glancing at the code, I think this involves borrowing some of the logic from transcode_monochrome and handling precompressed TIFFs specially in read_images, but I'm not sure I have my head wrapped around it enough to implement it correctly.

Thanks for this great library!


By josch on 2018-08-01T18:15:05.186Z


Your analysis is accurate. Using the code from transcode_monochrome one can indeed extract the Group 4 encoded payload from a TIFF image and directly embed it into a PDF without any reencoding in read_images. That was a great idea, thanks!
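
To illustrate the idea, here is a minimal sketch (not img2pdf's actual code) of locating the Group 4 payload inside a TIFF using only the Python standard library. It assumes a classic, non-BigTIFF file whose first image is stored in a single strip; the helper names `read_first_ifd` and `group4_payload` are hypothetical and exist only for this example.

```python
import struct

# Tag numbers and values as defined in the TIFF 6.0 specification.
COMPRESSION = 259
STRIP_OFFSETS = 273
STRIP_BYTE_COUNTS = 279
CCITT_GROUP4 = 4  # Compression value for T.6 (Group 4) encoding


def read_first_ifd(data):
    """Return {tag: value} for single-valued SHORT/LONG entries of the first IFD."""
    order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if order is None:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF file")
    (count,) = struct.unpack_from(order + "H", data, ifd_offset)
    tags = {}
    for i in range(count):
        entry = ifd_offset + 2 + 12 * i
        tag, typ, n = struct.unpack_from(order + "HHI", data, entry)
        if n != 1:
            continue  # arrays (for example multi-strip images) are out of scope
        if typ == 3:  # SHORT, stored left-justified in the value field
            (value,) = struct.unpack_from(order + "H", data, entry + 8)
        elif typ == 4:  # LONG
            (value,) = struct.unpack_from(order + "I", data, entry + 8)
        else:
            continue
        tags[tag] = value
    return tags


def group4_payload(data):
    """Return the raw Group 4 bytes of a single-strip TIFF, without decoding them."""
    tags = read_first_ifd(data)
    if tags.get(COMPRESSION) != CCITT_GROUP4:
        raise ValueError("image is not CCITT Group 4 compressed")
    if STRIP_OFFSETS not in tags or STRIP_BYTE_COUNTS not in tags:
        raise ValueError("expected exactly one strip")
    offset = tags[STRIP_OFFSETS]
    return data[offset:offset + tags[STRIP_BYTE_COUNTS]]
```

Calling `group4_payload(open("sample.tif", "rb").read())` would return the bytes that can be placed in the PDF image stream unchanged. Multi-strip or tiled files would need extra handling that this sketch leaves out.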

The only thing that gave me a figurative headache while implementing this feature was that, after embedding the CCITT Group 4 data into a PDF container, your sample image showed up with inverted colors in all my PDF readers:

inverted.pdf

It turns out that the TIFF format can carry a flag that lets 0 be white and 1 be black, which is the opposite of the usual meaning of 0 and 1.

Try the following with your input image:

% identify -verbose sample.tif | grep tiff:photometric
    tiff:photometric: min-is-white

After accounting for this additional oddity of the TIFF format, it seems to work fine now! Thanks for this idea. I should soon be able to push the commit that closes this issue.
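
As a rough sketch of what accounting for the flag can look like, the function below turns the TIFF PhotometricInterpretation value (tag 262) into the `/DecodeParms` entries of a PDF `CCITTFaxDecode` filter. The function name `ccitt_decode_parms` is hypothetical, and the direction of the `/BlackIs1` mapping is an assumption based on a reading of the TIFF 6.0 and PDF specifications rather than something stated in this thread, so check the result in a PDF viewer.

```python
WHITE_IS_ZERO = 0  # shown by ImageMagick as "min-is-white"
BLACK_IS_ZERO = 1


def ccitt_decode_parms(columns, rows, photometric):
    """Build a /DecodeParms dictionary string for Group 4 data (K = -1)."""
    if photometric == WHITE_IS_ZERO:
        # Assumption: this matches the PDF default, so BlackIs1 stays false.
        black_is_1 = "false"
    elif photometric == BLACK_IS_ZERO:
        # Assumption: the opposite polarity is expressed with BlackIs1 true.
        black_is_1 = "true"
    else:
        raise ValueError("unsupported photometric interpretation: %r" % photometric)
    return "<< /K -1 /Columns %d /Rows %d /BlackIs1 %s >>" % (
        columns,
        rows,
        black_is_1,
    )


print(ccitt_decode_parms(2480, 3508, WHITE_IS_ZERO))
```

The point of the sketch is only that the polarity travels as a single boolean next to the untouched Group 4 payload, so no pixel data has to be re-encoded.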


By josch on 2018-08-01T20:14:37.427Z


Status changed to closed by commit f5d8d86dff


By Jack Cushman on 2018-08-01T20:30:27.124Z


Wow, you're a wizard! Thanks so much.

josch closed this issue 2021-04-25 19:58:10 +00:00
Reference: josch/img2pdf#47