Issue to convert tiff file #46

Closed
opened 2021-04-25 19:58:08 +00:00 by josch · 9 comments
Owner

By Giovanni Zambotti on 2018-07-16T13:40:06.881Z

Hello,

I'm having trouble converting the attached TIFF file. I am using the Python library (img2pdf). Any suggestions?
305_0107.tif
Also, I set up a try-except statement, but it does not catch any error. Do you know the best way to handle errors?
Thank you in advance for your work.

Giovanni


By josch on 2018-07-17T19:29:46.321Z


I'm able to reproduce your problem:

$ src/img2pdf.py --pillow-limit-break --producer="" --nodate -o out.pdf --verbose ~/Downloads/305_0107.tif
DEBUG:root:imgformat = other
DEBUG:root:Converting frame: 0
DEBUG:root:input dpi = 400 x 400
DEBUG:root:input colorspace = 1
DEBUG:root:width x height = 15434px x 8800px
DEBUG:root:Converting monochrome to CCITT Group4
python3: tif_dirwrite.c:1951: TIFFWriteDirectoryTagSubifd: Assertion `pa != 0' failed.
Aborted

Specifically the problem seems to come directly from libtiff here:

https://sources.debian.org/src/tiff/4.0.9-6/libtiff/tif_dirwrite.c/?hl=1925#L1954

This looks like the tiff parser encountered a NULL pointer but I'm not familiar with libtiff at all.
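
On the try-except question: the log above ends in "Aborted", which means the failed assertion inside libtiff terminated the whole Python process. No Python exception is raised in that case, so a try-except block has nothing to catch. One possible way to contain such a crash is to run the conversion in a child process and inspect its exit status. The following is only a minimal sketch of that idea; `run_isolated` is a hypothetical helper name, not part of img2pdf.

```python
import subprocess
import sys


def run_isolated(cmd, timeout=600):
    """Run cmd in a child process and report how it ended.

    Returns (ok, returncode, stderr). On POSIX a negative returncode means
    the child was killed by that signal number, e.g. -6 for SIGABRT.
    """
    proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    return proc.returncode == 0, proc.returncode, proc.stderr


# Hypothetical usage with the command-line tool from this thread:
#   ok, code, err = run_isolated(["img2pdf", "-o", "out.pdf", "305_0107.tif"])
#   if not ok:
#       print(f"conversion failed (exit status {code}): {err.strip()}")
```

The parent process survives even if the child is aborted, so the failure can be logged or the file skipped instead of losing the whole batch.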


By josch on 2018-07-17T23:59:10.244Z


Status changed to closed by commit 75c43de09d


By josch on 2018-07-17T23:59:56.857Z


I found a workaround that fixes your issue (by creating a new PIL image internally). Can you confirm that the issue is indeed fixed? Otherwise, please re-open this issue. Thanks!
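
For readers wondering what "creating a new PIL image" can look like, here is a rough sketch assuming Pillow. It rebuilds the image from its raw pixel data so that, presumably, nothing else from the original TIFF file is carried into the encoder. The helper names are hypothetical and this is not a copy of the code in commit 75c43de09d.

```python
from io import BytesIO

from PIL import Image


def fresh_copy(img):
    """Return a new image built only from the mode, size and pixels of img."""
    return Image.frombytes(img.mode, img.size, img.tobytes())


def to_group4_tiff(img):
    """Encode a mode "1" image as CCITT Group4 TIFF bytes.

    Assumes a Pillow build with libtiff support.
    """
    buf = BytesIO()
    fresh_copy(img).save(buf, format="TIFF", compression="group4")
    return buf.getvalue()
```

The copy costs one extra pass over the pixel data, which is small compared to the encoding itself.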


By josch on 2018-07-18T06:41:40.865Z


@gzambotti I would like to report this problem to the maintainers of the Python Imaging Library as well as libtiff. Since your tiff file is the only one that's causing this problem, I would like to use it as an example. Am I allowed to use it for this purpose?


By Giovanni Zambotti on 2018-07-18T14:35:38.990Z


Sure. You can do that. I guess I need to reinstall the library to test if your workaround is working, correct? I am using conda, can I do it in that environment?


By josch on 2018-07-18T14:38:27.869Z


Thank you for your permission!

You don't have to reinstall it; you just have to upgrade it. I uploaded a new release to PyPI, and you should be able to install it from there.

I'm not familiar with conda.


By Giovanni Zambotti on 2018-07-18T17:52:23.462Z


It seems to be working very well. Thank you! One last question: I need to compress the PDF as much as possible without losing quality or resolution. Can I do that with img2pdf?


By josch on 2018-07-18T17:57:47.444Z


Depends on what you mean by "as much as possible". You could certainly shave a few more bytes off using several tricks, but the result will not get significantly smaller. For example, if you compressed parts of the PDF container, you could shave off 100 bytes per document or so. Another option would be to first (losslessly) convert your data into PNG and only then convert it to PDF. Depending on the data you have, that might compress slightly better. The answer is really "it depends".


By Giovanni Zambotti on 2018-07-18T18:29:19.099Z


Shaving off 100 bytes would be helpful. Do you have any documentation or code you can share that would help me do that? Thank you.


By josch on 2018-07-18T18:50:33.236Z


There are two things you can do:

Compress the PDF metadata

Either use pdftk:

pdftk input.pdf output out.pdf compress

Or use mutool from mupdf:

mutool clean -z input.pdf out.pdf

Compress the image itself

For this you have to use PNG as I'm not aware of a tool that is able to compress tiff in a similar way.

Either use pngcrush:

pngcrush -brute input.png out.png

Or optipng:

optipng -o4 -i0 input.png

Or advpng:

advpng -z4 input.png

Then feed that optimized png into img2pdf.
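
The two steps (optimize the PNG, then convert it) could be chained from Python roughly like this. It is only a sketch: the command lines are the ones listed above, the function names are hypothetical, and it assumes the chosen optimizer and the img2pdf command-line tool are on PATH. Note that optipng and advpng rewrite the input file in place, while pngcrush writes a separate output file.

```python
import shutil
import subprocess


def optimizer_commands(src, dst):
    """Map each optimizer name to its command line from this thread."""
    return {
        "pngcrush": ["pngcrush", "-brute", src, dst],
        "optipng": ["optipng", "-o4", "-i0", src],
        "advpng": ["advpng", "-z4", src],
    }


def optimize_then_convert(src_png, out_pdf, tool="optipng"):
    """Losslessly shrink src_png with the given tool, then wrap it in a PDF."""
    tmp_png = src_png + ".crushed.png"
    cmd = optimizer_commands(src_png, tmp_png)[tool]
    for program in (cmd[0], "img2pdf"):
        if shutil.which(program) is None:
            raise FileNotFoundError(f"{program} was not found on PATH")
    subprocess.run(cmd, check=True)
    # Only pngcrush produces tmp_png; the other tools modified src_png itself.
    optimized = tmp_png if tool == "pngcrush" else src_png
    subprocess.run(["img2pdf", "-o", out_pdf, optimized], check=True)
    return out_pdf
```

Work on a copy of the PNG if the original must stay untouched.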

josch closed this issue 2021-04-25 19:58:09 +00:00
Contributor

I'd like to check whether the workaround could be dropped if using recent libtiff, but the link to the test file is broken... Do you still have it?

Author
Owner

Yes, here you go:

Contributor

Thanks!

Contributor

The workaround seems to be unnecessary with the Pillow version that I have installed (9.1.0.dev0 built from source with Ubuntu 20.04 libs).

Author
Owner

You mean the workaround introduced by 75c43de09dc872ed3ce4ace6cad9d708318be5ad?
Contributor

Yes, exactly.

Author
Owner

That's good to know but the workaround still needs to stay to support older versions of libtiff. Or is there a reason to remove it or make it conditional?

Contributor

> Or is there a reason to remove it or make it conditional?

Not really. I wouldn't expect that creating a copy of the image makes much of a performance difference.
Author
Owner

I agree. And if we start thinking about performance, then the whole `transcode_monochrome` function should be replaced by code that encodes CCITT Group4 directly in C, instead of going through PIL and then retrieving the bytestream as a quick hack.
Reference: josch/img2pdf#46