Issue to convert tiff file #46
By Giovanni Zambotti on 2018-07-16T13:40:06.881Z
Hello,
I am having trouble converting the attached TIFF file with the Python library img2pdf. Any suggestions?
305_0107.tif
Also, I set up a try-except statement, but it does not catch any error. Do you know the best way to handle errors?
Thank you in advance for your work.
Giovanni
By josch on 2018-07-17T19:29:46.321Z
I'm able to reproduce your problem. Specifically, the problem seems to come directly from libtiff, here:
https://sources.debian.org/src/tiff/4.0.9-6/libtiff/tif_dirwrite.c/?hl=1925#L1954
This looks like the tiff parser encountered a NULL pointer but I'm not familiar with libtiff at all.
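On the try-except question: if the failure happens inside libtiff's C code (for example a failed assertion or a NULL-pointer crash), the interpreter may be killed before any Python exception is raised, so there would be nothing for `try`/`except` to catch. A minimal sketch of one way around this, assuming that is what happens here, is to run the conversion in a child process and inspect its exit status. The names `run_isolated` and `convert_isolated` are made up for illustration; only `img2pdf.convert` comes from the library.

```python
import subprocess
import sys

# Source for the child process: argv[1] is the input image, argv[2] the output PDF.
# The PDF is only written if the conversion returned successfully.
CONVERT_SNIPPET = """
import sys, img2pdf
data = img2pdf.convert(sys.argv[1])
with open(sys.argv[2], "wb") as f:
    f.write(data)
"""


def run_isolated(code, *args, timeout=300):
    """Run Python source in a child interpreter; return (returncode, stderr)."""
    proc = subprocess.run(
        [sys.executable, "-c", code, *args],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stderr


def convert_isolated(src, dst):
    """Hypothetical wrapper: return True on success, report the failure otherwise."""
    returncode, stderr = run_isolated(CONVERT_SNIPPET, src, dst)
    if returncode == 0:
        return True
    # On POSIX, a negative return code means the child was killed by that
    # signal number (e.g. -6 for SIGABRT, -11 for SIGSEGV).
    print(f"converting {src} failed (exit status {returncode}): {stderr.strip()}",
          file=sys.stderr)
    return False
```

With this, a crash in the child only shows up as a non-zero exit status in the parent, which can then skip the file and continue with the next one.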
By josch on 2018-07-17T23:59:10.244Z
Status changed to closed by commit 75c43de09d
By josch on 2018-07-17T23:59:56.857Z
I found a workaround that fixes your issue (by creating a new PIL image internally). Can you confirm that the issue is indeed fixed? Otherwise, please re-open this issue. Thanks!
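The idea behind the workaround can be sketched roughly as follows. This is an illustration, not the code from the commit: it assumes that rebuilding the image from its raw pixel bytes is enough to leave behind whatever state confuses libtiff, and `fresh_copy` is a hypothetical name.

```python
from PIL import Image


def fresh_copy(img):
    """Return a new PIL image holding the same pixels as img.

    Only mode, size and raw pixel data are carried over; metadata such as
    img.info is not copied to the new image.
    """
    return Image.frombytes(img.mode, img.size, img.tobytes())


if __name__ == "__main__":
    original = Image.new("1", (16, 8))  # small bilevel test image
    original.info["dpi"] = (300, 300)   # metadata that should not survive
    copy = fresh_copy(original)
    print(copy.mode, copy.size, copy.info)
```

The copy is then what gets handed to the TIFF encoder instead of the image that was opened from the file.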
By josch on 2018-07-18T06:41:40.865Z
@gzambotti I would like to report this problem to the maintainers of the Python Imaging Library as well as libtiff. Since your tiff file is the only one that's causing this problem, I would like to use it as an example. Am I allowed to use it for this purpose?
By Giovanni Zambotti on 2018-07-18T14:35:38.990Z
Sure. You can do that. I guess I need to reinstall the library to test if your workaround is working, correct? I am using conda, can I do it in that environment?
By josch on 2018-07-18T14:38:27.869Z
Thank you for your permission!
You don't have to reinstall it, you just have to upgrade it. I uploaded a new release to PyPI and you should be able to install it from there.
I'm not familiar with conda.
By Giovanni Zambotti on 2018-07-18T17:52:23.462Z
It seems to be working very well. Thank you! One last question: I need to compress the PDF as much as possible without losing quality or resolution. Can I do that with img2pdf?
By josch on 2018-07-18T17:57:47.444Z
Depends on what you mean by "as much as possible". You could certainly shave a few more bytes off using several tricks, but the result will not get significantly smaller. For example, if you compressed parts of the PDF container, you could shave off 100 bytes or so per document. Another option would be to first (losslessly) convert your data to PNG and only then convert it to PDF. Depending on the data you have, that might compress slightly better. The answer really is "it depends".
By Giovanni Zambotti on 2018-07-18T18:29:19.099Z
Shaving off 100 bytes would be helpful. Do you have any documentation or code you can share that would help me do that? Thank you.
By josch on 2018-07-18T18:50:33.236Z
There are two things you can do:
1. Compress the PDF metadata, either with pdftk or with mutool from mupdf.
2. Compress the image itself. For this you have to use PNG, as I'm not aware of a tool that is able to compress TIFF in a similar way. Use pngcrush, optipng, or advpng, then feed the optimized PNG into img2pdf.
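Since the tools mentioned above are all command-line programs, here is a hedged Python sketch that drives them through `subprocess`. The flags shown (`compress` for pdftk, `-z` for `mutool clean`, `-brute` for pngcrush, `-o7` and `-out` for optipng) are one plausible choice, not a recommendation from the tools' authors, so check the manual for your installed versions. The helper names are hypothetical, and advpng is left out because it recompresses files in place rather than writing to a separate output file.

```python
import shutil
import subprocess


def pdf_commands(src, dst):
    """Candidate commands that rewrite the PDF src into a more compact dst."""
    return {
        "pdftk": ["pdftk", src, "output", dst, "compress"],
        "mutool": ["mutool", "clean", "-z", src, dst],
    }


def png_commands(src, dst):
    """Candidate commands that losslessly recompress the PNG src into dst."""
    return {
        "pngcrush": ["pngcrush", "-brute", src, dst],
        "optipng": ["optipng", "-o7", "-out", dst, src],
    }


def run_first_available(commands):
    """Run the first command whose executable is on PATH.

    Returns the name of the tool that ran, or None if none is installed.
    Raises subprocess.CalledProcessError if the tool exits with an error.
    """
    for name, argv in commands.items():
        if shutil.which(argv[0]):
            subprocess.run(argv, check=True)
            return name
    return None


def png_to_pdf(png_path, pdf_path):
    """Wrap an (optimized) PNG into a PDF with img2pdf."""
    import img2pdf  # third-party; imported here so the helpers above work without it

    data = img2pdf.convert(png_path)
    with open(pdf_path, "wb") as f:
        f.write(data)
```

A typical sequence would be `run_first_available(png_commands("scan.png", "scan.opt.png"))`, then `png_to_pdf("scan.opt.png", "scan.pdf")`, then `run_first_available(pdf_commands("scan.pdf", "scan.small.pdf"))`, where the file names are placeholders.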
I'd like to check whether the workaround could be dropped if using recent libtiff, but the link to the test file is broken... Do you still have it?
Yes, here you go:
Thanks!
The workaround seems to be unnecessary with the Pillow version that I have installed (9.1.0.dev0 built from source with Ubuntu 20.04 libs).
You mean the workaround introduced by 75c43de09d?
Yes, exactly.
That's good to know but the workaround still needs to stay to support older versions of libtiff. Or is there a reason to remove it or make it conditional?
Not really. I wouldn't expect that creating a copy of the image makes much of a performance difference.
I agree. And if we start thinking about performance, then the whole transcode_monochrome function should be replaced by code that actually encodes CCITT Group4 in C, instead of the current quick hack of going through PIL and then retrieving the bytestream.