Having an issue with latest version and tif files. #48
By George on 2018-08-21T21:13:34.013Z
So. I have code like this:
Using IDLE and img2pdf, this code works just fine (I believe it is version 0.2.4 of img2pdf).
Today I tried running the same code under Anaconda, but with the latest version, which I think is 0.3.1.
Now when I run it, I get the following errors:
Any clue what is going on?
By josch on 2018-08-21T21:46:36.581Z
Could you assemble a minimal example that shows the problem you have and also provide the specific input image that causes the error to appear?
By George on 2018-08-21T22:08:57.892Z
Thanks for the quick response!
Sorry. I unfortunately cannot provide any of these documents as they are confidential legal documents. I can try later tonight to see if I can reproduce it using a regular tif and get back to you.
Not sure what you mean by a minimal example? The loop above is the exact code that generates the errors. In this instance I simply ran it through one directory with one tif file in it and got the long list of errors.
I did a quick diagnostic and the exact line causing the issue appears to be the "pdf_bytes=img2pdf.convert(image_files)" line. It does not get past it.
By josch on 2018-08-21T22:17:13.772Z
Depending on how these documents were created, you could create a document just like those but without any confidential info in it. For example, if they were created by a scanner, you could just scan a blank piece of paper.
Without being able to reproduce the problem myself, I'm afraid there is little I can do.
You are of course always free to fix the problem yourself and then send me the patch that solves it. I'm afraid it's only possible to fix these kinds of problems properly if one can reproduce them. And that is only possible when one has some data that triggers the problem.
By josch on 2018-08-23T07:15:09.752Z
Hi @Nukular -- any chance that you can obtain some test data for me? Otherwise I'm afraid I have to close this issue because there is nothing I can do about it without being able to look into the issue myself.
By George on 2018-08-25T15:27:17.486Z
Hi @josch,
I can't quite replicate this with any old Tif. And as I said I have no way to send you a sample file as they are all confidential legal documents provided to us by a client.
However I am able to use Photoshop to extract document info from a sample .tif that I know didn't work. I don't know if this is helpful or not. Please let me know.
By josch on 2018-08-26T20:24:41.359Z
Hi, unfortunately there is nothing particularly interesting in this RDF info. The metadata looks completely normal. If you have imagemagick installed, you could run
identify -verbose
on one of the tiffs to get more info. Or if you have libtiff installed, running tiffdump or tiffinfo might be helpful.
By George on 2018-08-26T20:51:33.542Z
Hi @josch
Did a little further digging in photoshop. I tried to edit the image (deleting everything) but simply saving in photoshop made the image readable again. One difference I saw is that the ORIGINAL non-working image is group4 compression. Photoshop version has no compression. Could that cause it?
Along with that, this file did work using version 0.2.4, but that is only if I use IDLE on my windows machine (as 0.2.4 is a separate installation). Personal computer and Anaconda on my work machine, both use the latest version.
I just saw your recent reply and will see what I can find.
By josch on 2018-08-26T20:59:48.605Z
No, img2pdf supports other group4 compressed tiff images just fine. Such an image is part of its test suite.
Yes, img2pdf versions less than 0.3.0 should not have this bug because in those versions, tiff images like yours were saved as uncompressed image data, resulting in very large files.
What we have to find out is what makes your tiff image that doesn't work different from those that do work.
With Imagemagick, you can create group4 compressed tiff images using this command:
You will see that img2pdf is able to handle this kind of tiff image just fine.
By George on 2018-08-27T02:27:50.319Z
Alright. Here is the output from tiffdump for the file that DOES work:
Here are the same results for the original version of the file that does NOT work with img2pdf:
By josch on 2018-08-27T05:43:49.305Z
That's it! The RowsPerStrip property is missing. This property is essential for decoding group4 tiff images. Currently, img2pdf requires it to be set, but the tiff spec indeed says that if it's not set, then the default value is 2**32 - 1, which basically means that the entire image is a single strip. Let me try to create a tiff like that and I'll come back to you in a bit!
By josch on 2018-08-27T06:15:35.778Z
From the function ccitt_payload_location_from_pil, could you remove the line that says:
And see if that fixes things for you?
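For readers who want to check their own files, here is a minimal sketch of the RowsPerStrip default described in this thread. It is not img2pdf's code and not the change that was committed. It assumes a classic (non-BigTIFF) file and only looks at the first IFD. The names rows_per_strip and make_tiff are hypothetical helpers made up for this illustration; the tag number 278 and the default of 2**32 - 1 come from the TIFF 6.0 spec.

```python
import struct

ROWS_PER_STRIP_TAG = 278             # TIFF tag number for RowsPerStrip
DEFAULT_ROWS_PER_STRIP = 2**32 - 1   # spec default: the whole image is one strip


def rows_per_strip(data: bytes) -> int:
    """Return RowsPerStrip from the first IFD, or the spec default if absent.

    Hypothetical helper for illustration; handles classic TIFF only.
    """
    if data[:2] == b"II":
        endian = "<"   # little-endian file
    elif data[:2] == b"MM":
        endian = ">"   # big-endian file
    else:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack_from(endian + "HI", data, 2)
    if magic != 42:
        raise ValueError("not a classic TIFF file")
    (num_entries,) = struct.unpack_from(endian + "H", data, ifd_offset)
    for i in range(num_entries):
        entry = ifd_offset + 2 + 12 * i   # each IFD entry is 12 bytes
        tag, field_type, _count = struct.unpack_from(endian + "HHI", data, entry)
        if tag != ROWS_PER_STRIP_TAG:
            continue
        if field_type == 3:    # SHORT, stored in the first 2 value bytes
            (value,) = struct.unpack_from(endian + "H", data, entry + 8)
        elif field_type == 4:  # LONG, stored in all 4 value bytes
            (value,) = struct.unpack_from(endian + "I", data, entry + 8)
        else:
            raise ValueError("unexpected field type for RowsPerStrip")
        return value
    return DEFAULT_ROWS_PER_STRIP   # tag missing: fall back to the default


def make_tiff(entries):
    """Build a little-endian header plus one IFD (no pixel data), for demo only.

    entries is a list of (tag, field_type, value) with field_type 3 or 4.
    """
    body = struct.pack("<H", len(entries))
    for tag, field_type, value in entries:
        fmt = "<HHIH2x" if field_type == 3 else "<HHII"
        body += struct.pack(fmt, tag, field_type, 1, value)
    # Header: byte order, magic 42, first IFD at offset 8; then a zero next-IFD offset.
    return b"II" + struct.pack("<HI", 42, 8) + body + struct.pack("<I", 0)


with_tag = make_tiff([(256, 3, 100), (278, 3, 64)])
without_tag = make_tiff([(256, 3, 100)])
print(rows_per_strip(with_tag))
print(rows_per_strip(without_tag))
```

To try it on a real file, read the bytes with open(path, "rb").read() and pass them to rows_per_strip. A result of 2**32 - 1 means the tag is absent, which is the situation the non-working tiff in this thread was in.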
By George on 2018-08-27T14:22:45.499Z
That seems to work!
By josch on 2018-11-20T15:31:23.336Z
Status changed to closed by commit 42f8ac54a8