0.3.1 Only Converting First Page of Multi-page TIF #50


Closed

opened 2021-04-25 19:58:13 +00:00 by josch · 0 comments


By Michael B Sopko on 2018-10-29T17:22:16.190Z

I have a multi-page TIF image that I am converting to PDF. Before version 0.3.1, the PDF was created as expected, with each page of the TIF becoming one page of the PDF. Now only the first page of the TIF ends up in the PDF and all the other pages are lost. I am using Python 3.6. I am saving the file like this:

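import img2pdf
from tempfile import SpooledTemporaryFile

# Inside a method: `image` and `document` come from the surrounding code.
# Letter-size page: 8.5 x 11 inches, converted to PDF points
self.layout_fun = img2pdf.get_layout_fun((
    img2pdf.in_to_pt(8.5),
    img2pdf.in_to_pt(11)
))

with SpooledTemporaryFile() as tmp:
    tmp.write(img2pdf.convert(
        image,
        layout_fun=self.layout_fun
    ))
    # Rewind first, in case save_document_file_obj reads from the current position
    tmp.seek(0)
    document.save_document_file_obj(tmp)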

Is there something else I need to do (like breaking out the image pages)? If so, please let me know. If needed, I can put together an image and code to show what is occurring.
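For reference, the layout function in the snippet above requests a US Letter page (8.5 × 11 in). PDF lengths are measured in points, 72 to the inch, so the requested page is 612 × 792 pt. The sketch below reproduces that arithmetic; `inch_to_points` is a hypothetical helper written for illustration, assuming `img2pdf.in_to_pt` does the same plain multiplication by 72.

```python
# Hypothetical helper mirroring an inch-to-point conversion (1 in = 72 pt)
POINTS_PER_INCH = 72.0


def inch_to_points(inches):
    """Convert a length in inches to PDF points."""
    return inches * POINTS_PER_INCH


# US Letter: 8.5 x 11 inches
letter = (inch_to_points(8.5), inch_to_points(11))
print(letter)  # (612.0, 792.0)
```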

By josch on 2018-10-29T17:25:32.862Z

Yes, please show me the multipage TIF where this problem happens. Otherwise I cannot investigate the issue.

By Michael B Sopko on 2018-10-29T18:18:43.842Z

I tried a random multi-page TIF file and it converted just fine. When I try the document I actually need to convert, I get an error. I cannot send you this file because the image contains personal information.

When I used version 0.3.0, I saw this in the logs:

TIFFSetField: tempfile.tif: Unknown pseudo-tag 65538.
TIFFSetField: tempfile.tif: Unknown pseudo-tag 65538.
TIFFSetField: tempfile.tif: Unknown pseudo-tag 65538.
TIFFSetField: tempfile.tif: Unknown pseudo-tag 65538.
TIFFSetField: tempfile.tif: Unknown pseudo-tag 65538.

I am not seeing this logged with version 0.3.1. My guess is that they have a bad image format that is causing this.

Earlier I tested another sample image and got the same result (only the first page ended up in the PDF), so I assumed this happened with all multi-page TIFs. Unfortunately, I have since deleted that test file.

I will take another look into this. I will try using PIL/Pillow to split the pages into separate in-memory images, pass those to the convert function, and see whether that works.

Thanks,

By Michael B Sopko on 2018-10-29T18:18:44.081Z

Status changed to closed

By Michael B Sopko on 2018-10-30T12:51:51.160Z

I asked the vendor whether they could provide a TIF image without any personal data to help diagnose the issue. If they send me one, I will reopen this issue.

In the meantime, I split up the TIF image with PIL like this:

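import io

import img2pdf
from PIL import Image

images = []
with Image.open('988962.tif') as tiffstack:
    for i in range(tiffstack.n_frames):
        # Select page i of the multi-page TIFF
        tiffstack.seek(i)
        # Use a fresh buffer for every page; reusing one BytesIO would
        # make each entry also contain the bytes of earlier pages
        return_data = io.BytesIO()
        tiffstack.save(return_data, 'TIFF')
        images.append(return_data.getvalue())

# img2pdf.convert accepts a list of in-memory images (bytes)
with open('988962.pdf', 'wb') as tmp:
    tmp.write(img2pdf.convert(images))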

and it is working as expected.

By Stefan on 2018-11-15T18:21:35.618Z

I am having the same issue. I am not sure whether this applies to Michael, but in my case any TIFF compressed with CCITT was not converting properly.

In the screenshot I attached, the uncompressed TIFF converted all of its pages properly, while the CCITT TIFFs converted only the first page. In another test, I decompressed one of the CCITT TIFFs, and it then converted properly.
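To check whether a file falls into the CCITT category described here, one can ask Pillow which compression it reports for each page. This is an illustration, not part of img2pdf: `frame_compressions` is a hypothetical helper, and it assumes Pillow's TIFF plugin, which reports names such as "raw", "tiff_lzw" or "group4" (CCITT T.6) in `info["compression"]`. Reading these tags does not decode the pixel data.

```python
from PIL import Image


def frame_compressions(path):
    """Return the compression name Pillow reports for each page of a TIFF."""
    result = []
    with Image.open(path) as im:
        # Single-page images may lack n_frames, so default to 1
        for i in range(getattr(im, "n_frames", 1)):
            im.seek(i)
            result.append(im.info.get("compression"))
    return result
```

For example, a three-page uncompressed TIFF should give `["raw", "raw", "raw"]`, while a CCITT T.6 file should list `"group4"` for its pages.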

By Michael B Sopko on 2018-11-15T19:37:23.951Z

I do not know much about images, so I didn't think to look at the compression (or know what information was needed).

I just double-checked the properties of the image giving me problems, and it too uses CCITT T.6 compression.

Now the real question: does anyone have an example image for testing?
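Since the real files cannot be shared, one option is a synthetic multi-page TIFF compressed with CCITT T.6, which contains no personal data at all. The sketch below assumes a Pillow build with libtiff (needed to write "group4", Pillow's name for CCITT T.6); `make_ccitt_test_tiff` and the page size are illustrative choices, and such a file may not trigger the same problem as the vendor's images.

```python
from PIL import Image, ImageDraw


def make_ccitt_test_tiff(path, pages=3, size=(850, 1100)):
    """Write a bilevel multi-page TIFF compressed with CCITT T.6 (Group 4)."""
    frames = []
    for n in range(pages):
        # CCITT compression only applies to 1-bit images, hence mode "1";
        # a fill value of 1 makes the page white
        page = Image.new("1", size, 1)
        draw = ImageDraw.Draw(page)
        # Draw n + 1 black bars so every page looks different
        for k in range(n + 1):
            x = 100 + k * 60
            draw.rectangle([x, 100, x + 40, 300], fill=0)
        frames.append(page)
    # "group4" is Pillow's name for CCITT T.6; writing it requires libtiff
    frames[0].save(
        path,
        format="TIFF",
        save_all=True,
        append_images=frames[1:],
        compression="group4",
    )
```

Calling `make_ccitt_test_tiff("ccitt_test.tif")` writes three pages with one, two and three bars respectively, so a PDF that repeats or drops pages is easy to spot.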

By Stefan on 2018-11-15T20:17:59.061Z

Sadly, I am not able to provide any of my images, as they contain confidential information. However, I came up with this temporary workaround for myself. You are free to use it if you like.

import glob
import os
import subprocess
from pathlib import Path

import img2pdf

# Location of tiffcp.exe from the command-line toolkit linked below
TIFFCP = r'E:\Python Scripts\image2pdf_emf2pdf_cmd\tiffcp.exe'


# Function to convert TIFF images to PDF
def tiffToPdf(tiffPath):
    # Find TIFF files one directory level below tiffPath
    images = glob.glob(os.path.join(tiffPath, '*', '*.tif*'))
    for j, imageName in enumerate(images):
        image = Path(imageName)
        # The final PDF gets the TIFF's directory and file name
        pdfFile = image.with_suffix('.pdf')
        # Rename original compressed image to a temporary name to avoid conflicts
        tempTIFF = image.with_name('decompressed (' + str(j) + ')' + image.suffix)
        os.rename(image, tempTIFF)
        # Decompress TIFF image (-c none) for proper conversion; stop on failure
        subprocess.run([TIFFCP, '-c', 'none', str(tempTIFF), str(image)], check=True)
        with open(pdfFile, 'wb') as newPdf:
            newPdf.write(img2pdf.convert(str(image)))
        # Both the original and the decompressed TIFF are deleted afterwards
        tempTIFF.unlink()
        image.unlink()

Here is the link to the command-line tool used to decompress the TIFFs: www.verypdf.com/tif2pdf/image2pdf_emf2pdf_cmd.zip
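Instead of the Windows-only tiffcp.exe, the same decompression step could in principle be done with Pillow alone. This is a swapped-in technique, not what the script above does: `decompress_tiff` is a hypothetical helper, it assumes Pillow was built with libtiff so it can decode CCITT pages, and it has not been tried on the files from this issue.

```python
from PIL import Image


def decompress_tiff(src, dst):
    """Rewrite every page of a (possibly CCITT-compressed) TIFF uncompressed."""
    with Image.open(src) as im:
        # save_all=True walks all pages of a multi-page TIFF;
        # compression="raw" stores the pixel data uncompressed
        im.save(dst, format="TIFF", save_all=True, compression="raw")
```

The output path can then be passed to `img2pdf.convert` in place of the file that tiffcp produces, and the original TIFF can be left untouched.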

By josch on 2018-11-15T21:16:39.480Z

@msopko and @gizmotechy, please try to create a test case for me! Yes, CCITT images are problematic and have caused problems in the past. But multi-page CCITT TIFF images are also part of the test suite, and they work fine there. So something must be different about your TIFF images, and I cannot figure out what it is without a sample of such a file.

The only alternative I see is that one of you figures out the problem behind your CCITT images and sends me a patch that fixes the issue for you.

By Stefan on 2018-11-15T21:55:02.370Z

@josch I think I just found the issue. The "TIFFSetField: tempfile.tif: Unknown pseudo-tag 65538." error that @msopko was getting comes from Pillow, which your library uses. I didn't realize img2pdf used Pillow until I looked at the .py file. Here is the GitHub pull request discussing this error: https://github.com/python-pillow/Pillow/pull/3417. From the looks of it they have fixed it, but they will not push the update to PyPI until January 1st, 2019.

Also, if you would like to test it yourself, I found a multi-page TIFF tool that provided a sample TIFF. I converted its compression from LZW to CCITT T.6 and found that the LZW-compressed TIFF was fine, but only the first page of the CCITT TIFF was converted. You can find both TIFFs attached: 'test.tif' is the original with LZW compression, and 'test2.tif' is the copy I compressed with CCITT.

test2.tif
test.tif

By josch on 2018-11-15T22:51:51.904Z

@gizmotechy I tried your files test.tif and test2.tif with the recent version of img2pdf from git and I didn't see any problems.

By josch on 2018-11-20T15:57:51.931Z

I just released version 0.3.2 which should fix your issues.
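To confirm an environment actually has the release with the fix, a sketch like the following compares the installed img2pdf version against 0.3.2. It assumes Python 3.8+ for `importlib.metadata` (on the Python 3.6 mentioned above, `pip show img2pdf` gives the same information); the helper names are hypothetical, and the simple parser treats pre-releases such as 0.4.0rc1 as the final release, so `packaging.version.Version` is a sturdier choice where that package is available.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def version_tuple(text):
    """Turn a release string such as '0.3.2' into (0, 3, 2) for comparison."""
    parts = []
    for part in text.split(".")[:3]:
        # Keep only the leading digits, so '0rc1' counts as 0
        match = re.match(r"\d+", part)
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)


def has_fixed_img2pdf(minimum="0.3.2"):
    """Return True if the installed img2pdf is at least `minimum`."""
    try:
        installed = version("img2pdf")
    except PackageNotFoundError:
        # img2pdf is not installed in this environment
        return False
    return version_tuple(installed) >= version_tuple(minimum)
```

Calling `has_fixed_img2pdf()` before a batch conversion gives an early warning if an older img2pdf is still on the path.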

By Michael B Sopko on 2018-11-27T13:03:04.792Z

I just ran a test with the latest version and it is working great.

Thank you!
