new(?) issue with converting 2-color TIFF images to PDF. #164
I've read issue #66, and I'm aware that the problem reported there was fixed in version 0.3.4 of img2pdf. However, I'm currently running into exactly the same symptom with version 0.4.4 on Manjaro Linux (22.1.0).
Specifically, for various reasons I need to be able to work with bilevel TIFF images, and when a number of these (with exactly one image per .tif file) are converted to the same PDF file, most of the time Acrobat Reader will report "insufficient data for an image". This sometimes happens immediately when I try to open the file, but other times it occurs in the middle of the PDF file. The same PDF files work perfectly with every program I've tried on Linux.
Furthermore, the problem occurs regardless of whether I convert many .tif files in one invocation of img2pdf, or whether I convert each image separately and later concatenate them with pdftk.
I can supply example files (both .tif and PDF) if that would help, along with whatever additional information may be useful.
After some experimenting, it seems that the problem occurs only for images that were edited using gimp version 2.10.34 and exported from gimp using its default export options. I can't prove that this is the cause, but images exported this way reliably trigger the problem, and turning off gimp's "Export the image's color profile by default" option produces images that are reliably converted without errors.
The problematic images produce PDF files with this symptom:
% gs -dNOPAUSE -dBATCH -sDEVICE=nullpage borked.pdf
GPL Ghostscript 10.01.1 (2023-03-27)
Copyright (C) 2023 Artifex Software, Inc. All rights reserved.
This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
see the file COPYING for details.
Unknown .defaultpapersize: (Letter).
Processing pages 1 through 1.
Page 1
The following errors were encountered at least once while processing this file:
ICCbased space /N value does not match the number of components in the ICC profile
The following warnings were encountered at least once while processing this file:
recoverable image error
...so it makes sense that the color profile is implicated.
I can't work out how to create a bilevel image with GIMP. I can do a 256 color palette image or an 8-bit grayscale image. But which buttons do I have to click to make my image have only two colors?
Do you have an example file that triggers the problem?
I already replied to this by email, but I'll copy that reply here for the record.
I don't know how to create bilevel TIFF images in gimp either. The images I'm working with were created by scanning on a Brother MFC-J6510DW all-in-one, using this command:
These scanned images are correctly converted by img2pdf.
The problem arises because the scanned images often (but not always) need to be touched up in various ways, which I do using gimp.
It took a while before I realized that the problem occurs repeatably with images saved by gimp, but doesn't occur at all without gimp. That was when I took a closer look and discovered that I could work around the problem by turning off the gimp export option to save the color profile.
As for samples, I've uploaded two files with this message. The first was edited and will fail, and the second wasn't edited and will be converted correctly.
I believe this is this issue: https://gitlab.gnome.org/GNOME/gimp/-/issues/9518
I think you're right. Is there any chance you'd be willing to modify img2pdf to detect that condition and ignore the false color profile?
Yes. But that would require me to extract and parse the color profile and I do not know yet how to do that. Patches welcome.
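For what it's worth, checking the component count may not need a full profile parser. As far as I understand the ICC format, the 128-byte profile header stores a four-byte "data colour space" signature at offset 16 and the `acsp` file signature at offset 36. The following is only a minimal sketch under that assumption; the function names are hypothetical and this is not how img2pdf does it. It compares the profile's colour space against the number of components in the image:

```python
# Number of colour components implied by common ICC colour space signatures.
# The table is deliberately incomplete; unknown signatures yield None.
ICC_COMPONENTS = {
    b"GRAY": 1,
    b"RGB ": 3,
    b"CMYK": 4,
    b"Lab ": 3,
    b"XYZ ": 3,
}


def icc_component_count(profile):
    """Return the component count of an ICC profile, or None if it
    cannot be determined from the header."""
    if profile is None or len(profile) < 128:
        return None  # too short to contain a full ICC header
    if profile[36:40] != b"acsp":
        return None  # missing ICC file signature
    return ICC_COMPONENTS.get(profile[16:20])


def profile_matches_image(profile, image_components):
    """True only if the profile's colour space has exactly as many
    components as the image (e.g. 1 for grayscale or bilevel)."""
    count = icc_component_count(profile)
    return count is not None and count == image_components


def fake_header(colorspace):
    """Build a synthetic 128-byte header for demonstration purposes only."""
    header = bytearray(128)
    header[16:20] = colorspace
    header[36:40] = b"acsp"
    return bytes(header)


if __name__ == "__main__":
    rgb_profile = fake_header(b"RGB ")
    # A one-component image carrying an RGB profile is a mismatch.
    print(profile_matches_image(rgb_profile, 1))
```

With something like this, a converter could at least detect that a one-component image carries a three-component profile, which is the mismatch Ghostscript complains about above. What to do once the mismatch is detected is a separate decision.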
Do you really need to parse the color profile? You already know that the image is a bilevel TIFF with CCITT Group 4 encoding; isn't it safe to forcibly set iccp to None and proceed on that basis?
Just for fun, I added a print statement to the version 0.4.4 source code as follows:
Running this version on the uploaded page_002_A-scanned.tif produces the output
as expected. Running it on the uploaded page_001_A-scanned.tif produces this:
So you're already extracting the color profile. I don't know how to parse it either, but I'll take a look and see what I can find.
This is probably a stupid question, but why doesn't '-C 1' sidestep the problem?
The TIFFs produced by GIMP are not CCITT Group 4 encoded; they are grayscale images with just two colors.
The problem with just dropping the ICC profile is that the one thing img2pdf does differently from other conversion tools is that it is lossless. So we have to be very careful before we discard any information from the input.
I didn't know that about the format produced by gimp, so thank you for clarifying that.
Also, I understand why you need to be careful about losing information, but that's where the -C option comes back into the picture: even when I specify -C 1, the incorrect RGB color profile isn't replaced, and I don't understand why not. Is that meant to happen?
I pushed a commit that should fix this issue. Please re-open if it does not for you.
I just tried it and it works beautifully. Thank you!
Sadly, I spoke too soon. The new update absolutely does correctly handle the case where the image has the incorrect color profile, but for images which don't have that (such as the original page_002_A-scanned.tif I uploaded), I get this:
Are you at the most recent git HEAD? I pushed another commit 40 minutes ago.
I'm not sure exactly what happened, but I have the latest commit now, and this one really does work perfectly. My apologies for the inconvenience, and thank you again for doing this!
It wasn't your fault but mine. The original commit missed another condition and thus triggered the bug you saw. I fixed that problem in another commit shortly after.
I'm happy that this is working for you now!
Thank you for helping me track down this problem 💙