josch/img2pdf


11x9 image cannot be opened in Acrobat Reader (file is corrupt) #19


Closed

opened 2021-04-25 19:57:39 +00:00 by josch · 0 comments


By josch on 2015-03-15T09:41:53.361Z

Created by: ysangkok

The following PNM file (img.pnm):

P1
11 9
00000000000
00000100000
00001110000
00001110000
00011111000
00011111000
00111111100
01111111110
00000000000

converted to JP2 with convert img.pnm img.jp2 or with jasper: jasper -f img.pnm -F img.jp2, converted to PDF with ~/img2pdf/img2pdf.py img.jp2 -o img.pdf, opened in Adobe Acrobat Reader, gives me the following error message:

"An error exists on this page".

Imported comments:

By josch on 2012-10-17 04:00:05 UTC

Did you do more tests?

1. Your subject line suggests that this problem only happens with a size of 11x9 - does it happen with bigger images?
2. You only convert your image to JPEG 2000. Does the same error happen when you convert it to normal JPEG?
3. You are testing a monochrome image. Does the same error happen with RGB images?
4. You are only testing with Acrobat Reader. Does the same error happen with different PDF parsers?

By ysangkok on 2012-10-17 10:02:34 UTC

1. Yes, it does. I added an extra black line so it became an 11x10 image, and the same problem occurred.
2. No, it does not.
3. Using the PPM example in Wikipedia (http://en.wikipedia.org/wiki/Netpbm_format#PPM_example) converted to JP2, the resulting PDF file is readable, but completely white with wrong dimensions (square) in acroread. In Evince it's correct. The main page README asks for bug reports when Acroread can't read it.
4. No, Evince shows no errors, but the image is corrupt.

By josch on 2012-10-17 13:38:52 UTC

1. Can you take a bigger image? Just pick any JPEG and convert it to JP2 using the convert command.
2. img2pdf does a plain copy of the image data and sets the object properties accordingly. If the JPEG 2000 image that was input is of a format that Acrobat Reader can't understand, then I can't fix the problem. This is only one of the possible explanations though.
3. The README asks for bug reports for acroread because PDF is severely underspecified and the only way to know whether you output correct PDF is by testing it with acroread.
4. It may well be that poppler (the Evince engine) doesn't understand JPEG 2000, but I don't know about that.

By ysangkok on 2012-10-17 13:44:12 UTC

1. I used convert logo: -monochrome bigimg.jp2 now. The error in acroread is now "Insufficient data for an image". Evince shows something very similar to this image produced by pdftoppm from Ubuntu Xpdf: http://i.imgur.com/MAzaa.png Note that it's square. pdftoppm from the Xpdf 3.03 binary from the official site produces this: http://i.imgur.com/iXGmi.png
4. I am sure Poppler understands JPEG-2000 because big coloured images are usually OK. For example, if I remove -monochrome in the command above, everything works.

Thank you for your patience and quick responses.

By josch on 2012-10-18 05:48:44 UTC

Oh wait, so monochrome doesn't work but RGB does - then we might have a solution here!

I just found a code comment in the JPEG 2000 portion of my code: "TODO: read real colorspace"
Right now, it defaults to assigning RGB to JPEG 2000 images. This is because the Python Imaging Library can't read JPEG 2000 metadata.

Can you try to open your pdf in a text editor and replace:

/ColorSpace /DeviceRGB

by

/ColorSpace /DeviceGray

Or remove the line entirely.

And try the same with replacing:

/BitsPerComponent 8

by

/BitsPerComponent 1

Or remove the line entirely again.

And check if it works?
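The hand edit suggested above can also be scripted. The following is a minimal sketch, not part of img2pdf: `patch_colorspace` is a hypothetical helper, and it assumes the PDF contains the literal text `/ColorSpace /DeviceRGB` exactly as quoted in this comment. It writes `/ColorSpace/DeviceGray` without the space so that the replacement has the same byte length and the byte offsets recorded elsewhere in the file stay valid; a PDF name is delimited by its leading `/`, so the space is optional.

```python
def patch_colorspace(pdf: bytes) -> bytes:
    """Swap the first DeviceRGB colour space entry for DeviceGray.

    Hypothetical helper for experimenting with the edit described above.
    Both strings are 22 bytes long, so the file length does not change.
    """
    old = b"/ColorSpace /DeviceRGB"
    new = b"/ColorSpace/DeviceGray"  # no space: "/" already starts a new name
    if len(old) != len(new):
        raise AssertionError("replacement must keep the byte length")
    if old not in pdf:
        raise ValueError("no '/ColorSpace /DeviceRGB' entry found")
    # Replace only the first occurrence (the image dictionary in a one-image PDF).
    return pdf.replace(old, new, 1)


# Example on an in-memory fragment; for a real file, read and write in binary mode.
fragment = b"<< /Type /XObject /ColorSpace /DeviceRGB /BitsPerComponent 8 >>"
patched = patch_colorspace(fragment)
```

The `/BitsPerComponent 8` to `/BitsPerComponent 1` edit is already length-preserving, so the same `bytes.replace` approach would apply to it unchanged.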

By ysangkok on 2012-10-18 11:58:23 UTC

It is not just "monochrome works, rgb doesn't". Using DeviceGray fixes it for this image. But the PPM example I linked above is still broken. That image shows in every viewer other than Acroread, so I think Acroread might be broken. It doesn't even work when converted with ImageMagick: convert colored.ppm colored.pdf. So that's probably not an img2pdf problem.

If I use convert logo: -monochrome -depth 8 -type TrueColor bigimg.jp2, it works too. RGBA images are broken too, but that makes sense now.

By josch on 2012-10-18 15:00:43 UTC

So let's first look at the big image.

You say that using /DeviceGray fixes the issue for that one. Unfortunately I haven't figured out yet how to easily retrieve the colorspace from a JPEG 2000 image. This will be more effort...

As a temporary solution I added the -C or --colorspace option that you can use to force the grayscale colorspace. So you can now do img2pdf -C L in.jp2 -o out.pdf. L is grayscale in PIL terms. The --help lists more possible modes.

But you say that even with /DeviceGray the small image is still broken. Maybe acroread simply doesn't like tiny JPEG 2000 images? Are tiny RGB images also broken in acroread?
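To make the "L is grayscale in PIL terms" remark concrete, here is a small sketch of how PIL mode strings relate to the PDF device colour spaces discussed in this thread. The table and the function name `pdf_colorspace` are illustrative assumptions, not img2pdf's actual code, and the modes accepted by -C may differ (see --help).

```python
# PIL mode string -> PDF device colour space name.
# Illustrative only; not taken from img2pdf.
PDF_COLORSPACE = {
    "1": "/DeviceGray",    # 1-bit black and white
    "L": "/DeviceGray",    # 8-bit grayscale ("luminance")
    "RGB": "/DeviceRGB",
    "CMYK": "/DeviceCMYK",
}


def pdf_colorspace(mode: str) -> str:
    """Return the PDF colour space name for a PIL mode, or raise ValueError."""
    try:
        return PDF_COLORSPACE[mode]
    except KeyError:
        raise ValueError(f"no device colour space known for mode {mode!r}") from None
```

Modes with an alpha channel such as RGBA have no entry here, which fits the earlier observation that RGBA images are broken.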

By ysangkok on 2012-10-18 17:12:57 UTC

I know Acrobat Reader can handle tiny JPEG-2000 images, because it handles the original monochrome triangle once I patch it to say "DeviceGray" (this trick of course also fixes the monochrome ImageMagick logo). But it doesn't handle the tiny colored image. My last post referenced the colored PNM image I linked (not the one in #0, that one's monochrome). That's why it's called colored.pnm.

So yes, tiny RGB image parsing is probably broken in Acrobat Reader, unless Evince parses an invalid picture and shows it as intended. Also, the same image converted directly to PDF with ImageMagick doesn't show in Acroread either.

Acrobat Reader handles big RGB PDFs fine (like the colored ImageMagick logo).

Thanks for the patch.

Here's base64 of colored.jp2 in case you don't have ImageMagick or NetPBM:

AAAADGpQICANCocKAAAAFGZ0eXBqcDIgAAAAAGpwMiAAAAAtanAyaAAAABZpaGRyAAAAAgAAAAMA
AwcHAAAAAAAPY29scgEAAAAAABAAAAAAanAyY/9P/1EALwAAAAAAAwAAAAIAAAAAAAAAAAAAAAMA
AAACAAAAAAAAAAAAAwcBAQcBAQcBAf9kACMAAUNyZWF0b3I6IEphc1BlciBWZXJzaW9uIDEuOTAw
LjH/UgAMAAAAAQEFBAQAAf9cABNAQEhIUEhIUEhIUEhIUEhIUP9dABQBQEBISFBISFBISFBISFBI
SFD/XQAUAkBASEhQSEhQSEhQSEhQSEhQ/5AACgAAAAAAmwAB/10AFABAQAAAAAAAAAAAAEgAAEhI
UP9dABQBQEAAAAAAAAAAAABIAABISFD/XQAUAkBAAAAAAAAAAAAASAAASEhQ/5PB8gEEx9QECdfD
5wIJgICAgICAgICAx9oEBn/PwAQAz8AEB8faBT8ASH2gQAAnBS4TfwJ/x9oCn5gQCQDfz8AKfgBz
8wEGfwr0SgP/2Q==
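The colorspace that josch says is hard to retrieve is stored in the JP2 container itself, so the header of the file posted above can be inspected without an image library. The sketch below assumes the standard JP2 box layout (a `jp2h` superbox holding an `ihdr` box and a `colr` box with an enumerated colourspace); `iter_boxes` and `read_jp2_header` are hypothetical names, and only the first 78 bytes of the base64 above are embedded, which is enough to cover the header boxes.

```python
import base64
import struct

JP2_SIGNATURE = bytes.fromhex("0000000c6a5020200d0a870a")
# Enumerated colourspace values defined for JP2 "colr" boxes.
ENUM_CS = {16: "sRGB", 17: "greyscale", 18: "sYCC"}

# First 104 base64 characters (78 bytes) of colored.jp2 as posted above.
HEADER = base64.b64decode(
    "AAAADGpQICANCocKAAAAFGZ0eXBqcDIgAAAAAGpwMiAAAAAtanAyaAAAABZpaGRyAAAAAgAAAAMA"
    "AwcHAAAAAAAPY29scgEAAAAAABAA"
)


def iter_boxes(data, start, end):
    """Yield (type, payload_start, payload_end) for each box in data[start:end]."""
    pos = start
    while pos + 8 <= end:
        length, box_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if length == 1:  # a 64-bit length follows the type field
            if pos + 16 > end:
                return
            (length,) = struct.unpack_from(">Q", data, pos + 8)
            header = 16
        elif length == 0:  # box extends to the end of the enclosing range
            length = end - pos
        if length < header or pos + length > end:
            return  # truncated or malformed box: stop quietly
        yield box_type, pos + header, pos + length
        pos += length


def read_jp2_header(data):
    """Return width, height, component count, bit depth and colourspace."""
    if not data.startswith(JP2_SIGNATURE):
        raise ValueError("not a JP2 file")
    info = {}
    for box_type, start, end in iter_boxes(data, 0, len(data)):
        if box_type != b"jp2h":
            continue
        for sub_type, s, _ in iter_boxes(data, start, end):
            if sub_type == b"ihdr":
                height, width, ncomp, bpc = struct.unpack_from(">IIHB", data, s)
                # bpc stores depth minus one in the low 7 bits; 255 means
                # per-component depths, which this sketch does not decode.
                bits = None if bpc == 255 else (bpc & 0x7F) + 1
                info.update(width=width, height=height, components=ncomp, bits=bits)
            elif sub_type == b"colr" and data[s] == 1:  # METH 1: enumerated
                (enum_cs,) = struct.unpack_from(">I", data, s + 3)
                info["colorspace"] = ENUM_CS.get(enum_cs, enum_cs)
        break
    return info


info = read_jp2_header(HEADER)
```

For the bytes posted above this yields a 3x2 image with three 8-bit components and the enumerated sRGB colourspace, so the colour information is present in the file even though it was not being read at the time.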

By josch on 2013-10-23 06:27:45 UTC

With my latest changes your original 11x9 image now works fine in acroread. Unfortunately the PNM example from Wikipedia still doesn't work, so I investigated a bit more, and it does indeed seem to be the image size that matters here:

for width in `seq 1 5`; do
    for height in `seq 1 5`; do
        echo -ne "P3\n$width $height\n255\n" > out_${width}_${height}.pnm
        for y in `seq 1 $height`; do
            for x in `seq 1 $width`; do
                echo -ne "255 0 0 "
            done;
            echo -ne "\n"
        done >> out_${width}_${height}.pnm
        convert out_${width}_${height}.pnm out_${width}_${height}.jp2
        ./img2pdf.py out_${width}_${height}.jp2 -o out_${width}_${height}.pdf
    done
done

Of the PDFs that the above code produces, acroread was only able to show those where width and height were both at least four pixels.

This restriction is rather odd, because PDF should not be limited to images of more than 3 pixels in either dimension. So I additionally checked whether it was instead the page size that mattered, by changing only the dpi of the one-pixel image:

for dpi in `seq 1 72`; do ./img2pdf.py --dpi $dpi out_1_1.jp2 -o out_1_1_$dpi.pdf; done

It turns out that with a low enough dpi value (everything lower than 25) the 1x1 image displays just fine. Further manual editing of the /MediaBox showed that it must be at least 3.00 in width and height for acroread to display the PDF correctly.

Therefore, the solution to this problem is to lower the dpi value for very small images, so that the resulting PDF page still becomes big enough.

Does this fix your problem?
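The numbers in this comment follow from the page-size arithmetic: a PDF unit is 1/72 inch, so a side of n pixels at a given dpi is n * 72 / dpi units long. One pixel at 24 dpi is exactly 3.0 units, while at 25 dpi it is 2.88, which matches "everything lower than 25". The sketch below uses hypothetical function names and takes the 3.00 minimum from the /MediaBox finding above; it is not how img2pdf implements the fix.

```python
def page_size_pt(width_px, height_px, dpi):
    """Page size in PDF units (1/72 inch) for an image placed at the given dpi."""
    return (width_px * 72.0 / dpi, height_px * 72.0 / dpi)


def clamp_dpi(width_px, height_px, dpi, min_pt=3.0):
    """Lower dpi as far as needed so that both page sides are at least min_pt."""
    # The shorter side determines the highest dpi that still reaches min_pt.
    limit = min(width_px, height_px) * 72.0 / min_pt
    return min(dpi, limit)


# A 1x1 image requested at 72 dpi is brought down to 24 dpi (a 3.0 x 3.0 page).
dpi = clamp_dpi(1, 1, 72)
size = page_size_pt(1, 1, dpi)
```

The same formula gives 3.0 units for four pixels at 96 dpi, so the earlier four-pixel threshold would be consistent with a 96 dpi default; that default is an inference from the numbers, not something stated in the thread.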

By josch on 2017-01-21T07:53:48.711Z

Closing because of no reply from original submitter.

By josch on 2017-01-21T07:53:48.921Z

Status changed to closed
