"pdf width or height is below 3.00 - too small for some viewers!" from JPEG2000 file #186

Closed
opened 5 months ago by KenD · 5 comments
KenD commented 5 months ago

I have some scanned images that I've converted from TIFF to JPEG2000 using the Kakadu kdu_compress command. The files behave fine in every viewer, and exiftool shows:

% exiftool jpeg2000test.jp2
ExifTool Version Number : 12.60
File Name : jpeg2000test.jp2
Directory : .
File Size : 2.2 MB
File Modification Date/Time : 2023:12:04 16:06:39+00:00
File Access Date/Time : 2023:12:04 16:06:42+00:00
File Inode Change Date/Time : 2023:12:04 16:06:39+00:00
File Permissions : -rw-r--r--
File Type : JP2
File Type Extension : jp2
MIME Type : image/jp2
Major Brand : JPEG 2000 Image (.JP2)
Minor Version : 0.0.0
Compatible Brands : jp2
Image Height : 6976
Image Width : 5103
Number Of Components : 3
Bits Per Component : 8 Bits, Unsigned
Compression : JPEG 2000
Color Spec Method : Enumerated
Color Spec Precedence : 0
Color Spec Approximation : Not Specified
Color Space : sRGB
Capture Y Resolution : 59055
Capture X Resolution : 59055
Capture Y Resolution Unit : 10 m
Capture X Resolution Unit : 10 m
Image Size : 5103x6976
Megapixels : 35.6

However, trying to convert that file (and any others I've tried), gives:

% img2pdf jpeg2000test.jp2
pdf width or height is below 3.00 - too small for some viewers!
error: Page size must be between 3 and 14400 PDF units

If it'll help, the above test file is here: https://mega.nz/file/nU8XDAoI#KXAOpPDDCidYhvKjiBkkRf1KWCX708Swpy08vxw68-o
josch commented 5 months ago
Owner

Yes, that seems correct. Where do you see the bug? What would you expect to happen instead?

KenD commented 5 months ago
Poster

Well ... image to PDF :) img2pdf doesn't like those files: however, if I take the .jp2 file into another application (GraphicConverter on MacOS), and re-save it, img2pdf processes that correctly without any issues. So there's something about the source image that it doesn't like.

Comparing the before (non-working) and after (working) files, the only difference is:

Before:
Capture Y Resolution Unit : 10 m
Capture X Resolution Unit : 10 m

After:
Capture Y Resolution Unit : 0.1 mm
Capture X Resolution Unit : 0.1 mm

10 m does seem very strange, and I've no idea why kdu_compress is setting those values.

However, now that I've spotted that, I've come up with a workaround: remove the problem metadata using exiv2:

% exiv2 rm jpeg2000test.jp2

and so

img2pdf jpeg2000test.jp2 -o test.pdf

works fine.

Apologies for the "false positive": img2pdf was never doing anything wrong, just picking up some crazy values that were otherwise non-obvious to me.

KenD closed this issue 5 months ago
josch commented 5 months ago
Owner

Do you see anything img2pdf could do better to inform you of what is going on?

KenD commented 5 months ago
Poster

Given that stripping out the metadata seems to fix the issue, I'm assuming that img2pdf is reading some of the data, doing a calculation and deciding it's not safe/sensible to proceed. Again, the images look fine on all the viewers I've tried, so it must just be a metadata issue.

I'm not sure it would be practical (or even possible) for an option within img2pdf to ignore, or override, the metadata values? This may just be an extreme edge case with my images, so whether it brings any real value to anyone else is questionable.

Thanks for a great tool!

josch commented 5 months ago
Owner

So the error you see doesn't actually come from img2pdf but from the library it uses to create the pdf (pikepdf). That library refuses to create a PDF page which is so small that it violates the PDF spec. In this case (and you can try this yourself by forcing img2pdf to use its internal engine which doesn't care about page sizes using --engine=internal) the pdf page would have a width and a height of zero.

How are these sizes computed? Indeed the dpi embedded into the image is used to convert from pixels to an actual physical length. So maybe what img2pdf could do is to tell the user that the impossible page size is due to the combination of dpi and image size and that those can be overridden using the -s or --imgsize options.

For example in your case, you can create a valid pdf using this:

img2pdf.py --imgsize 600dpi jpeg2000test.jp2 -o out.pdf

Which will compute the page size using the custom dpi value of 600. You can also specify a custom size (DIN A4 in this case) like this:

img2pdf.py --imgsize a4 ~/Downloads/jpeg2000test.jp2 -o out.pdf

Maybe part of the above should be in the error message?
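
To illustrate the calculation described above, here is a minimal sketch. It assumes the usual conversion of 72 PDF units per inch and the 3 to 14400 unit limits quoted in the error message. The function names are hypothetical and this is not img2pdf's actual code, only an approximation of the arithmetic involved.

```python
# Hypothetical helper names; a sketch of the pixel -> PDF unit arithmetic,
# not img2pdf's real implementation.
PT_PER_INCH = 72.0              # PDF units (points) per inch
MIN_PT, MAX_PT = 3.0, 14400.0   # limits quoted in the error message


def page_size_pt(width_px, height_px, dpi):
    """Convert pixel dimensions to PDF units at the given resolution."""
    return (width_px * PT_PER_INCH / dpi, height_px * PT_PER_INCH / dpi)


def fits_pdf_limits(size_pt):
    """True if both sides lie within the allowed page-size range."""
    return all(MIN_PT <= side <= MAX_PT for side in size_pt)


def max_usable_dpi(width_px, height_px):
    """Largest dpi at which the shorter side is still MIN_PT long."""
    return min(width_px, height_px) * PT_PER_INCH / MIN_PT


if __name__ == "__main__":
    # Image dimensions taken from the exiftool output in this thread.
    size = page_size_pt(5103, 6976, 600)
    print(size, fits_pdf_limits(size))
    print(max_usable_dpi(5103, 6976))
```

Under these assumptions, any embedded resolution far above the value returned by `max_usable_dpi` shrinks the page below the lower limit, which is consistent with a nonsensical resolution in the metadata producing a page of roughly zero size.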

Reference: josch/img2pdf#186