large jp2000 file doesn't convert #18


Closed

opened 2021-04-25 19:57:38 +00:00 by josch · 0 comments

josch commented

2021-04-25 19:57:38 +00:00

Owner

By josch on 2015-03-15T09:41:52.904Z

Created by: marc-p

Hello,

I've created both a JP2 and a J2K file of a large image (36x48") in a COTS program (Geosoft Oasis montaj). I'm using img2pdf in a 64-bit Windows environment, so I installed Pillow and updated your script to say "from PIL import Image". It runs, but creates a large (26x200"), apparently empty image. I've tried Adobe Reader X and Nitro Reader 3.

I am able to import the image back into Geosoft and view it in another GIS application (FME Universal Viewer), but not in Corel PhotoPaint X5. GIMP 2.86 can view the images, but they come up greyscale for some reason.

The same image as a png works beautifully.

I'm unable to attach the file here, as it's over 30 MB. Can I send it to you via YouSendIt?

Imported comments:

By josch on 2013-08-29 19:37:24 UTC

img2pdf just dumps the content of your JPEG 2000 file into the PDF. Possible sources of error are:

- the PDF readers are unable to read the kind of JPEG 2000 you are using, in which case you can't do anything about it short of converting it lossily to some other format
- img2pdf sets the width/height/color/dpi options wrongly

Are you python savvy enough to have a look at what values img2pdf figures out for width/height/color/dpi? Otherwise I can quickly add a --verbose option which will tell you that.

If you want me to try the file, just upload it somewhere. I would not like to have a 30MB file in my email inbox.

By josch on 2013-08-30 09:05:51 UTC

Hi,
I discovered a problem. Your jpeg2000 file had the width/height at a different position in the file (a 4 byte offset) than my other jpeg2000 test files. This is why width and height were read wrongly. I think I found the value which specifies that offset. Now your file gets read as being 11929 x 7145 pixels, which does not look too wrong.
Now your jpeg2000 gets copied correctly into the pdf and the sizes are set correctly. If acrobat still can't read it, then it is because your jpeg2000 files are "weird".
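For readers hitting the same "different offset" problem: a JP2 file is a sequence of boxes (4-byte big-endian length, 4-byte type, payload), and the width and height live in the `ihdr` box inside the `jp2h` superbox, so walking the boxes avoids any fixed offset. The sketch below is a minimal illustration based on the JP2 box layout (ISO/IEC 15444-1 Annex I), not img2pdf's actual code; `iter_boxes`, `jp2_dimensions`, `box` and `sample_jp2` are hypothetical helpers, and the header-only sample file with an extra `ftyp` compatibility brand is just one assumed way a file can end up shifted by 4 bytes.

```python
import struct


def iter_boxes(data, start=0, end=None):
    """Yield (box_type, payload_start, payload_end) for each box in data[start:end]."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        lbox, tbox = struct.unpack(">I4s", data[pos:pos + 8])
        header = 8
        if lbox == 1:
            # LBox == 1: a 64-bit XLBox length follows the type field
            lbox = struct.unpack(">Q", data[pos + 8:pos + 16])[0]
            header = 16
        elif lbox == 0:
            # LBox == 0: the box runs to the end of the enclosing range
            lbox = end - pos
        if lbox < header:
            raise ValueError(f"malformed box {tbox!r} at offset {pos}")
        yield tbox, pos + header, pos + lbox
        pos += lbox


def jp2_dimensions(data):
    """Return (width, height) from the ihdr box inside the jp2h superbox."""
    for tbox, start, end in iter_boxes(data):
        if tbox == b"jp2h":
            for sub, s, _ in iter_boxes(data, start, end):
                if sub == b"ihdr":
                    # ihdr payload starts with HEIGHT, then WIDTH (4 bytes each)
                    height, width = struct.unpack(">II", data[s:s + 8])
                    return width, height
    raise ValueError("no ihdr box found inside jp2h")


def box(tbox, payload):
    """Build a minimal box (hypothetical helper for the example only)."""
    return struct.pack(">I4s", 8 + len(payload), tbox) + payload


def sample_jp2(extra_brands=b""):
    """Hypothetical header-only JP2; extra_brands makes the ftyp box longer."""
    # ihdr: HEIGHT, WIDTH, NC, BPC, C, UnkC, IPR (sample sizes taken from the thread)
    ihdr = box(b"ihdr", struct.pack(">IIHBBBB", 7145, 11929, 3, 7, 7, 0, 0))
    return (box(b"jP  ", b"\r\n\x87\n")
            + box(b"ftyp", b"jp2 " + b"\0\0\0\0" + b"jp2 " + extra_brands)
            + box(b"jp2h", ihdr))


print(jp2_dimensions(sample_jp2()))         # (11929, 7145)
print(jp2_dimensions(sample_jp2(b"jpx ")))  # (11929, 7145), although ihdr sits 4 bytes later
```

Because the parser follows the box lengths instead of assuming where `ihdr` starts, both sample files give the same result.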

By bitsgalore on 2013-08-30 11:22:53 UTC

Hi,

Just saw this discussion. If this is about "weird" jpeg 2000 files, you might want to check out this, which is a validator tool for JPEG 2000 Part 1 (aka JP2) that is able to detect all sorts of JPEG 2000 weirdness:

http://www.openplanetsfoundation.org/software/jpylyzer

(Incidentally I'm the main author of that tool.) Don't know if this is of any help, but just thought I'd mention it.

Cheers,

Johan

-----Original Message-----
From: josch [mailto:notifications@github.com]
Sent: Fri 30/08/2013 11:05
To: josch/img2pdf
Subject: Re: [img2pdf] large jp2000 file doesn't convert (#4)

Reply to this email directly or view it on GitHub:
https://github.com/josch/img2pdf/issues/4#issuecomment-23548918

By josch on 2013-08-31 07:07:47 UTC

Hi Johan,

Quoting Johan van der Knijff (2013-08-30 13:22:54)

> Just saw this discussion. If this is about "weird" jpeg 2000 files, you might
> want to check out this, which is a validator tool for JPEG 2000 Part 1 (aka
> JP2) that is able to detect all sorts of JPEG 2000 weirdness:
>
> http://www.openplanetsfoundation.org/software/jpylyzer

Thanks a lot for that hint! Now, by reading the jpylyzer code I can even figure
out how jpeg2000 actually is supposed to work! Somehow I was not able to find
any documentation for the jpeg2000 file format online. Even finding out at what
offset it saves width and height seemed impossible so I ended up figuring it
out by looking at hex dumps.

The only thing which I found weird about jpylyzer was that its output is XML.
At first I thought that I was using it wrongly, because the only thing it output
was a big XML blob. Only when I looked deeper and ran the XML through a
prettifier did I figure out that XML was actually the intended output of
jpylyzer. Maybe you should state near the beginning that the default output
of jpylyzer is in XML format? Since I use other terminal applications on a
regular basis, XML output was quite unexpected. This is also why sentences in
the readme like "In the above example, output is redirected to the file
'rubbish.xml'." only confused me, because I was asking myself: "why would I
want to save the output to an XML file???". Maybe you can mention this fact
somewhere near the beginning of the README or docs? The user manual only
states this on page 20.

Thanks a lot for this tool - now I can actually learn how jpeg2000 works. No
idea why I was not able to find any actual documentation on it.

PS: the jp2 file by marc-p seemed to be valid according to jpylyzer :)

cheers, josch

By bitsgalore on 2013-08-31 13:36:57 UTC

Hi Josch,

Actually the filespec for JP2 is here (downloadable for free):

http://www.jpeg.org/public/15444-1annexi.pdf

However that doesn't include the image codestream syntax. The spec for that is behind a paywall, but there is a free (though partially outdated) Final Committee Draft that'll give you the general idea:

http://www.jpeg.org/public/fcd15444-1.pdf
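Since marc-p also produced a raw J2K file, it may help to note that a bare codestream has no boxes at all: it starts with the SOC marker (0xFF4F), immediately followed by the SIZ marker segment (0xFF51), which carries the image geometry. The following is a minimal sketch assuming only that SIZ layout from Part 1; `j2k_geometry` and `sample_codestream` are hypothetical names, and the sample contains just the SIZ segment, so it is not a decodable image.

```python
import struct

SOC_SIZ = b"\xff\x4f\xff\x51"  # SOC marker immediately followed by the SIZ marker


def j2k_geometry(data):
    """Return (width, height, components) from the SIZ segment of a raw codestream."""
    if data[:4] != SOC_SIZ:
        raise ValueError("not a raw JPEG 2000 codestream (SOC + SIZ expected)")
    # After the SIZ marker: Lsiz, Rsiz (2 bytes each), then Xsiz, Ysiz, XOsiz,
    # YOsiz, XTsiz, YTsiz, XTOsiz, YTOsiz (4 bytes each), then Csiz (2 bytes).
    fields = struct.unpack(">HHIIIIIIIIH", data[4:42])
    _lsiz, _rsiz, xsiz, ysiz, xosiz, yosiz = fields[:6]
    csiz = fields[10]
    # The image area is the reference grid size minus the image offset.
    return xsiz - xosiz, ysiz - yosiz, csiz


def sample_codestream(width, height, x_off=0, y_off=0, comps=3):
    """Hypothetical SIZ-only codestream for the example (not decodable)."""
    lsiz = 38 + 3 * comps
    siz = struct.pack(">HHIIIIIIIIH", lsiz, 0, width + x_off, height + y_off,
                      x_off, y_off, width + x_off, height + y_off, 0, 0, comps)
    per_comp = bytes([7, 1, 1]) * comps  # Ssiz (8-bit unsigned), XRsiz, YRsiz
    return SOC_SIZ + siz + per_comp


print(j2k_geometry(sample_codestream(11929, 7145, x_off=16)))  # (11929, 7145, 3)
```

Subtracting XOsiz/YOsiz matters: reading Xsiz/Ysiz alone would report a slightly larger image whenever the codestream uses an image offset.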

Another useful link, just in case you're interested in any of the other JPEG 2000 formats:

http://fileformats.archiveteam.org/wiki/JPEG2000

As for your comments regarding XML output: yes, I might emphasize that a bit at the top of the readme. From the outset jpylyzer was really designed to be used as a component in automated workflows, and for that XML is much easier to use/process than human-readable text. Also, jpylyzer's output was originally pretty-printed, but under certain circumstances that would result in weird Unicode errors under Python 3.x. Some recent improvements to the code should have fixed that, so if I have a bit of time I'll see if I can reintroduce pretty printing in an upcoming version. Meanwhile I'd suggest using a dedicated XML viewer/editor or even a web browser for inspecting jpylyzer's output, as I fully agree it does look pretty terrible in a text editor!

Cheers,

Johan

By josch on 2013-10-21 13:57:54 UTC

I just added a tiny new parsing module based on jpylyzer to read jpeg2000 files more properly than it was done so far. With this change, the colorspace should now also be correctly detected.
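To illustrate what "correctly detecting the colorspace" involves: in a JP2 file the colourspace is declared in a `colr` box inside `jp2h`. With METH = 1 it holds an enumerated value (16 = sRGB, 17 = greyscale, 18 = sYCC); with METH = 2 a restricted ICC profile follows instead. This is a minimal sketch of that lookup under those assumptions, not the module img2pdf actually added; `jp2_colourspace`, `sample_jp2` and the `PDF_CS` mapping to PDF colour space names are hypothetical and for illustration only.

```python
import struct

# Enumerated colourspaces of the JP2 colr box (METH == 1).
ENUM_CS = {16: "sRGB", 17: "greyscale", 18: "sYCC"}
# Hypothetical mapping to PDF colour space names, for illustration only.
PDF_CS = {"sRGB": "/DeviceRGB", "greyscale": "/DeviceGray"}


def iter_boxes(data, start=0, end=None):
    """Yield (box_type, payload_start, payload_end) for each box in data[start:end]."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        lbox, tbox = struct.unpack(">I4s", data[pos:pos + 8])
        header = 8
        if lbox == 1:  # 64-bit XLBox length follows the type field
            lbox = struct.unpack(">Q", data[pos + 8:pos + 16])[0]
            header = 16
        elif lbox == 0:  # box runs to the end of the enclosing range
            lbox = end - pos
        if lbox < header:
            raise ValueError(f"malformed box {tbox!r} at offset {pos}")
        yield tbox, pos + header, pos + lbox
        pos += lbox


def jp2_colourspace(data):
    """Return the colourspace named by the first colr box inside jp2h."""
    for tbox, start, end in iter_boxes(data):
        if tbox == b"jp2h":
            for sub, s, _ in iter_boxes(data, start, end):
                if sub == b"colr":
                    meth = data[s]  # METH, followed by PREC and APPROX (1 byte each)
                    if meth == 1:
                        (enum_cs,) = struct.unpack(">I", data[s + 3:s + 7])
                        return ENUM_CS.get(enum_cs, f"enumerated {enum_cs}")
                    return "icc"  # e.g. METH 2: a restricted ICC profile follows
    raise ValueError("no colr box found inside jp2h")


def sample_jp2(enum_cs, components=1):
    """Hypothetical header-only JP2 with an ihdr and an enumerated colr box."""
    def box(tbox, payload):
        return struct.pack(">I4s", 8 + len(payload), tbox) + payload
    ihdr = box(b"ihdr", struct.pack(">IIHBBBB", 7145, 11929, components, 7, 7, 0, 0))
    colr = box(b"colr", struct.pack(">BBBI", 1, 0, 0, enum_cs))
    return box(b"jP  ", b"\r\n\x87\n") + box(b"jp2h", ihdr + colr)


cs = jp2_colourspace(sample_jp2(17))
print(cs, PDF_CS[cs])  # greyscale /DeviceGray
```

Reading the declared colourspace from the file, rather than guessing it, is what lets a converter pick a matching PDF colour space.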
