alpha channel and jp2 #173
Loading…
Reference in a new issue
No description provided.
Delete branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Hi, I am using img2pdf with jp2 images. My input is not intended to have alpha channel, but sometimes one slips through that does.
If I try
img2pdf
(0.4.4) on it, I get an error "This function must not be called on images with alpha"If I understand right, img2pdf can now handle images with alpha for other formats, just not jp2? I'm not sure if it does so by preserving the alpha, or flattening it, or what.
Is there any chance it could, even with an opt-in argument, do something for jp2 with alpha, even if that means flattening over a white bg? If not actualy do the same thing it does for other formats (are there limits in OpenJPEG or whatever other library is used for jp2? Are there issues that could be filed against those libraries?)
And, otherwise, would it be helpful to change this error message to be clear that it only applies to jp2, it's not universally true that img2pdf can't handle alpha?
This should not happen. Can you provide an example image so that I can reproduce your problem?
Ooh, that's way better response than I hoped, I thought I was seeing expected behavior! I'm glad I filed it, I almost didn't file it, worried I was being annoying! Thank you so much for your quick response!
Yes, I easily can give you an example, thank you!
I'm not sure if I can attach a JPEG2000 here in this ticket without it being altered, but I'll try that first, attached.
This jp2 was created from a TIFF source (that had an alpha), using the vips package, with a command like:
Yes, after downloading the attachment above, it does still reproduce for me.
This patch should fix your problem. Can you confirm?
How about
instead of
as a style improvement?
Thank you!
I'm not much of a pythonista, nor am I used to dealing with patch files, it may take me a bit of time to figure out how to apply the patch locally.
I'm on leave tomorrow, but I'll look at this next week and do my best to figure out how to test it (hints welcome).
Very much appreciate the quick work! This will be super helpful if it allows the PDF to be generated from these JP2's!
@mara0004 replacing
color != Colorspace.RGBA and color != Colorspace.LA
withcolor not in (Colorspace.RGBA, Colorspace.LA)
is a good improvement. The other though is incorrect as no smask will be generated for a JPEG2000.After cloning the git, you can run this to apply the patch locally:
Then you can run img2pdf directly from your clone by executing
I'm sorry, I'm not trying to get basic python getting started help from you, but when I try running the patched img2pdf directly from clone, I get:
It's not finding my existing pip dependencies I guess. (I am on MacOS, which doesn't help).
I will work on it more next week -- but also if you were able to reproduce my problem with that file, and the patch resolves it for you with that file, it does seem to me too safe to determine you have resolved it.
I appreciate it!
Yes, it works for me with the file you posted.
Actually, I figured it out.
And have verified it produced output pdf that looks right, without error!
Awesome!
Thank you!
On what schedule would you anticipate a release including this fix?
Depends. If possible I'd like to hear from @monobot on #159 and from @gms on #152.
I'm also having trouble writing a test case for jpeg2000 images with transparency because of https://stackoverflow.com/questions/76833368/how-to-use-imagemagick-compare-so-that-the-order-does-not-matter
I can make a new release without #159 and #152 fixed but I need to create a unit test for transparent jpeg2000 images. If you have any idea for a solution I'm all ears.
Yeah, sorry. I should have looked more thoroughly before commenting.
Came to my mind too during the night, for if we split an alpha channel off a JPX image, we would lose the JPX encoding, which would increase file size and make it impossible to reconstruct the original image data.
I was on the wrong track because, for the other formats, transparency is only supported through masks. But great if PDF supports transparency natively for JPX.
No worries! Better a few comments that have issues than not giving input at all. I like it when people would tell me about small improvements like this. The code is getting very old and while working on this issue I discovered even more problematic cases which work in practice but are wrong anyways:
Hi, I've replied on #159.
@josch Thank you. May I ask the next possibly silly question: Why do you consider
is
wrong and==
correct? Is it just about consistency?I think the official enum howto actually recommends comparing to enum attributes by identity, and PEP 8 also says singletons should be compared by identity, not by value (some say it is more idiomatic).
I personally don't like
==
for enums because it suggests(myenum.ATTR == myenum.ATTR.value) is True
, but in fact it isFalse
("Comparisons against non-enumeration values will always compare not equal").Aha, so past-me was smart and knew how to correctly compare enums and present-me forgot about it. Thanks! I'll adjust the code such that enums are always compared by identity.
Today we learned that silly messages can be very useful. :)
@mara0004 things are a bit more complicated concerning the enum comparison. There are good reasons to use the
is
operator instead of==
. But above you also suggested to use:But the
in
operator does not check for identity but value, so it uses the slightly wrong comparison for enums. So either this should be re-written to say:or, for longer lists:
This is suboptimal. :(
EDIT Furthermore, I'd also like to switch a few pieces of code to using match/case from PEP636 but when comparing enums, it uses
==
instead ofis
(you can check this by overridingEnum.__eq__
). So even if I switch all cases where I compare two enums against each other from==
tois
, there will always remain cases where enums are compared using==
-- either because of thein
operator or (in the future) because of match/case. So I create a situation in which comparison is inconsistent. This bugs me. Either I always use one or always the other but there is no benefit of mixing both.Hmm, yes, that is correct,
in
andmatch/case
will use==
, but I supposeEnum.__eq__
just maps to an identity comparison. With that implementation, both patterns are exchangeable, it is only a question of style. (To guarantee this on the caller side, one could also subclass Enum and override__eq__
.)Personally, I don't think it a problem to use
is
for explicit comparisons while implicit comparisons go through==
. The effect is the same, and in visible codeis
feels a bit more direct/idiomatic. Apparently an opinion shared by enum/pep8 officials.However, I do see your point. It's two distinct code paths in principle, and as such could hypothetically behave differently. If this matters to you, consistently sticking with
==
is fine. That's a very reasonable argument.BTW, a nit suggestion on commit acc25a4:
can be simplified to
I forgot which python version added this. But it's older than
match/case
, if you intend to require python >= 3.10 anyway, then it should be fine.EDIT: On the other hand, if returned parameters change, this can potentially capture a different variable than desired. So I'm not sure if it's advisable, although it looks nicer at first glance.
Indeed I forgot about that syntax. It looks funny but it's the same mechanism as when writing
And
*args
will catch any additional positional arguments. The underscore is just a placeholder variable name.I tried this in Python 3.4 from 2014 and it works fine so nothing new about this syntax. I'll use it right away.
@jrochkind you asked for an estimation for a release including this fix. It turns out that the unit test I wrote for JPEG2000 with alpha channel fails on Fedora. So I cannot release until I figure out what is going on there...
Thanks @josch! I didn't mean to pressure or anything, just plain curious about your plans, since I'm unfamiliar with this project's release practices!
I am sorry the tests are giving you trouble. I wanted to find some time this week to try to contribute to tests, but I'm not sure I'm going to have much to offer on mysterious fedora-specific errors.
I appreciate your work on this tool, and your quick response to to this report, it is all very generous of you, this is an amazing tool. I work in non-profit academic cultural heritage area.
You are not giving me trouble -- quite the opposite! Thanks to you, img2pdf now supports JPEG2000 with transparency. 😄
If your organization is fine with that, maybe I can add it to a new section in the README listing some notable parties that use img2pdf? I've already had contact with libraries or universities that use img2pdf for things like turning scans into PDF for long-term archival storage of historical documents. It's pretty awesome to hear about these things! ❤️
Thanks @josch!
We are not live yet using img2pdf. But plan to use img2pdf to help power our PDF downloads at https://digital.sciencehistory.org.
I am trying to use an unreleased git copy of img2pdf to get the alpha fix temporarily... which seems to work mostly... but is oddly failing only on our Github Actions CI server, with weird error
no width in jp2 header
for some of our test images. But I can't reproduce that error using the same commit from unreleased main branch anywhere else but Github Actions, it does not really seem to be a problem with unreleasedmain
vs 0.4.4... very strange.Rather than try to debug a very mysterious problem that may just be an artifact of temporarily trying to use an unreleased git commit in
requirements.txt
, I may just wait until a release.Hrm... I had to touch the code around the "no width in jp2 header" error message when I implemented support for your images because not only did your image have transparency, it also was 16 bits per pixel instead of the usual 8. If you are able to share the image that gives you trouble I can also take a look.
Oh wow, I actually hadn't noticed that original was 16 bits per pixel either. That is also not normally what we intend.
But when you have 100k plus digitizations of historical material done by different staff over a decade, weird things can pile up, mistakes can be made. There are always a few images in our collection of 100K that are weird mistakes on some dimension.
The current image that is giving me trouble is just a simple TEST image.
OK, I have reproduced it locally, this may indeed be a new bug. Let me know if you want me to file a separate issue?
This is a WEIRD file, since it's part of our test suite, it's not a real file. (it was created dynamically in our test suite, from a miniaturized TIFF version of a real file).
Running from main branch at
2f736d78
Not sure if there's really something wrong with that jp2? I don't know if our code for deriving it from a TIFF could be bad, or the original test suite TIFF could be bad, very curious what you have to say about this!
It turns out, that this is a file without the jp2 metadata around it but just a raw jpeg2000 bitstream. I added rudimentary support for those in
09064e8e70
. Tell me if this doesn't work for you.wow, thanks for the quick diagnosis -- and fix!
That is interesting, because it was definitely not intentional. (I don't know if I have a requirement for handling raw jpeg2000 streams, I didn't mean to provide one!)
That file was produced with
vips
, using a process that I think ordinarily creates full jp2 filesI want to report it to vips, can you suggest any method I can easily use to determine if a given file is a full jp2 or raw jpeg2000, to make my report as accurate as possible?
file
command on my computer, it saysJPEG 2000 Part 1 (JP2)
for full jp2, andJPEG 2000 codestream
for raw JPEG2000, at least for this example.It's interesting that file was handled properly in 0.4.4 -- by accident I guess?
Confirmed that for me too it is handled properly at
09064e8e70
I don't think there is anything to report. As far as I understood it, the raw jpeg2000 bytestream is a valid file. It's just missing the container but i have some image viewers installed that display such a file just fine so I guess it's okay?
The way to detect which is which is to look at the first four bytes of the file:
https://gitlab.mister-muffin.de/josch/img2pdf/src/branch/main/src/jp2.py#L135
Yes, the commit that broke it was
acc25a4926
so that one where I implemented support for transparent jpeg2000. The reason is that this commit also supports bit depth other than 8 and thus tries to parse the JPEG2000 file. This parsing did not happen before as the code assumed that all jpeg2000 files would be 8 bit.Thank you, makes sense!
I was using the
vips
tool intending to convert a TIFF to a .JP2, and the invocation I used normally outputs a .JP2, but in this case output a raw JPEG2000 stream -- at least that's what it looks like to me now, have to investigate more, that part is not an issue for img2pdf though!