Add support for JBIG2 (generic coding) #184

Open
ooBJ3u wants to merge 3 commits from ooBJ3u/img2pdf:main into main

Implements the proposal detailed at #112 (https://gitlab.mister-muffin.de/josch/img2pdf/issues/112#issuecomment-1304).

This is a limited implementation of JBIG2, which can be extended to support multiple pages, symbol tables, and other features of the format in the future.
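As background on what "multiple pages, symbol tables, and other features" means at the byte level: a JBIG2 stream is a sequence of segments, each with a header that carries its number, type, page association and data length. The sketch below walks the segment headers of a sequentially organised stream and checks whether it holds a single page of generic-region data only. It is based on a reading of ITU-T T.88 (segment header syntax and segment type numbers). The function names and the allowed type set are hypothetical, not the PR's code, and random-access organisation and the long-form referred-to segment count are not handled.

```python
import struct

# Segment types this sketch accepts as "single-page generic coding"
# (type numbers from ITU-T T.88). The set itself is an assumption for
# illustration, not the rule the PR enforces.
GENERIC_PAGE_TYPES = {
    38,  # immediate generic region
    39,  # immediate lossless generic region
    48,  # page information
    49,  # end of page
    50,  # end of stripe
    51,  # end of file
}


def iter_segment_headers(data: bytes, offset: int):
    """Yield (number, type, page, data_length) for sequentially organised segments.

    `offset` is where the first segment header starts (just past any file
    header). Only the short form of the referred-to segment count is handled,
    and unknown data lengths (0xFFFFFFFF) are rejected.
    """
    while offset < len(data):
        number, flags = struct.unpack_from(">IB", data, offset)
        offset += 5
        seg_type = flags & 0x3F               # bits 0-5: segment type
        page_size = 4 if flags & 0x40 else 1  # bit 6: 4-byte page association
        count = data[offset] >> 5             # bits 5-7: referred-to segment count
        if count > 4:
            raise NotImplementedError("long-form or invalid referred-to segment count")
        offset += 1
        # Each referred-to segment number is 1, 2 or 4 bytes, depending on this segment's number.
        ref_size = 1 if number <= 256 else 2 if number <= 65536 else 4
        offset += count * ref_size
        if page_size == 4:
            (page,) = struct.unpack_from(">I", data, offset)
        else:
            page = data[offset]
        offset += page_size
        (length,) = struct.unpack_from(">I", data, offset)
        offset += 4
        if length == 0xFFFFFFFF:
            raise NotImplementedError("unknown segment data length")
        yield number, seg_type, page, length
        offset += length  # sequential organisation: data directly follows its header


def is_single_page_generic(data: bytes, offset: int) -> bool:
    """Hypothetical check: exactly one page-information segment, only generic-region types."""
    types = [seg_type for _, seg_type, _, _ in iter_segment_headers(data, offset)]
    return types.count(48) == 1 and all(t in GENERIC_PAGE_TYPES for t in types)
```

A symbol dictionary (type 0) or a text region would make the check fail, which is roughly where the multi-page and symbol-table extensions mentioned above would start.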

To test, I included a test fixture. You can also download 042.bmp (the same file @josch already downloaded in #112, https://gitlab.mister-muffin.de/josch/img2pdf/issues/112#issuecomment-307) from https://git.ghostscript.com/?p=tests.git;a=blob_plain;f=jbig2/042.bmp;hb=HEAD and run the following command:

jbig2 042.bmp | img2pdf > 042.pdf

This results in a small PDF, just as @josch originally found in the comment mentioned above.
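For context on what `jbig2` writes to stdout in the command above: assuming it emits a standalone JBIG2 file (the file format in Annex D of ITU-T T.88), the data starts with an 8-byte ID string, a flags byte, and an optional 4-byte page count. A minimal sketch of reading that header, for example to reject multi-page input given the single-page limitation described above. `parse_jbig2_file_header` is a hypothetical name, not part of img2pdf.

```python
import struct

# JBIG2 file ID string: 0x97 "JB2" 0x0D 0x0A 0x1A 0x0A (ITU-T T.88, Annex D)
JBIG2_MAGIC = b"\x97JB2\r\n\x1a\n"


def parse_jbig2_file_header(data: bytes):
    """Return (sequential, page_count, header_length) for a standalone JBIG2 file.

    page_count is None when the header says the number of pages is unknown.
    header_length is the offset of the first segment header.
    """
    if data[:8] != JBIG2_MAGIC:
        raise ValueError("not a JBIG2 file (bad magic)")
    if len(data) < 9:
        raise ValueError("truncated JBIG2 file header")
    flags = data[8]
    sequential = bool(flags & 0x01)     # bit 0: 1 = sequential, 0 = random-access
    pages_unknown = bool(flags & 0x02)  # bit 1: 1 = page count field is absent
    if pages_unknown:
        return sequential, None, 9
    if len(data) < 13:
        raise ValueError("truncated JBIG2 file header")
    (page_count,) = struct.unpack(">I", data[9:13])
    return sequential, page_count, 13
```

A caller that only supports single-page input could read the whole file, call this, and refuse anything where `page_count != 1`.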

This is my first contribution to this repository, so let me know if anything else is needed. Thanks for a great library!

ooBJ3u added 1 commit 7 months ago
154a61a88f Add support for JBIG2 (generic coding)
ooBJ3u force-pushed main from 154a61a88f to ee42963164 7 months ago
josch commented 7 months ago
Owner

Wow, thank you! I read through your diff without trying it out yet and it looks really, really good!

My biggest gripe right now is src/tests/input/042.jb2. Why did you use the scan of a page instead of the "TEST" image used in the other test cases? One problem with using a "real" test image, like the scanned page you chose, is the copyright situation. Even if that page is in the public domain (is it?), you now have to write that down and keep track of it somewhere.

Poster

No problem, I'll swap it out.

ooBJ3u added 1 commit 7 months ago
2c00f3b66b Use mono.jb2 for tests
This also uncovered a bug in jbig2enc where it uses the wrong unit
for resolution.
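The commit message does not say exactly which unit jbig2enc got wrong. For reference, T.88's page information segment stores the X and Y resolution in pixels per metre, while img2pdf and most tools talk in DPI, so the two are easy to mix up: if a DPI value of 300 were written into a pixels-per-metre field (an assumed failure mode, for illustration), it would read back as roughly 7.6 DPI. A sketch of the conversion and of unpacking the first fields of a page information segment; the helper names are hypothetical, not jbig2enc's or img2pdf's code.

```python
import struct

# One inch is exactly 0.0254 metres.
METRES_PER_INCH = 0.0254


def ppm_to_dpi(ppm: int) -> float:
    """Convert a resolution in pixels per metre to pixels per inch."""
    return ppm * METRES_PER_INCH


def dpi_to_ppm(dpi: float) -> int:
    """Convert pixels per inch to whole pixels per metre."""
    return round(dpi / METRES_PER_INCH)


def read_page_info(segment_data: bytes):
    """Unpack width, height and X/Y resolution (as DPI) from page information segment data.

    The segment data starts with four big-endian 32-bit fields: width, height,
    X resolution and Y resolution, the latter two in pixels per metre
    (0 means the resolution is unknown).
    """
    width, height, xres, yres = struct.unpack_from(">IIII", segment_data, 0)
    return width, height, ppm_to_dpi(xres), ppm_to_dpi(yres)
```

Rounding in `dpi_to_ppm` is deliberate: 300 DPI is about 11811.02 pixels per metre, and the field only holds integers.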
ooBJ3u force-pushed main from 2c00f3b66b to b23d82c45e 7 months ago
Poster

Fixed in 085dd192f6e14fe0d6384dc661e1e38794bb1507.
ooBJ3u reviewed 7 months ago
src/img2pdf.py Outdated
@@ -1820,7 +1842,41 @@ def read_images(
    if rawdata[:12] == b"\x00\x00\x00\x0C\x6A\x50\x20\x20\x0D\x0A\x87\x0A":
        # image is jpeg2000
        imgformat = ImageFormat.JPEG2000
    if rawdata[:14].lower() == b"id=imagemagick":
Poster

I wasn't sure why this was `if` instead of `elif`. Won't that make it so JPEG2000 still crashes? I fixed it but wanted to double-check.
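To illustrate the `if`/`elif` concern: when format detection continues with a separate `if` chain that ends in a fallback `else`, a match from the preceding standalone `if` gets overwritten. The version below is a simplified, hypothetical dispatcher, not img2pdf's actual `read_images` logic (its later branches are not shown in the excerpt above).

```python
JP2_MAGIC = b"\x00\x00\x00\x0C\x6A\x50\x20\x20\x0D\x0A\x87\x0A"
JBIG2_MAGIC = b"\x97JB2\r\n\x1a\n"


def detect_with_if(rawdata: bytes) -> str:
    """Two independent chains: the second chain's else clobbers a JPEG2000 match."""
    if rawdata[:12] == JP2_MAGIC:
        fmt = "JPEG2000"
    if rawdata[:14].lower() == b"id=imagemagick":
        fmt = "MIFF"
    elif rawdata[:8] == JBIG2_MAGIC:
        fmt = "JBIG2"
    else:
        fmt = "fallback"  # e.g. hand the data to a generic decoder
    return fmt


def detect_with_elif(rawdata: bytes) -> str:
    """One chain: the first matching branch wins and else is only a fallback."""
    if rawdata[:12] == JP2_MAGIC:
        fmt = "JPEG2000"
    elif rawdata[:14].lower() == b"id=imagemagick":
        fmt = "MIFF"
    elif rawdata[:8] == JBIG2_MAGIC:
        fmt = "JBIG2"
    else:
        fmt = "fallback"
    return fmt
```

With `detect_with_if`, a JPEG2000 file ends up classified as `"fallback"`, which would send it down whatever generic path the `else` represents.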
mara0004 reviewed 7 months ago
README.md Outdated
@@ -33,12 +33,14 @@ input file format and image color space.
| JPEG2000 | any | direct |
| PNG (non-interlaced, no transparency) | any | direct |
| TIFF (CCITT Group 4) | monochrome | direct |
| JBIG2 (single-page generic coding) | bi-level | direct |

The other entries seem to use the term `monochrome` for 1-bit-per-pixel images.
Poster

Monochrome is also often used for greyscale images, however. See e.g. https://en.wikipedia.org/wiki/Monochrome

Bi-level is pretty standard terminology, though "binary images" or perhaps even "1-bit images" might be clearer. https://en.wikipedia.org/wiki/Binary_image

I'm fine with choosing another term, all I mean is the table should be made consistent.

Poster

Apologies for the delay. I've updated the README to consistently say "1-bit monochrome" (to distinguish it from the other sense of "monochrome", i.e. grayscale). Does this look good?

ooBJ3u added 1 commit 3 months ago
150a23169b Update 'README.md'
Per comment #184/files#issuecomment-1334
ooBJ3u commented 1 month ago
Poster

@josch Would you like to have another look at this? All comments should be addressed now.

josch commented 1 month ago
Owner

Nice!

I have a question. Why does this happen:

$ jbigtopnm mono.jb2
jbigtopnm: Invalid contents of input file.  Input data stream contains invalid data
ooBJ3u commented 2 weeks ago
Poster

> @josch
> I have a question. Why does this happen:
>
> $ jbigtopnm mono.jb2
> jbigtopnm: Invalid contents of input file.  Input data stream contains invalid data

Apologies for the delay. jbigtopnm only supports JBIG1, not JBIG2. JBIG1 is still used in fax machines, but is not supported by PDF, so not too relevant for us.
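For reference on the PDF side: PDF has no JBIG1 filter, but since PDF 1.4 it supports JBIG2 through the `JBIG2Decode` filter on image XObjects. The stream uses the embedded organisation, i.e. without the file header, end-of-page and end-of-file segments, and shared segments such as symbol dictionaries go in a separate stream referenced via `/JBIG2Globals` in `/DecodeParms`. Below is a sketch of the object such an image might be written as; `jbig2_image_xobject` is hypothetical, not img2pdf's PDF writer, and it assumes the caller has already reduced the data to embedded form. `/DeviceGray` is a typical colour space choice here, not the only possible one.

```python
from typing import Optional


def jbig2_image_xobject(obj_num: int, width: int, height: int, data: bytes,
                        globals_ref: Optional[str] = None) -> bytes:
    """Serialise a PDF image XObject whose stream is embedded-format JBIG2 data.

    globals_ref is an indirect reference such as "7 0 R" to a JBIG2Globals
    stream; omit it when the page does not use shared segments.
    """
    parms = f" /DecodeParms << /JBIG2Globals {globals_ref} >>" if globals_ref else ""
    header = (
        f"{obj_num} 0 obj\n"
        f"<< /Type /XObject /Subtype /Image /Width {width} /Height {height}"
        " /ColorSpace /DeviceGray /BitsPerComponent 1"  # JBIG2 images are 1 bit per pixel
        f" /Filter /JBIG2Decode{parms} /Length {len(data)} >>\n"
        "stream\n"
    ).encode("ascii")
    # The end-of-line marker before "endstream" is not counted in /Length.
    return header + data + b"\nendstream\nendobj\n"
```

The object still needs a cross-reference entry and a content stream that paints it with `Do`, which the rest of a PDF writer would take care of.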

Reference: josch/img2pdf#184