Add support for JBIG2 (generic coding) #184

Closed

ooBJ3u wants to merge 2 commits from ooBJ3u/img2pdf:main into main

Author	SHA1	Message	Date

ooBJ3u

244600065d

Strip end-of-page and end-of-file segments from JBIG2

As noted by @phmccarty in
#184 (comment)
and subsequent comments, we were not properly stripping end-of-page and
end-of-file segments. These are valid segments in a JBIG2 file, but not
when embedded in PDF.

From the PDF spec:
> The JBIG2 file header, end-of-page segments, and end-of-file segment
> shall not be used in PDF.

We were already stripping out the JBIG2 file header, but not yet the
end-of-page and end-of-file segments.

To fix this, I'm extending the approach we were already taking, which is
to support only a narrow subset of JBIG2 files. We assert that the input
file ends with an end-of-page segment followed by an end-of-file segment,
and then we strip both.
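
As a rough illustration of the assert-then-strip idea, here is a minimal
sketch. It is not the code in this commit. The function names are
hypothetical, and it assumes each trailing segment header is exactly 11
bytes (4-byte segment number, 1-byte flags, 1-byte referred-to count,
1-byte page association, 4-byte data length) with no referred-to segments
and zero-length data. The segment type numbers 49 (end of page) and 51
(end of file) come from the JBIG2 specification.

```python
import struct

END_OF_PAGE = 49  # JBIG2 segment type: end of page
END_OF_FILE = 51  # JBIG2 segment type: end of file
# Assumption: short-form header with no referred-to segments and a
# 1-byte page association field, so 4 + 1 + 1 + 1 + 4 = 11 bytes.
SEGMENT_HEADER_LEN = 11


def check_empty_segment(header: bytes, expected_type: int) -> None:
    """Raise ValueError unless header is an empty segment of the given type."""
    _number, flags, referred, _page, data_len = struct.unpack(">IBBBI", header)
    if flags & 0x3F != expected_type:
        raise ValueError(f"expected segment type {expected_type}, got {flags & 0x3F}")
    if flags & 0x40:
        raise ValueError("4-byte page association is not supported in this sketch")
    if referred >> 5 != 0:
        raise ValueError("referred-to segments are not supported in this sketch")
    if data_len != 0:
        raise ValueError("trailing segment unexpectedly carries data")


def strip_jbig2_footer(data: bytes) -> bytes:
    """Return data without its trailing end-of-page and end-of-file segments."""
    footer_len = 2 * SEGMENT_HEADER_LEN
    if len(data) < footer_len:
        raise ValueError("input too short to contain the expected footer")
    check_empty_segment(data[-footer_len:-SEGMENT_HEADER_LEN], END_OF_PAGE)
    check_empty_segment(data[-SEGMENT_HEADER_LEN:], END_OF_FILE)
    return data[:-footer_len]
```

Files that do not end in exactly this footer are rejected rather than
guessed at, which matches the narrow-subset approach described above.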

We validated that the issue raised by @phmccarty is resolved by running
the following commands before and after applying this commit:

```sh
src/img2pdf.py src/tests/input/mono.jb2 > test.pdf
pdfimages -tiff test.pdf img
```

Before this commit, `pdfimages` reported "Syntax Error (1143): Unknown
segment type in JBIG2 stream". After this commit, the error is gone.

2024-10-30 00:00:00 +00:00

ooBJ3u

e2369eb59a

Add support for JBIG2 (generic coding)

Implements the proposal detailed at
#112 (comment)

This is a limited implementation of JBIG2, which can be extended to
support multiple pages, symbol tables, and other features of the format
in the future.

Added a test case based on mono.tif.

Updated the README.md based on
#184/files (comment)

2024-09-25 00:00:00 +00:00