Add support for JBIG2 (generic coding) #184

Open
ooBJ3u wants to merge 2 commits from ooBJ3u/img2pdf:main into main

2 commits

Author SHA1 Message Date
244600065d Strip end-of-page and end-of-file segments from JBIG2
As noted by @phmccarty in
#184 (comment)
and subsequent comments, we were not properly stripping end-of-page and
end-of-file segments. These are valid segments in a JBIG2 file, but not
when embedded in PDF.

From the PDF spec:
> The JBIG2 file header, end-of-page segments, and end-of-file segment
> shall not be used in PDF.

We were already stripping out the JBIG2 file header, but not yet the
end-of-page and end-of-file segments.

To fix this, I'm extending the approach that we were already taking, of
only supporting a narrow subset of JBIG2 files. We assert that the input
file ends with end-of-page and end-of-file segments, and then we strip
them.
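
A minimal sketch of the assert-then-strip step, not the code from this
commit. It assumes both trailing segments use the shortest segment header
form (no referred-to segments, one-byte page association, zero data
length, so 11 bytes each); the name `strip_jbig2_footer` is hypothetical:

```python
import struct

# Assumed minimal segment header: segment number (4 bytes), flags (1),
# referred-to count/retention (1), page association (1), data length (4).
SEGMENT_HEADER = struct.Struct(">IBBBI")
END_OF_PAGE = 49  # JBIG2 segment type "end of page"
END_OF_FILE = 51  # JBIG2 segment type "end of file"


def strip_jbig2_footer(data: bytes) -> bytes:
    """Return data without its trailing end-of-page/end-of-file segments."""
    footer_len = 2 * SEGMENT_HEADER.size
    if len(data) < footer_len:
        raise ValueError("too short to contain the expected footer")
    footer = data[-footer_len:]
    for i, expected_type in enumerate((END_OF_PAGE, END_OF_FILE)):
        offset = i * SEGMENT_HEADER.size
        _, flags, _, _, length = SEGMENT_HEADER.unpack_from(footer, offset)
        # The low six bits of the flags byte hold the segment type.
        if flags & 0x3F != expected_type or length != 0:
            raise ValueError("unsupported JBIG2 footer")
    return data[:-footer_len]
```

Files whose footer uses a longer header form are rejected or not
recognised by this sketch, which matches the narrow-subset approach
described above.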

We verified that this resolves the issue raised by @phmccarty by
running the following commands before and after applying this commit:

```sh
src/img2pdf.py src/tests/input/mono.jb2 > test.pdf
pdfimages -tiff test.pdf img
```

Before this commit, pdfimages reported "Syntax Error (1143): Unknown
segment type in JBIG2 stream". After this commit, the error is gone.
2024-10-30 00:00:00 +00:00
e2369eb59a Add support for JBIG2 (generic coding)
Implements the proposal detailed at
https://gitlab.mister-muffin.de/josch/img2pdf/issues/112#issuecomment-1304

This is a limited implementation of JBIG2, which can be extended to
support multiple pages, symbol tables, and other features of the format
in the future.
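
As an illustration of what such a limited implementation might check, here
is a sketch that reads the JBIG2 file header and returns the page count.
It is not the code from this commit: the function name `jbig2_page_count`
is hypothetical, and rejecting random-access files is an assumption made
for the example.

```python
import struct

# 8-byte ID string that starts every standalone JBIG2 file.
JBIG2_MAGIC = b"\x97JB2\r\n\x1a\n"


def jbig2_page_count(data: bytes) -> int:
    """Return the page count from a sequential JBIG2 file header."""
    if not data.startswith(JBIG2_MAGIC):
        raise ValueError("not a JBIG2 file")
    if len(data) < 9:
        raise ValueError("truncated JBIG2 file header")
    flags = data[8]
    # Bit 0 set means sequential organisation (assumed to be required here).
    if not flags & 0x01:
        raise ValueError("random-access organisation is not handled")
    # Bit 1 set means the number of pages is unknown and the field is absent.
    if flags & 0x02:
        raise ValueError("page count is unknown")
    if len(data) < 13:
        raise ValueError("truncated JBIG2 file header")
    (pages,) = struct.unpack_from(">I", data, 9)
    return pages
```

A caller restricted to single-page input could reject any file for which
this does not return 1.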

Added a test case based on mono.tif.

Updated the README.md based on
https://gitlab.mister-muffin.de/josch/img2pdf/pulls/184/files#issuecomment-1334
2024-09-25 00:00:00 +00:00