forked from josch/img2pdf
img2pdf/src
ooBJ3u 244600065d Strip end-of-page and end-of-file segments from JBIG2
As noted by @phmccarty in
josch/img2pdf#184 (comment)
and subsequent comments, we were not properly stripping end-of-page and
end-of-file segments. These are valid segments in a JBIG2 file, but not
when embedded in PDF.

From the PDF spec:
> The JBIG2 file header, end-of-page segments, and end-of-file segment
> shall not be used in PDF.

We were already stripping out the JBIG2 file header, but not yet the
end-of-page and end-of-file segments.

For this, we extend the approach we were already taking, which is to
support only a narrow subset of JBIG2 files. We assert that the input
file ends with an end-of-page segment and an end-of-file segment, and
then we strip both.
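
The assert-then-strip idea can be sketched as follows. This is a minimal
illustration, not the code from this commit: it assumes both trailing
segments use the short 11-byte header form (no referred-to segments,
1-byte page association, zero data length), and the names
`strip_jbig2_footer` and `SEGMENT_HEADER` are hypothetical.

```python
import struct

# Short-form JBIG2 segment header (assumed layout for this sketch):
# segment number (4), flags (1), referred-to count (1),
# page association (1), data length (4) = 11 bytes.
SEGMENT_HEADER = struct.Struct(">IBBBI")
END_OF_PAGE = 49  # segment type numbers from the JBIG2 spec
END_OF_FILE = 51


def strip_jbig2_footer(data: bytes) -> bytes:
    """Return data without its trailing end-of-page and end-of-file
    segments, or raise ValueError if the file does not end with them."""
    size = SEGMENT_HEADER.size
    if len(data) < 2 * size:
        raise ValueError("too short to contain the expected footer")
    for i, expected in enumerate((END_OF_PAGE, END_OF_FILE)):
        start = len(data) - (2 - i) * size
        _, flags, referred, _, length = SEGMENT_HEADER.unpack_from(data, start)
        if flags & 0x3F != expected:  # low six bits hold the segment type
            raise ValueError("unexpected segment type in footer")
        if flags & 0x40 or referred != 0 or length != 0:
            raise ValueError("footer segment is not in the short form")
    return data[: -2 * size]
```

Rejecting anything outside the narrow subset, rather than parsing every
segment, keeps the check small and makes unsupported files fail loudly
instead of producing a broken PDF.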

We validated that the issue raised by @phmccarty is indeed resolved by
running the following commands before and after applying this commit:

```sh
src/img2pdf.py src/tests/input/mono.jb2 > test.pdf
pdfimages -tiff test.pdf img
```

Before this commit, pdfimages reported "Syntax Error (1143): Unknown
segment type in JBIG2 stream". After this commit, the error is gone.
2024-10-30 00:00:00 +00:00
tests Add support for JBIG2 (generic coding) 2024-09-25 00:00:00 +00:00
img2pdf.py Strip end-of-page and end-of-file segments from JBIG2 2024-10-30 00:00:00 +00:00
img2pdf_test.py Strip end-of-page and end-of-file segments from JBIG2 2024-10-30 00:00:00 +00:00
jp2.py jp2: rudimentary support for raw jpeg2000 without jp2 boxes 2023-08-08 07:40:38 +02:00