The getexif() method has been available since Pillow 6.0.0. When it is
available, use it in a simplified version of the algorithm.
In the future, the _getexif() fallback branch can be deleted.
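A minimal sketch of the kind of branching described above, assuming the goal is to read the EXIF orientation tag. The helper name `read_orientation` is hypothetical and is not taken from img2pdf; only `getexif()`, `_getexif()`, and the standard Orientation tag number 0x0112 are real.

```python
ORIENTATION_TAG = 0x0112  # standard EXIF tag number for Orientation


def read_orientation(img):
    """Return the EXIF orientation of an image object, or None if absent.

    Hypothetical helper: prefers the public getexif() (Pillow >= 6.0.0)
    and falls back to the private _getexif() on older versions.
    """
    if hasattr(img, "getexif"):
        # Simplified path: public API returning a dict-like object
        exif = img.getexif()
    elif hasattr(img, "_getexif"):
        # Legacy path: private method that may return None
        exif = img._getexif()
    else:
        exif = None
    if not exif:
        # Covers both None and an empty mapping
        return None
    return exif.get(ORIENTATION_TAG)
```

Once the minimum supported Pillow version is 6.0.0 or later, the `elif` branch in a sketch like this becomes dead code and can be removed.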
As noted by @phmccarty in
#184 (comment)
and subsequent comments, we were not properly stripping end-of-page and
end-of-file segments. These are valid segments in a JBIG2 file, but not
when embedded in PDF.
From the PDF spec:
> The JBIG2 file header, end-of-page segments, and end-of-file segment
> shall not be used in PDF.
We were already stripping out the JBIG2 file header, but not yet the
end-of-page and end-of-file segments.
To do this, I'm extending the approach that we were already taking, of
only supporting a narrow subset of JBIG2 files. We assert that the input
file ends with such a footer (an end-of-page segment followed by an
end-of-file segment), and then we strip it.
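The assert-then-strip idea can be sketched as follows. This is not the code from this commit; it assumes that each trailing segment has an 11-byte header (no referred-to segments, 1-byte page association) and carries no data, and the function name `strip_jbig2_footer` is hypothetical. Segment types 49 (end of page) and 51 (end of file) are the values defined by the JBIG2 spec.

```python
import struct

END_OF_PAGE = 49  # JBIG2 segment type: end of page
END_OF_FILE = 51  # JBIG2 segment type: end of file
# Assumption: 4-byte segment number, 1-byte flags, 1-byte referred-to
# field, 1-byte page association, 4-byte data length
SEGMENT_HEADER_LEN = 11


def _check_empty_segment(header, expected_type):
    """Raise ValueError unless header is an empty segment of expected_type."""
    _seg_num, flags, refs, _page, data_len = struct.unpack(">IBBBI", header)
    seg_type = flags & 0x3F       # low six bits hold the segment type
    long_page_field = flags & 0x40  # set if page association is 4 bytes
    ref_count = refs >> 5         # top three bits count referred-to segments
    if seg_type != expected_type or long_page_field or ref_count or data_len:
        raise ValueError("unsupported JBIG2 footer")


def strip_jbig2_footer(data):
    """Return data without its trailing end-of-page/end-of-file segments."""
    footer_len = 2 * SEGMENT_HEADER_LEN
    if len(data) < footer_len:
        raise ValueError("input too short to contain a JBIG2 footer")
    _check_empty_segment(data[-footer_len:-SEGMENT_HEADER_LEN], END_OF_PAGE)
    _check_empty_segment(data[-SEGMENT_HEADER_LEN:], END_OF_FILE)
    return data[:-footer_len]
```

Rejecting anything that does not match the expected layout keeps the sketch in line with the narrow-subset approach: unusual files fail loudly instead of producing a PDF with stray segments.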
We validated that the issue raised by @phmccarty is indeed resolved by
running the following commands before and after applying this commit:
```sh
src/img2pdf.py src/tests/input/mono.jb2 > test.pdf
pdfimages -tiff test.pdf img
```
Before this commit, this returned "Syntax Error (1143): Unknown segment
type in JBIG2 stream". After this commit, the error is gone.
Implements the proposal detailed at
#112 (comment)
This is a limited implementation of JBIG2, which can be extended to
support multiple pages, symbol tables, and other features of the format
in the future.
Added a test case based on mono.tif.
Updated the README.md based on
#184/files (comment)
This is relevant for the MPO format, which would otherwise result in PDF
files containing the same image multiple times in different sizes. With
this change, the default is to produce a single page containing the
full MPO, so extracting that MPO from the PDF also recovers the
thumbnails.
With the --include-thumbnails option, each frame is stored on its own
page, as is done for multi-frame GIF, for example.
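The default and the opt-in behaviour can be illustrated with a small sketch. The function `frames_to_pages` and its parameters are hypothetical and do not reflect img2pdf's internals; only the MPO format name and the meaning of the --include-thumbnails option come from the description above.

```python
def frames_to_pages(fmt, n_frames, include_thumbnails=False):
    """Return the frame indices that each get their own PDF page.

    Hypothetical helper: for MPO without include_thumbnails, only the
    first frame becomes a page (the whole MPO is embedded there); every
    other case keeps one page per frame, as for multi-frame GIF.
    """
    if fmt == "MPO" and not include_thumbnails:
        return [0]
    return list(range(n_frames))
```

For example, `frames_to_pages("MPO", 3)` yields a single page, while `frames_to_pages("MPO", 3, include_thumbnails=True)` yields one page per frame.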
Closes: #135
Ensure that timezones are correctly interpreted in the input by calling
`.astimezone()` as appropriate on datetime objects, and store the
resulting date fields as UTC.
One could argue that datetimes in the local timezone should be stored in
the PDF, but then the date string handling becomes more complicated; the
PDF and XMP date specs both use the `Z` suffix to indicate UTC time, but
other +/- offsets require different syntax between the two specs.
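A minimal sketch of the normalize-to-UTC approach, assuming timezone-aware inputs. The helper names `to_utc`, `pdf_date`, and `xmp_date` are hypothetical; the `D:YYYYMMDDHHmmSS` and ISO 8601 layouts are the usual PDF and XMP date forms.

```python
from datetime import datetime, timedelta, timezone


def to_utc(dt):
    """Convert a datetime to UTC.

    Note: astimezone() treats a naive datetime as local time.
    """
    return dt.astimezone(timezone.utc)


def pdf_date(dt):
    """Format as a PDF date string with the Z (UTC) suffix."""
    return to_utc(dt).strftime("D:%Y%m%d%H%M%SZ")


def xmp_date(dt):
    """Format as an XMP (ISO 8601 style) date string with the Z suffix."""
    return to_utc(dt).strftime("%Y-%m-%dT%H:%M:%SZ")


# Usage: 12:00 at UTC+02:00 is stored as 10:00 UTC in both formats
stamp = datetime(2020, 1, 1, 12, 0, 0, tzinfo=timezone(timedelta(hours=2)))
print(pdf_date(stamp))  # D:20200101100000Z
print(xmp_date(stamp))  # 2020-01-01T10:00:00Z
```

Storing UTC means both formatters can share the same `Z` suffix. Keeping a local offset would need separate handling, since PDF writes an offset along the lines of `+02'00'` while XMP uses the ISO 8601 style `+02:00`.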