Commit graph

447 commits

Author SHA1 Message Date
3ba7d17e15
HACKING: document gitea release 2025-02-21 00:40:17 +01:00
43c16ac369
HACKING: add final git push 2025-02-16 17:45:09 +01:00
08c4d9beec
release version 0.6.0 2025-02-15 15:07:29 +01:00
9e6eba9f40
reformat with black 2025-02-15 14:59:04 +01:00
5aeb628506
Extract an API to predict the DPI used by img2pdf 2025-02-15 14:48:33 +01:00
b6dbfdb481
Slightly simplify imgformat retrieval
No need for a loop here - we can access the enum like a dictionary,
which should be more efficient.
2025-02-15 14:40:22 +01:00
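
The simplification described in this commit can be sketched as follows. This is a minimal illustration, not the project's code: `ImageFormat` and both helper functions are hypothetical stand-ins. It shows that an `Enum` class supports name-based subscript lookup, so a loop over its members is unnecessary.

```python
from enum import Enum


class ImageFormat(Enum):
    """Hypothetical stand-in for an image-format enum."""

    JPEG = 1
    PNG = 2
    TIFF = 3


def lookup_with_loop(name):
    # The pattern being replaced: scan every member and compare names.
    for member in ImageFormat:
        if member.name == name:
            return member
    return None


def lookup_by_name(name):
    # Enum classes support subscript access by member name,
    # raising KeyError if no member has that name.
    try:
        return ImageFormat[name]
    except KeyError:
        return None
```

Both helpers return the same member for a known name and `None` for an unknown one. The subscript form avoids iterating over the members.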
23436114f8
Slightly simplify the getexif procedure if PIL is new enough
The getexif() procedure is available since Pillow 6.0.0. If it's
available, change the algorithm to a simplified version.

In the future, the _getexif() branch can be deleted.
2025-02-15 14:36:37 +01:00
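
A feature check of the kind this commit describes might look like the sketch below. The function name `read_exif` is hypothetical, and the code only assumes an object that exposes `getexif()` (newer Pillow) or the private `_getexif()` (older Pillow). It does not reproduce the project's actual implementation.

```python
def read_exif(img):
    """Return EXIF data as a plain dict for a Pillow-like image object."""
    getexif = getattr(img, "getexif", None)
    if getexif is not None:
        # Public API; per the commit message, available since Pillow 6.0.0.
        return dict(getexif())
    # Legacy branch: the private helper may be absent or may return None.
    legacy = getattr(img, "_getexif", None)
    if legacy is None:
        return {}
    return dict(legacy() or {})
```

Once the minimum supported Pillow version provides `getexif()`, the legacy branch can be removed without touching callers.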
2d5e4e3cb7
break out convert_to_docobject() from convert() which returns a document handle 2025-02-15 14:35:57 +01:00
5e515abb6f
src/tests/output/mono.jb2.pdf: strip off the last 22 bytes (end-of-page and end-of-file segments) 2025-02-15 09:49:39 +01:00
a2e2998fb1
Strip end-of-page and end-of-file segments from JBIG2
As noted by @phmccarty in

and subsequent comments, we were not properly stripping end-of-page and
end-of-file segments. These are valid segments in a JBIG2 file, but not
when embedded in PDF.

From the PDF spec:
> The JBIG2 file header, end-of-page segments, and end-of-file segment
> shall not be used in PDF.

We were already stripping out the JBIG2 file header, but not yet the
end-of-page and end-of-file segments.

For this, I'm expanding the approach that we were already taking, of
only supporting a narrow subset of JBIG2 files. We assert that the input
file has such a footer, and then we strip it.

We validated that the issue raised by @phmccarty is indeed resolved by
running the following code before and after applying this commit:

```sh
src/img2pdf.py src/tests/input/mono.jb2 > test.pdf
pdfimages -tiff test.pdf img
```

Before this commit, this returned "Syntax Error (1143): Unknown segment
type in JBIG2 stream". After this commit, the error is gone.
2025-02-15 08:12:51 +01:00
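
The "assert the footer, then strip it" approach from this commit can be sketched roughly as below. This is an illustration under stated assumptions, not the code from the commit. It assumes the short 11-byte segment header layout (4-byte segment number, 1-byte flags with the type in the low six bits, 1-byte referred-to field, 1-byte page association, 4-byte data length), with segment type 49 for end-of-page and 51 for end-of-file. The function names are hypothetical.

```python
import struct

# Assumed short-form segment header: 4 + 1 + 1 + 1 + 4 bytes.
SEGMENT_HEADER_LEN = 11
END_OF_PAGE = 49  # assumed segment type number
END_OF_FILE = 51  # assumed segment type number


def segment_type(header: bytes) -> int:
    # The segment type is taken from the low six bits of the flags byte.
    return header[4] & 0x3F


def strip_trailing_segments(data: bytes) -> bytes:
    """Drop a trailing end-of-page + end-of-file pair, or raise ValueError."""
    trailer_len = 2 * SEGMENT_HEADER_LEN  # 22 bytes
    if len(data) < trailer_len:
        raise ValueError("input too short to contain the expected footer")
    end_of_page = data[-trailer_len:-SEGMENT_HEADER_LEN]
    end_of_file = data[-SEGMENT_HEADER_LEN:]
    for header, expected in ((end_of_page, END_OF_PAGE), (end_of_file, END_OF_FILE)):
        if segment_type(header) != expected:
            raise ValueError("unexpected segment type in footer")
        (data_length,) = struct.unpack(">I", header[7:11])
        if data_length != 0:
            raise ValueError("footer segment unexpectedly carries data")
    return data[:-trailer_len]
```

Rejecting any input that does not end in exactly this footer keeps the sketch in line with the commit's stated choice to support only a narrow subset of JBIG2 files.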
14948e7ba8
Add support for JBIG2 (generic coding)
Implements the proposal detailed at


This is a limited implementation of JBIG2, which can be extended to
support multiple pages, symbol tables, and other features of the format
in the future.

Added a test case based on mono.tif.

Updated the README.md based on
2025-02-15 08:12:51 +01:00
bcfdf8b54e
src/img2pdf_test.py: test_jpg_2000_rgba8 no longer works with compare_poppler() 2025-02-15 08:12:08 +01:00
9f74740c95
src/img2pdf_test.py: test_miff_cmyk8 now compares exactly 2025-02-15 08:12:07 +01:00
cbc3d50c63
src/img2pdf_test.py: support None for tiff:alpha 2025-02-15 08:12:07 +01:00
4b549592bf
README.md: add example of how to use img2pdf together with scanimage 2024-09-11 11:35:07 +02:00
5540365cfd
add example for how to specify custom dpi 2024-09-11 11:34:14 +02:00
819b366bf5
release version 0.5.1 2023-11-26 06:33:10 +01:00
cc8c708295
HACKING: how to bisect 2023-11-25 09:47:53 +01:00
fb9537d8b7
src/img2pdf.py: allow PNG input without dpi units but non-square dpi aspect ratio
Closes: 
2023-11-25 09:47:52 +01:00
7678435eb7
validate icc profile and no default location on windows
closes: 
2023-11-07 18:50:07 +01:00
ba7a360866
release version 0.5.0 2023-10-28 08:35:54 +02:00
7f0bf47ff3
src/img2pdf.py: reformat with black 2023-10-28 08:35:53 +02:00
Leo
5cd0918d50
Issue related. The original was SmartAlbums, but another case with 'Adobe PS', so delete the exif_software check part 2023-10-18 13:33:44 +08:00
Leo
f157ced05d
ignore RGB icc profile for grayscale jpegs produced by SmartAlbums
closes: 
2023-10-17 11:32:25 +02:00
09064e8e70
jp2: rudimentary support for raw jpeg2000 without jp2 boxes 2023-08-08 07:40:38 +02:00
2f736d7891
allow 'matte' to be missing in MIFF 2023-08-06 19:43:19 +02:00
e05580a49a
src/img2pdf_test.py: IM7 dropped 'baseType' in json output, so use 'type' instead which works for both IM6 and IM7 2023-08-06 19:27:01 +02:00
acc25a4926
Support JPEG2000 images with transparency
Closes: 
2023-08-05 16:06:30 +02:00
f597887088
The GIMP ICC bug does not only apply to 1-bit tiff but also to black/white palette PNG
https://gitlab.gnome.org/GNOME/gimp/-/issues/3438

Closes: 
2023-08-05 14:43:18 +02:00
3e832fbcc2
add information about how to convert images to 8 bit (closes: ) 2023-08-05 14:43:07 +02:00
1e8557cef1
src/img2pdf_test.py: drop check for endianness for tests where it does not matter
IM7 defaults to big-endian on architectures other than x86 even if they
are little endian: https://github.com/ImageMagick/ImageMagick/issues/6300

Closes: 
2023-08-05 14:42:48 +02:00
29921eeabd
the default PDF/A icc profile is /usr/share/color/icc/sRGB.icc, /usr/share/color/icc/OpenICC/sRGB.icc or /usr/share/color/icc/colord/sRGB.icc depending on which one exists 2023-06-11 21:56:21 +02:00
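
The "whichever one exists" selection in this commit amounts to a first-match search over candidate paths. The sketch below uses the three paths named in the commit subject. The function name and the choice to return `None` when nothing is found are assumptions made for illustration.

```python
from pathlib import Path

# Candidate locations, in the order listed in the commit subject.
CANDIDATE_PROFILES = (
    "/usr/share/color/icc/sRGB.icc",
    "/usr/share/color/icc/OpenICC/sRGB.icc",
    "/usr/share/color/icc/colord/sRGB.icc",
)


def first_existing_profile(candidates=CANDIDATE_PROFILES):
    """Return the first candidate that is an existing file, else None."""
    for candidate in candidates:
        if Path(candidate).is_file():
            return candidate
    return None
```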
33139612f8
src/img2pdf_test.py: make endianness dependant on sys.byteorder (closes: ) 2023-06-11 14:45:09 +02:00
64d27f4a8b
src/img2pdf_test.py: allow Bilevel as well as Grayscale type for png_gray1_img (closes: ) 2023-06-11 13:24:30 +02:00
85cbe1d128
factor out argparse.ArgumentParser to allow for generating completions via shtab 2023-06-11 08:09:46 +02:00
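
Factoring the parser into its own function, as this commit describes, might look like the sketch below. The program name, the arguments, and the function names are hypothetical and do not mirror the real command line. The point is only that a parser factory can be imported without running the program.

```python
import argparse


def get_main_parser():
    """Build and return the parser without parsing anything."""
    parser = argparse.ArgumentParser(
        prog="example-tool",  # hypothetical program name
        description="Illustrative parser factory.",
    )
    parser.add_argument("images", nargs="*", help="input files")
    parser.add_argument("-o", "--output", help="output file")
    return parser


def main(argv=None):
    # The entry point only consumes the factory; it does not define arguments.
    return get_main_parser().parse_args(argv)
```

Because `get_main_parser()` has no side effects, a completion generator such as shtab can be pointed at it to inspect the defined options.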
b25429a4c1
src/img2pdf_test.py: add tests for timestamps 2023-06-11 08:01:36 +02:00
c703e9df06
fix date(1) based timestamp parser 2023-06-11 07:48:23 +02:00
79e9985f35
src/img2pdf_test.py: black 2023-06-11 07:47:22 +02:00
cb2644c34f
do not include thumbnails in the output by default unless --include-thumbnails is used
This is relevant for the MPO format which otherwise would result in PDF
files containing the same image in different sizes multiple times. With
this change, the default is to only have a single page containing the
full MPO. This means that extracting that MPO also gets the thumbnails
back.

With the --include-thumbnails option, each frame gets stored on its own
page as it is done for multi-frame GIF, for example.

Closes: 
2023-06-11 07:31:07 +02:00
81502f21af
Convert creation/modification dates to UTC (fixes )
Ensure that timezones are correctly interpreted in the input by calling
`.astimezone()` as appropriate on datetime objects, and store the
resulting date fields as UTC.

One could argue that datetimes in the local timezone be stored in the
PDF, but then the date string handling becomes more complicated; the PDF
and XMP date specs both use the `Z` suffix to indicate UTC time, but
other +/- offsets require different syntax between the two specs.
2023-06-10 17:53:03 -07:00
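
The normalization this commit describes can be sketched as below. The helper names are hypothetical, and the two format strings are one plausible rendering of the `Z`-suffixed PDF and XMP date forms mentioned in the message, not necessarily the exact strings the project emits.

```python
from datetime import datetime, timedelta, timezone


def to_utc(dt):
    # astimezone() treats a naive datetime as local time and converts
    # an aware datetime from its own offset.
    return dt.astimezone(timezone.utc)


def pdf_date(dt):
    # PDF-style date string with the Z suffix for UTC.
    return to_utc(dt).strftime("D:%Y%m%d%H%M%SZ")


def xmp_date(dt):
    # XMP-style (ISO 8601) date string with the Z suffix for UTC.
    return to_utc(dt).strftime("%Y-%m-%dT%H:%M:%SZ")


# Example input: an aware datetime at UTC-07:00.
example = datetime(2023, 6, 10, 17, 53, 3, tzinfo=timezone(timedelta(hours=-7)))
```

Storing UTC means both formats share the same `Z` suffix, which sidesteps the differing offset syntaxes the commit message mentions.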
0cbcb8fa12
avoid converting palette PNG with alpha to RGB (closes: ) 2023-06-08 08:54:37 +02:00
e9e04b6dd9
extend comments around dropping ICC profile stored by GIMP for bilevel input 2023-06-08 08:53:22 +02:00
fc059ee471
use quotes around caret in examples for windows users
Closes: 
2023-06-08 07:14:17 +02:00
25466113e9
another small fixup for the last commit 2023-05-30 08:06:36 +02:00
7405635b72
only check whether icc profile can be dropped if there is any 2023-05-30 07:10:32 +02:00
aea472101b
strip off RGB color profile from bilevel TIFF images produced by gimp (closes: ) 2023-05-30 06:25:26 +02:00
7fa67bb337
demote print() to logger.debug() 2023-05-29 09:25:21 +02:00
7d40569aa1
Inform the user what is happening when running without any arguments and suggest using --help to get the help text (closes: ) 2023-05-28 15:25:28 +02:00
83f9c32328
appveyor.yml: try out --console --nowindowed 2023-05-28 15:25:28 +02:00
be8369373f
pass deterministic_id=True to writer.save() for pikepdf >= 6.2.0
Closes: 
2022-10-16 14:13:35 +02:00