Compare commits

363 commits

Author SHA1 Message Date
bb188a3eaf
release version 0.6.1 2025-04-27 18:54:38 +02:00
69c3ac6b25
src/img2pdf_test.py: do not unlink temporary files -- let pytest take care of that for us
This way, we can export all the artifacts for later retrieval when
pytest runs on CI systems, making debugging of issues far easier as it
avoids having to recreate the artifacts locally first.
2025-04-27 18:50:11 +02:00
dffc0dbe16
src/img2pdf.py: fix camelCase -> snake_case change of pymupdf
Thanks: Blair Chintella
2025-04-27 18:48:50 +02:00
Art Gabdullin
b91007fef8
README.md - Fix Windows bin URL 2025-03-26 03:54:38 +01:00
a8cb28ba31
src/img2pdf_test.py: skip test_miff_cmyk16 on s390x because of https://github.com/ImageMagick/ImageMagick/issues/8055 2025-03-23 15:37:32 +01:00
c6d12d6239
src/img2pdf_test.py: skip test_tiff_float on s390x because of https://github.com/ImageMagick/ImageMagick/issues/8054 2025-03-23 15:37:32 +01:00
59132f20f8
src/img2pdf_test.py: exiftool -all= now sets the unit to Undefined since version 13.23
Thanks: gregor herrmann <gregoa@debian.org>
2025-03-23 15:37:32 +01:00
3ba7d17e15
HACKING: document gitea release 2025-02-21 00:40:17 +01:00
43c16ac369
HACKING: add final git push 2025-02-16 17:45:09 +01:00
08c4d9beec
release version 0.6.0 2025-02-15 15:07:29 +01:00
9e6eba9f40
reformat with black 2025-02-15 14:59:04 +01:00
5aeb628506
Extract an API to predict the DPI used by img2pdf 2025-02-15 14:48:33 +01:00
b6dbfdb481
Slightly simplify imgformat retrieval
No need for a loop here - we can access the enum like a dictionary,
which should be more efficient.
2025-02-15 14:40:22 +01:00
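The simplification described in b6dbfdb481 can be illustrated with a small sketch. The `ImageFormat` enum and both helper functions below are hypothetical stand-ins, not the names used in img2pdf; they only show that an enum member can be fetched by name with subscript syntax instead of iterating over all members.

```python
from enum import Enum


class ImageFormat(Enum):
    # Hypothetical subset of formats, for illustration only.
    JPEG = 0
    PNG = 1
    TIFF = 2
    other = 3


def lookup_with_loop(name: str) -> ImageFormat:
    # Linear scan over all members, comparing names one by one.
    for fmt in ImageFormat:
        if fmt.name == name:
            return fmt
    return ImageFormat.other


def lookup_direct(name: str) -> ImageFormat:
    # Enum classes support dictionary-style access by member name.
    try:
        return ImageFormat[name]
    except KeyError:
        return ImageFormat.other
```

Both functions return the same member for any input; the direct form avoids the loop and reads more clearly.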
23436114f8
Slightly simplify the getexif procedure if PIL is new enough
The getexif() procedure is available since Pillow 6.0.0. If it's
available, change the algorithm to a simplified version.

In the future, the _getexif() branch can be deleted.
2025-02-15 14:36:37 +01:00
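The branching described in 23436114f8 might look roughly like the sketch below. The function name `read_exif` is hypothetical, and the fallback assumes that the older private `_getexif()` method may return `None` when no EXIF data is present; this is not the code from img2pdf itself.

```python
def read_exif(img) -> dict:
    """Return EXIF tags of a Pillow image object as a plain dict."""
    if hasattr(img, "getexif"):
        # Public API, available since Pillow 6.0.0.
        return dict(img.getexif())
    if hasattr(img, "_getexif"):
        # Older private API; assumed to return None without EXIF data.
        return dict(img._getexif() or {})
    return {}
```

Feature detection with `hasattr` avoids parsing Pillow version strings, and the `_getexif()` branch can be dropped once old Pillow releases no longer need support.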
2d5e4e3cb7
break out convert_to_docobject() from convert() which returns a document handle 2025-02-15 14:35:57 +01:00
5e515abb6f
src/tests/output/mono.jb2.pdf: strip off the last 22 bytes (end-of-page and end-of-file segments) 2025-02-15 09:49:39 +01:00
a2e2998fb1
Strip end-of-page and end-of-file segments from JBIG2
As noted by @phmccarty in

and subsequent comments, we were not properly stripping end-of-page and
end-of-file segments. These are valid segments in a JBIG2 file, but not
when embedded in PDF.

From the PDF spec:
> The JBIG2 file header, end-of-page segments, and end-of-file segment
> shall not be used in PDF.

We were already stripping out the JBIG2 file header, but not yet the
end-of-page and end-of-file segments.

For this, I'm expanding the approach that we were already taking, of
only supporting a narrow subset of JBIG2 files. We assert that the input
file has such a footer, and then we strip it.

We validated that the issue raised by @phmccarty is indeed resolved by
running the following code before and after applying this commit:

```sh
src/img2pdf.py src/tests/input/mono.jb2 > test.pdf
pdfimages -tiff test.pdf img
```

Before this commit, this returned "Syntax Error (1143): Unknown segment
type in JBIG2 stream". After this commit, the error is gone.
2025-02-15 08:12:51 +01:00
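The stripping described in a2e2998fb1 can be sketched as follows. This is an illustration under stated assumptions, not the img2pdf implementation: it assumes a sequential single-page file, an 8-byte magic string followed by one flags byte and an optional 4-byte page count, and trailing end-of-page (type 49) and end-of-file (type 51) segments of 11 bytes each, which matches the 22 bytes mentioned in 5e515abb6f. The function name is hypothetical.

```python
JBIG2_MAGIC = b"\x97JB2\r\n\x1a\n"
END_OF_PAGE = 49
END_OF_FILE = 51
SEGMENT_HEADER_LEN = 11  # assumes short page association, no referred-to segments


def strip_jbig2_for_pdf(data: bytes) -> bytes:
    """Drop the file header and the trailing end-of-page/end-of-file segments."""
    if not data.startswith(JBIG2_MAGIC):
        raise ValueError("not a JBIG2 file")
    flags = data[8]
    # Assumption: bit 1 set means "page count unknown", so the 4-byte
    # page count field is absent from the file header.
    header_len = 9 if flags & 0x02 else 13
    footer_len = 2 * SEGMENT_HEADER_LEN
    if len(data) < header_len + footer_len:
        raise ValueError("file too short")
    eop = data[-footer_len:-SEGMENT_HEADER_LEN]
    eof = data[-SEGMENT_HEADER_LEN:]
    # Byte 4 of a segment header holds the flags; the low 6 bits are the type.
    if eop[4] & 0x3F != END_OF_PAGE or eof[4] & 0x3F != END_OF_FILE:
        raise ValueError("unsupported JBIG2 layout: expected footer segments")
    return data[header_len:-footer_len]
```

Raising on anything unexpected mirrors the commit's approach of supporting only a narrow subset of JBIG2 files rather than guessing at other layouts.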
14948e7ba8
Add support for JBIG2 (generic coding)
Implements the proposal detailed at


This is a limited implementation of JBIG2, which can be extended to
support multiple pages, symbol tables, and other features of the format
in the future.

Added a test case based on mono.tif.

Updated the README.md based on
2025-02-15 08:12:51 +01:00
bcfdf8b54e
src/img2pdf_test.py: test_jpg_2000_rgba8 no longer works with compare_poppler() 2025-02-15 08:12:08 +01:00
9f74740c95
src/img2pdf_test.py: test_miff_cmyk8 now compares exactly 2025-02-15 08:12:07 +01:00
cbc3d50c63
src/img2pdf_test.py: support None for tiff:alpha 2025-02-15 08:12:07 +01:00
4b549592bf
README.md: add example of how to use img2pdf together with scanimage 2024-09-11 11:35:07 +02:00
5540365cfd
add example for how to specify custom dpi 2024-09-11 11:34:14 +02:00
819b366bf5
release version 0.5.1 2023-11-26 06:33:10 +01:00
cc8c708295
HACKING: how to bisect 2023-11-25 09:47:53 +01:00
fb9537d8b7
src/img2pdf.py: allow PNG input without dpi units but non-square dpi aspect ratio
Closes: 
2023-11-25 09:47:52 +01:00
7678435eb7
validate icc profile and no default location on windows
closes: 
2023-11-07 18:50:07 +01:00
ba7a360866
release version 0.5.0 2023-10-28 08:35:54 +02:00
7f0bf47ff3
src/img2pdf.py: reformat with black 2023-10-28 08:35:53 +02:00
Leo
5cd0918d50 Issue related. The original was SmartAlbums, but another case with 'Adobe PS', so delete the exif_software check part 2023-10-18 13:33:44 +08:00
Leo
f157ced05d
ignore RGB icc profile for grayscale jpegs produced by SmartAlbums
closes: 
2023-10-17 11:32:25 +02:00
09064e8e70
jp2: rudimentary support for raw jpeg2000 without jp2 boxes 2023-08-08 07:40:38 +02:00
2f736d7891
allow 'matte' to be missing in MIFF 2023-08-06 19:43:19 +02:00
e05580a49a
src/img2pdf_test.py: IM7 dropped 'baseType' in json output, so use 'type' instead which works for both IM6 and IM7 2023-08-06 19:27:01 +02:00
acc25a4926
Support JPEG2000 images with transparency
Closes: 
2023-08-05 16:06:30 +02:00
f597887088
The GIMP ICC bug does not only apply to 1-bit tiff but also to black/white palette PNG
https://gitlab.gnome.org/GNOME/gimp/-/issues/3438

Closes: 
2023-08-05 14:43:18 +02:00
3e832fbcc2
add information about how to convert images to 8 bit (closes: ) 2023-08-05 14:43:07 +02:00
1e8557cef1
src/img2pdf_test.py: drop check for endianness for tests where it does not matter
IM7 defaults to big-endian on architectures other than x86 even if they
are little endian: https://github.com/ImageMagick/ImageMagick/issues/6300

Closes: 
2023-08-05 14:42:48 +02:00
29921eeabd
the default PDF/A icc profile is /usr/share/color/icc/sRGB.icc, /usr/share/color/icc/OpenICC/sRGB.icc or /usr/share/color/icc/colord/sRGB.icc depending on which one exists 2023-06-11 21:56:21 +02:00
33139612f8
src/img2pdf_test.py: make endianness dependent on sys.byteorder (closes: ) 2023-06-11 14:45:09 +02:00
64d27f4a8b
src/img2pdf_test.py: allow Bilevel as well as Grayscale type for png_gray1_img (closes: ) 2023-06-11 13:24:30 +02:00
85cbe1d128
factor out argparse.ArgumentParser to allow for generating completions via shtab 2023-06-11 08:09:46 +02:00
b25429a4c1
src/img2pdf_test.py: add tests for timestamps 2023-06-11 08:01:36 +02:00
c703e9df06
fix date(1) based timestamp parser 2023-06-11 07:48:23 +02:00
79e9985f35
src/img2pdf_test.py: black 2023-06-11 07:47:22 +02:00
cb2644c34f
do not include thumbnails in the output by default unless --include-thumbnails is used
This is relevant for the MPO format which otherwise would result in PDF
files containing the same image in different sizes multiple times. With
this change, the default is to only have a single page containing the
full MPO. This means that extracting that MPO also gets the thumbnails
back.

With the --include-thumbnails option, each frame gets stored on its own
page as it is done for multi-frame GIF, for example.

Closes: 
2023-06-11 07:31:07 +02:00
81502f21af Convert creation/modification dates to UTC (fixes )
Ensure that timezones are correctly interpreted in the input by calling
`.astimezone()` as appropriate on datetime objects, and store the
resulting date fields as UTC.

One could argue that datetimes in the local timezone should be stored in
the PDF, but then the date string handling becomes more complicated; the PDF
and XMP date specs both use the `Z` suffix to indicate UTC time, but
other +/- offsets require different syntax between the two specs.
2023-06-10 17:53:03 -07:00
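The conversion described in 81502f21af can be sketched with the standard library alone. The two function names are hypothetical; the format strings follow the usual `D:YYYYMMDDHHmmSSZ` PDF date layout and an ISO 8601 style XMP date, both ending in `Z` for UTC.

```python
from datetime import datetime, timedelta, timezone


def to_pdf_date(dt: datetime) -> str:
    # astimezone() interprets a naive datetime as system local time,
    # then converts the result to UTC.
    utc = dt.astimezone(timezone.utc)
    return utc.strftime("D:%Y%m%d%H%M%SZ")


def to_xmp_date(dt: datetime) -> str:
    utc = dt.astimezone(timezone.utc)
    return utc.strftime("%Y-%m-%dT%H:%M:%SZ")


# An input at 17:53:03 with a -07:00 offset falls on the next day in UTC.
stamp = datetime(2023, 6, 10, 17, 53, 3, tzinfo=timezone(timedelta(hours=-7)))
print(to_pdf_date(stamp))
print(to_xmp_date(stamp))
```

Storing UTC lets both formats share the `Z` suffix, which sidesteps the differing offset syntax the commit message mentions.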
0cbcb8fa12
avoid converting palette PNG with alpha to RGB (closes: ) 2023-06-08 08:54:37 +02:00
e9e04b6dd9
extend comments around dropping ICC profile stored by GIMP for bilevel input 2023-06-08 08:53:22 +02:00
fc059ee471
use quotes around caret in examples for windows users
Closes: 
2023-06-08 07:14:17 +02:00
25466113e9
another small fixup for the last commit 2023-05-30 08:06:36 +02:00
7405635b72
only check whether icc profile can be dropped if there is any 2023-05-30 07:10:32 +02:00
aea472101b
strip off RGB color profile from bilevel TIFF images produced by gimp (closes: ) 2023-05-30 06:25:26 +02:00
7fa67bb337
demote print() to logger.debug() 2023-05-29 09:25:21 +02:00
7d40569aa1
Inform the user what is happening when running without any arguments and suggest using --help to get the help text (closes: ) 2023-05-28 15:25:28 +02:00
83f9c32328
appveyor.yml: try out --console --nowindowed 2023-05-28 15:25:28 +02:00
be8369373f
pass deterministic_id=True to writer.save() for pikepdf >= 6.2.0
Closes: 
2022-10-16 14:13:35 +02:00
10c6901fa3
src/img2pdf_test.py: do not test the depth attribute and rely on baseDepth
closes: 
2022-09-23 23:10:53 +02:00
57d7e07e6b
Support imagemagick 7.1.0-48
- the output of -metric PSNR changed
- CMYK output can now be exactly compared

closes: 
2022-09-15 04:36:16 +02:00
272fe0433f allow pathlib.Path objects by allowing objects implementing read_bytes function 2022-07-02 21:19:34 +02:00
ef7b9e739d add miff tests for cmyk8 and rgb8 2022-07-02 20:39:18 +02:00
af6fe27d53 avoid match/case for now until python 3.10 is available on more platforms 2022-06-28 14:22:14 +01:00
bad6fcae39 support for MIFF which allows 16 bit CMYK images
closes: 
2022-06-27 13:22:07 +01:00
d9b90499f3
README.md: compare to econvert (closes: ) 2022-05-18 13:08:05 +02:00
edb0d29a14
README.md: fix link 2022-05-13 21:27:12 +02:00
bb3e8b0098
README.md: document that img2pdf.exe can now be downloaded via release 2022-05-13 21:25:37 +02:00
f454ebc6a6
release version 0.4.4 2022-04-07 22:40:36 +02:00
c3db273e23
Remove outdated readme entry concerning JP2 colorspace
If I understood the code in `jp2.py` correctly, this should now work.
Moreover, Pillow should usually be able to open JP2 files, so `jp2.py` is only a fallback.
2022-04-07 22:08:41 +02:00
87afabd3cf
add .mailmap 2022-04-07 22:08:18 +02:00
homocomputeris
5045282cc2
Add B and JB paper sizes 2022-04-07 22:02:16 +02:00
fb4b96452a
reformat with black 2022-04-07 21:58:34 +02:00
c553e169a4
use TiffImagePlugin.STRIP_SIZE for Pillow >= 8.4.0 2022-04-07 21:58:23 +02:00
d9345ac767
set GifImagePlugin.LOADING_STRATEGY to RGB_AFTER_DIFFERENT_PALETTE_ONLY in Pillow >= 9.1.0 (closes: ) 2022-04-07 21:57:37 +02:00
1d52530229
support new pikepdf.Page object when making indirect objects (closes: ) 2022-04-07 21:55:45 +02:00
3b117e674b
clarify that input with an alpha channel requires additional computation (closes: ) 2022-04-07 17:13:16 +02:00
e8ca53738f
--viewer-page-layout support for twopageright and twopageleft 2022-03-25 14:01:18 +01:00
7c48bfb868
find_scale(): better document why we use powers of 10 2022-03-24 12:43:49 +01:00
244f034a2e
src/img2pdf_test.py: format with black 2022-03-22 17:57:12 +01:00
3da370d3bd
add more information about how to ignore invalid rotation values in input images 2022-02-06 20:33:30 +01:00
6cff2931e4
ensure that gif test input uses the same palette for all frames 2022-02-06 20:25:25 +01:00
6a55258804
appveyor.yml: rename pil to Pillow 2022-01-16 23:38:22 +01:00
3cdeab08ab
appveyor.yml: also install pil so that maybe pyinstaller picks it up 2022-01-16 23:36:37 +01:00
cea7c9120b
tox.ini: python 3.5 and 3.6 are not supported anymore 2022-01-16 22:18:22 +01:00
9eacfdaa76
appveyor.yml: don't run tests because we don't have imagemagick 2021-12-11 11:22:59 +01:00
95a313f437
tox.ini: add python 3.10 to envlist 2021-12-11 11:22:31 +01:00
30d705f020
src/img2pdf.py: the /OpenAction array must contain the page as an indirect object 2021-11-28 16:55:42 +01:00
dc926b2cf2
release version 0.4.3 2021-10-24 16:51:02 +02:00
a8fdbd0038
fix --viewer-initial-page (broken in last release) 2021-10-24 16:15:46 +02:00
6ff175d637
update my name 2021-10-13 09:17:44 +02:00
0732dff0be
src/img2pdf_test.py: make img2pdf path configurable 2021-10-13 09:17:17 +02:00
50b7145f64
release version 0.4.2 2021-10-11 17:16:03 +02:00
e522ec14d9
remove pdfrw from tests (closes: ) 2021-10-09 15:29:44 +02:00
9c9e5ece19
src/img2pdf_test.py: add support for imagemagick 7 2021-10-03 22:54:33 +02:00
354fd7c264
src/img2pdf.py: format with black 2021-10-03 21:03:22 +02:00
392d4a665e
adapt AlphaChannelError messages 2021-10-03 21:03:08 +02:00
09ad147d97
support PNG palette images with icc profile (closes: ) 2021-10-03 20:59:59 +02:00
80393b6efa
also support palette+alpha 2021-10-03 20:56:53 +02:00
e265738ac2
Revert "allow palette PNG images with ICC profile (closes: )"
This reverts commit 3d7e0e6812.
2021-10-03 16:09:54 +02:00
1ffb160453
Readme: remove alpha channels section 2021-10-03 16:04:53 +02:00
cde7472d15 Update documentation: img2pdf now supports alpha channels 2021-10-03 11:48:24 +00:00
6eec05c11c
workaround for Pillow >= 8.3.0 which limits CCITT Group4 strip size to 64 KB (closes: ) 2021-10-03 13:24:53 +02:00
Tamás Zahola
f483638b17 Test cases for transparency 2021-08-22 08:08:22 +00:00
Tamás Zahola
7f216a8848 Test support on macOS 2021-08-22 08:08:21 +00:00
Tamás Zahola
2476215f39 Always initialize smask 2021-08-22 08:08:20 +00:00
Tamás Zahola
f62858c245 Formatting 2021-08-22 08:08:20 +00:00
Tamás Zahola
a5e4da5755 Use PNG predictor for /SMask too 2021-08-22 08:08:19 +00:00
Tamás Zahola
64db7909ec Added transparency support for GIFs, palette-based PNGs and grayscale PNGs 2021-08-22 08:08:19 +00:00
Tamás Zahola
af5ae5b9b6 Convert 8-bit PNG alpha channels to /SMasks in PDF 2021-08-22 08:08:19 +00:00
d03f331521
Remove ineffective description-file parameter from setup.cfg
Otherwise, we get a warning:
"Usage of dash-separated 'description-file' will not be supported in
future versions."

This parameter was only used when setup.py was used with pbr=True, which is
not the case here, and setup.py loads the right file into long_description.
2021-08-22 03:15:20 +02:00
635b08c321
README: fix CI URLs 2021-08-21 20:57:15 +02:00
152f6fb629
tests: look for sRGB.icc in several paths, skip related tests if not found
Always use Ghostscript's sRGB.icc.
2021-08-19 19:56:01 +02:00
1f3b456ac9
tests: don't expect ImageMagick to compress TIFF files by default
This is no longer the case with ImageMagick 7.
2021-08-19 13:17:06 +02:00
4c5b72dab0
src/img2pdf_test.py: skip tests when icc profiles are not present 2021-08-19 13:08:32 +02:00
853a1ec363 src/img2pdf_test.py: do not test format descriptions 2021-08-19 10:59:46 +00:00
55d589a548
README.md: document how to remove alpha channel and --rotation=ifvalid 2021-08-14 10:19:46 +02:00
5c617965f5
document the ifvalid option value further 2021-08-14 10:19:06 +02:00
0067edf965
remove first_frame_only from gui 2021-08-14 10:18:15 +02:00
91e3a94c3d
sort globbing result on windows 2021-07-02 16:18:20 +02:00
3d7e0e6812
allow palette PNG images with ICC profile (closes: ) 2021-06-14 18:34:46 +02:00
b4c8aa1a5f
add --rotation argument overwriting exif data (closes: ) 2021-05-14 11:31:53 +02:00
114d7270a2
release version 0.4.1 2021-05-09 18:57:41 +02:00
80d24a1d49
README.md: add more examples (closes: ) 2021-05-07 11:20:47 +02:00
ea2245757f
add --from-file to read arbitrarily many images and circumvent the maximum command length of the shell (closes: ) 2021-05-07 10:59:19 +02:00
9cda595cd5
I got married and my last name changed 2021-05-07 09:41:27 +02:00
2eabebb513
use context manager instead of manually closing BytesIO and PIL.Image 2021-04-28 15:37:44 +02:00
02c85a50ad
move closing code into its own function 2021-04-14 08:25:18 +02:00
c97ce34023
raise exception if border is larger than page size 2021-04-13 13:10:57 +02:00
81325d3998
fix page border computation (closes: ) 2021-04-13 13:10:34 +02:00
8d2ae0f64e
fix reading from stdin 2021-04-13 08:12:33 +02:00
d29c596fe7
add support for MPO images (closes: ) 2021-04-12 22:57:51 +02:00
cd1088a5a9
format with black 2021-04-12 14:13:34 +02:00
2a8779295f
write out pil format as debug message 2021-04-12 14:13:19 +02:00
6cd819d28f
do not use root logger for logging 2021-04-12 14:12:14 +02:00
c48e1dbb1e
use sys.exit() as exit might not be defined under windows -- see 2021-04-12 06:52:27 +02:00
d08d8c5be9
src/img2pdf_test.py: compare icc images as non-exact 2021-04-12 06:27:36 +02:00
0e4f0047b2
src/img2pdf_test.py: refactor comparison code to remove code duplication 2021-04-12 06:27:36 +02:00
0ce25d08c2
src/img2pdf_test.py: also draw RGB letters to identify colors 2021-04-12 06:27:36 +02:00
c5fd43e851
src/img2pdf_test.py: numpy.float got deprecated in NumPy 1.20 2021-04-12 06:16:05 +02:00
17fd73aed8
.travis.yml: set osx_image: xcode12.2 or pikepdf import fails 2021-04-07 12:23:40 +02:00
454d4e7775
src/img2pdf_test.py: distinguish between depth and baseDepth for tiff_rgb12 and tiff_rgb13 2021-04-07 12:23:01 +02:00
cb2243fd10
src/img2pdf_test.py: Pillow >= 8.2.0 only uses half the palette size 2021-04-07 12:23:01 +02:00
129bd15b43
tox.ini: run pytest with -vv to show full diff 2021-04-07 12:22:53 +02:00
b8bfa98218
src/img2pdf_test.py: cater for endianess->endianness in 1.0 output format 2021-04-07 08:06:59 +02:00
b5f0912e13
README.md: minimize scope of opened file in examples
Prompted-by: Joao Rodrigo Windisch Olenscki
2021-03-09 17:35:27 +01:00
213a6af41f
tox.ini: add py39 2020-10-31 08:47:01 +01:00
9290cb4a10
.travis.yml: bump windows python version to 3.9 2020-10-31 08:42:25 +01:00
Paul Ingemi
505344f83e
Add Windows support for paths containing wildcards "*" and "?"
Img2pdf accepts paths containing wildcards such as *.jpg to efficiently refer
to multiple input files that match the wildcard expression. Under POSIX
environments the command line shell is expected to perform this expansion,
however Windows requires the command line utility itself to expand the
wildcard expression.

Ideally this would be performed by argparse as described in this draft PEP:
https://mail.python.org/pipermail/python-ideas/2015-August/035244.html

Since argparse doesn't do it, this commit performs expansion directly.

Some implementation notes:
 - Wildcard characters "*" and "?" are not valid in Windows filenames
 - Code doesn't support bracket wildcards such as [0-3] on Windows since
   they are valid filename characters
 - Due to expansion, the images list collected by argparse may contain
   sub-lists. Code uses chain.from_iterable to create a flat list.
 - Paths that refer to non-existent files raise an error message, while
   wildcards that match no files are silently ignored.
2020-10-05 10:28:08 +02:00
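The expansion described in 505344f83e could be approximated as below. This is a sketch, not the code from that commit: the function name is hypothetical, brackets are escaped so that they are treated as literal filename characters, and results are sorted (compare 91e3a94c3d) so that the page order is stable.

```python
import glob
from itertools import chain


def expand_wildcards(paths):
    """Expand "*" and "?" in each path; leave other paths untouched."""
    groups = []
    for path in paths:
        if "*" in path or "?" in path:
            # "[[]" matches a literal "[", so bracket expressions such as
            # [0-3] are not interpreted as wildcards.
            pattern = path.replace("[", "[[]")
            # A wildcard that matches nothing contributes an empty list.
            groups.append(sorted(glob.glob(pattern)))
        else:
            # Existence of plain paths is checked later by the caller.
            groups.append([path])
    # Flatten the list of sub-lists into a single list of paths.
    return list(chain.from_iterable(groups))
```

On POSIX systems the shell has normally expanded the patterns already, so such a helper would only need to run when the platform is Windows.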
Anna Levenberg
32b4ed1f43
Add check for empty list 2020-09-22 16:23:17 +02:00
b2c3b641db
fix typo: complient -> compliant 2020-08-27 02:48:26 +02:00
c4fb1d886f
remove magick.py and test.sh 2020-08-27 00:56:32 +02:00
11907242a5
src/img2pdf_test.py: we create our own channel-switching ICC profile 2020-08-09 22:03:47 +02:00
692b54ac67
release version 0.4.0 2020-08-07 08:05:27 +02:00
cc79581e2c
.travis.yml: install icc-profiles-free 2020-08-07 00:41:42 +02:00
c7db805dee
don't use /usr/share/color/icc/ghostscript/srgb.icc but /usr/share/color/icc/sRGB.icc because the former converts from sRGB to CIE LAB and PDF/A requires a profile from CIE LAB to sRGB (closes: ) 2020-08-07 00:20:44 +02:00
f0b57985ee
first stab at embedding ICC profiles 2020-08-07 00:13:53 +02:00
1ba02bf838
src/img2pdf_test.py: use json output instead of -verbose everywhere (the former is for machines, the latter for human consumption and can change without notice) 2020-08-06 20:25:20 +02:00
042aac71eb
reformat with black 2020-08-05 08:13:52 +02:00
7da0a00ef3
src/img2pdf_test.py: replace jpg_img check by json output 2020-08-05 08:11:39 +02:00
67dca425d1
.travis.yml: remove jpeg2000 check 2020-08-05 01:15:51 +02:00
86fddab622
src/img2pdf_test.py: check for JP2 support 2020-08-05 01:12:59 +02:00
c229e20547
.travis.yml: test for jpeg2000 support 2020-08-05 00:57:30 +02:00
a53fed5d17
.travis.yml: really switch to focal 2020-08-05 00:50:45 +02:00
788102ee05
.travis.yml: use focal 2020-08-05 00:48:15 +02:00
d92790ad26
try ghostscript version shenanigans 2020-08-05 00:10:00 +02:00
f0e7e8daaf
.travis.yml: try xenial 2020-08-05 00:00:00 +02:00
9bd41dba1e
src/img2pdf_test.py: check that psnr is not zero 2020-08-04 23:54:11 +02:00
6c742be642
src/img2pdf_test.py: cater for json output of imagemagick before 6.9.9.34 2020-08-04 23:29:42 +02:00
7ccd987d6d
.travis.yml: use bionic instead of xenial 2020-08-04 13:15:43 +02:00
c506cf0b8d
try to use json output instead of identify -verbose 2020-08-04 13:07:40 +02:00
fbb51d9083
tighten minimum imagemagick version for quantum:polarity=min-is-black 2020-08-04 09:30:03 +02:00
593aeb5d2d
appveyor.yml: install pyinstaller to create an exe 2020-08-04 08:03:34 +02:00
c62ed4d691
appveyor.yml: install wheel for bdist_wheel 2020-08-04 07:54:18 +02:00
2493af173c
tox.ini: also remove support for py34 (fails to compile pikepdf) 2020-08-04 07:45:39 +02:00
a39beb5c5e
tox.ini: remove jython and pypy 2020-08-03 23:27:04 +02:00
7a72c38c3d
appveyor.yml: run tox 2020-08-03 23:26:15 +02:00
1d5be0cc9d
src/img2pdf_test.py: also catch subprocess.CalledProcessError 2020-08-03 18:46:58 +02:00
e151ca27eb
src/img2pdf_test.py: skip tests on win32 and macos 2020-08-03 18:45:40 +02:00
7aed0ea0d0
src/img2pdf_test.py: fixes for Ubuntu Trusty 2020-08-03 18:31:55 +02:00
bed0b4cdae
src/img2pdf_test.py: check for mutool >= 1.10.0 2020-08-03 16:04:58 +02:00
53e991bec2
.travis.yml: add dependency on mupdf-tools 2020-08-03 15:06:35 +02:00
11f7db5003
src/img2pdf_test.py: explicitly convert PosixPath to str 2020-08-03 15:04:51 +02:00
000fc3b5ac
remove obsolete src/tests/__init__.py 2020-08-03 14:56:32 +02:00
78bd869b10
.travis.yml: install additional dependencies 2020-08-03 14:50:38 +02:00
0992832ab0
src/img2pdf_test.py: explicitly convert PosixPath to str 2020-08-03 14:49:58 +02:00
1d0e4c5272
.travis.yml: install imagemagick 2020-08-03 14:13:45 +02:00
bfd822b74e
.travis.yml: install tox 2020-08-03 14:03:00 +02:00
2c8e417c51
run tox on travis 2020-08-03 14:01:25 +02:00
66fd7cc765
rename test.py -> src/img2pdf_test.py 2020-08-03 12:34:02 +02:00
5b7f93bb9a
do not allow negative values in parse_num() 2020-08-03 12:25:14 +02:00
a2d846052e
document pytest in HACKING, tox.ini and .travis.yml 2020-08-03 12:24:41 +02:00
a84a1b8480
add test.py 2020-06-06 01:19:30 +02:00
663010ca61
src/img2pdf.py: tiny logic error 2020-06-06 01:19:30 +02:00
802dd4b1f3
magick.py: prevent floyd_steinberg() from modifying its input 2020-06-03 00:15:55 +02:00
93f65a49c9
add --pdfa argument to attach an icc profile for PDF/A-1b compliant output 2020-06-01 02:10:07 +02:00
bca3f802ac
Add support for pikepdf 2020-05-31 17:18:02 +02:00
65d9aed630
test.sh: compare uncompressed contents 2020-05-31 09:08:00 +02:00
791c9497ed
test.sh: remove leftover 'grep' 2020-05-31 09:07:15 +02:00
1cd2674a2c
src/tests/__init__.py: recursively convert both PDFs into Python data structures and then compare for equality -- this allows comparing PDFs in which streams are compressed differently 2020-05-31 09:05:55 +02:00
f4b296cef3
src/tests/__init__.py: make path OS agnostic (slash vs backslash) 2020-05-28 13:16:09 +02:00
5c7ffb09a1
src/tests/__init__.py: add support for engine instead of with_pdfrw 2020-05-28 13:12:07 +02:00
7b58c4e58d
add pikepdf to test dependencies 2020-05-28 12:57:40 +02:00
c6d04acc4b
test.sh: account for variations in output of identify --verbose 2020-05-28 12:52:42 +02:00
c49a098e7b
test.sh: improve portability on macOS and BSD 2020-05-28 12:52:38 +02:00
e4dece1c9f
replace --without-pdfrw by --engine=internal or --engine=pdfrw 2020-05-28 12:46:37 +02:00
997fe8efd8
src/tests/__init__.py: use pikepdf instead of pdfrw 2020-05-28 12:43:08 +02:00
c808061b4b
test.sh: use pikepdf instead of grep to check PDF
- no requirement on layout chosen by PDF engine anymore, thus we can
  test any pdf writer with this
- check value at full path instead of anywhere in the document, thus no
  possible false positives if expected value was present in a different
  object
- allows adding additional checks where we couldn't use grep before
  (multipage pdf created by pdfseparate)
2020-05-28 06:32:29 +02:00
17dd59e722
README.md: clarify PDF transparency (closes: ) 2020-05-06 08:55:46 +02:00
60fa588cfb
release version 0.3.6 2020-04-30 22:18:54 +02:00
559d42cd4a
magick.py: use our own deflate algo for bit-by-bit reproducible output independent of the compressor used 2020-04-29 16:15:55 +02:00
6da2bc3aee
release version 0.3.5 2020-04-28 13:39:49 +02:00
9d184ad0cd
test.sh: adjust for imagemagick 6.9.10.86 in fedora 2020-04-26 19:10:16 +02:00
082f999ac8
fix with_pdfrw assignment -- thanks Michał Górny 2020-04-23 07:47:39 +02:00
ceba6a8223
Rip out remaining python2 support
https://www.enricozini.org/blog/2020/python/python-2-is-dead/
2020-04-23 07:13:39 +02:00
9449f96345
only run gui if requested with --gui even on windows (closes: ) 2020-04-23 07:09:20 +02:00
0bbbc7a31a
pdfrw is now disabled by default if it's not installed (closes: ) 2020-04-23 06:59:07 +02:00
a270c987f0
setup.py: stop declaring support for Python2 2020-04-23 06:57:38 +02:00
f0e4c6188e
magick.py: make executable and call main() only if __name__ is __main__ 2020-04-23 06:57:10 +02:00
fb2916e6b2
release version 0.3.4 2020-04-05 15:11:09 +02:00
0521426e57
HACKING: add -m to "git tag" invocation 2020-04-05 15:11:09 +02:00
cbbddf7fe0
setup.py: set long_description_content_type to text/markdown 2020-04-05 15:11:09 +02:00
7a3daec620
add rudimentary GUI 2020-04-05 12:21:09 +02:00
1bb21f0c85
put read() into a different try block than open() 2020-04-05 12:16:59 +02:00
24b679d8e5
README.md: add comparison to podofoimg2pdf 2020-04-05 12:15:53 +02:00
1b6e9e3da6
README.md: add 'Library' header 2020-04-05 12:15:09 +02:00
17624991ee
MANIFEST.in: also add magick.py 2020-04-05 12:13:44 +02:00
5002fb1068
HACKING: add more detailed instructions 2020-04-05 12:12:30 +02:00
5375a17181
README.md: add more examples 2020-03-16 16:06:44 +01:00
b991516714
use python 3.8 on windows 2020-02-25 05:12:08 +01:00
b31d4e48d8
README.md: add appveyor and travis CI badges 2019-08-25 07:05:25 +02:00
be9e48871c
README.md: add screenshot 2019-08-25 07:02:50 +02:00
f1e1dab850
add .travis.yml 2019-08-25 06:22:10 +02:00
c17fd0011d
add appveyor.yml 2019-08-25 06:12:37 +02:00
2b26daeb47
first draft of a tkinter gui 2019-08-25 06:03:48 +02:00
ed6187d541
add --crop-border, --bleed-border, --trim-border and --art-border options 2019-04-02 07:37:03 +02:00
82706671cc
join strings that got put into the same line by black 2019-03-29 00:34:31 +01:00
01022487b7
add AlphaChannelError and ExifOrientationError exceptions 2019-03-28 10:26:46 +01:00
07903e9ef1
im was not defined -- don't print it (closes: ) 2019-03-28 09:56:01 +01:00
d1f101c36a
test.sh: replace imagemagick with custom python script to produce bit-by-bit identical results on all architectures 2019-03-12 03:07:43 +01:00
8d7996709a
release version 0.3.3 2019-01-07 10:44:24 +01:00
e04f7c0a26
HACKING: several small fixes and additions 2019-01-07 10:41:38 +01:00
462e1c23d5
src/img2pdf.py: reformat code with black (https://github.com/ambv/black) 2018-12-25 12:30:10 +01:00
1a8f3f436b
test.sh: test for rotated JPEG 2018-12-25 12:08:09 +01:00
Alexander Schlarb
a282692ac1
Respect EXIF rotation tag of image (closes: )
Read the EXIF information from every input image and use it to correctly set
the page rotation of the resulting PDF document page.
2018-12-25 01:46:09 +01:00
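The mapping from the EXIF orientation tag to a page rotation, as described in a282692ac1, can be sketched like this. The names are hypothetical and the handling of mirrored orientations is an assumption; only the four non-mirrored values map cleanly onto a clockwise page rotation in multiples of 90 degrees.

```python
EXIF_ORIENTATION_TAG = 0x0112

# EXIF orientation value -> clockwise page rotation in degrees.
ROTATION_BY_ORIENTATION = {1: 0, 3: 180, 6: 90, 8: 270}


def page_rotation(exif: dict) -> int:
    """Return the clockwise rotation for a page holding this image."""
    orientation = exif.get(EXIF_ORIENTATION_TAG, 1)
    if orientation not in ROTATION_BY_ORIENTATION:
        # Values 2, 4, 5 and 7 involve mirroring, which a pure rotation
        # cannot express; other values are invalid.
        raise ValueError(f"unsupported EXIF orientation: {orientation}")
    return ROTATION_BY_ORIENTATION[orientation]
```

Because only the page rotation changes, the image data itself can still be embedded without re-encoding.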
Georg Sauthoff
2faeb2005d
Decompress stream to account for different possible encodings (closes: )
Decompressing the image stream makes the comparison robust against
output-preserving changes in the zlib compress output bitstream
(e.g. between different zlib implementations/versions/releases).
Without this, some img2pdf 0.3.2 tests fail on Fedora 29/aarch64.
See also:

https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/thread/R7GD4L5Z6HELCDAL2RDESWR2F3ZXHWVX/
2018-12-24 18:13:49 +01:00
0639dbd47c
src/tests/__init__.py: fix CommandLineTests on python3 2018-12-24 18:04:40 +01:00
78183c642b
allow wrapping img2pdf main() in contextlib.redirect_stdout() 2018-12-24 18:03:52 +01:00
b39e755424
add legal and tabloid paper formats (closes: ) 2018-12-24 18:02:41 +01:00
8bccc02c67
MANIFEST.in: also install test.sh 2018-12-11 18:19:03 +01:00
josch
4cebd9f15d Merge branch 'agroszer-py2' into 'master'
fixing a slew of python 2 issues

See merge request 
2018-12-11 17:16:50 +00:00
68ca35f39e
release version 0.3.2 2018-11-20 16:37:23 +01:00
42f8ac54a8
Allow CCITT Group4 input without rows-per-strip (closes: ) 2018-11-20 16:30:14 +01:00
d4700dbf38
test.sh: add comment about dct decoding method 2018-08-27 11:42:37 +02:00
bd55ac5a45
test.sh: implement EXIT trap 2018-08-27 08:17:06 +02:00
2f3394a1cb
Add extensive test suite test.sh 2018-08-20 10:28:14 +02:00
50a3109c43
src/img2pdf.py: do not overwrite imgformat variable in loop as this will then affect later loop iterations and we need to retain its original value 2018-08-20 10:27:27 +02:00
03bdc33053
src/img2pdf.py: be more explicit when embedding group4 from TIFF 2018-08-20 10:26:39 +02:00
b758b54ce8
src/img2pdf.py: Disallow TIFF input with more than 8 bits per sample because PIL will silently convert it to 8 bits per sample 2018-08-20 10:25:39 +02:00
a2b969640a
src/img2pdf.py: assign the default dpi to tiff images where PIL returned its default of (1,1) 2018-08-20 10:24:31 +02:00
8961c78dbf
src/img2pdf.py: add commented-out debugging option for PIL's TiffImagePlugin 2018-08-20 10:23:23 +02:00
6c44d8cea6
src/img2pdf.py: add more rationale behind palette encoding 2018-08-20 10:21:18 +02:00
c0ed810036
src/img2pdf.py: work around PIL not reporting a colorspace for 16bit grayscale PNGs 2018-08-17 12:57:46 +02:00
ccca845606
src/img2pdf.py: Read bits per sample from PNG to support PNG images with 1, 2, 4 or 16 bits per sample 2018-08-17 12:55:55 +02:00
53685934f9
src/img2pdf.py: also reject palette images with transparency 2018-08-17 07:27:15 +02:00
ba5a9a1dfc
Allow multipage CCITT group 4 TIFF images 2018-08-16 00:23:18 +02:00
a628ed22f9
Add more debugging output 2018-08-16 00:11:07 +02:00
fbcaca5e6b
big endian encoded TIFF files may have FillOrder set to lsb-to-msb and must have the bits of each byte reversed 2018-08-16 00:10:46 +02:00
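Reversing the bits of each byte, as fbcaca5e6b requires for lsb-to-msb FillOrder data, can be done with a 256-entry lookup table. The names below are hypothetical and the sketch is independent of how img2pdf implements it.

```python
# Lookup table: entry i holds i with its 8 bits in reverse order.
_BIT_REVERSED = bytes(int(f"{i:08b}"[::-1], 2) for i in range(256))


def reverse_bits_per_byte(data: bytes) -> bytes:
    # bytes.translate() maps every byte through the table in one pass.
    return data.translate(_BIT_REVERSED)
```

Applying the function twice returns the original data, which makes it easy to sanity-check.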
a9f4c9b665
release version 0.3.1 2018-08-04 15:44:45 +02:00
002c9c4466
CHANGES.rst: add release dates 2018-08-04 14:37:38 +02:00
198c98a5f9
make pep8 compliant 2018-08-01 22:28:44 +02:00
9395b6fbbe
README.md: comparison to other software 2018-08-01 22:14:14 +02:00
f5d8d86dff
Also do not re-encode CCITT Group 4 encoded TIFF images
Add additional property of input images: inverted

closes: 
2018-08-01 22:12:40 +02:00
36c5034db5
Distribute LICENSE file in release tarball 2018-08-01 20:41:02 +02:00
Jakub Wilk
7131b3d6ee
Use printf(1) instead of non-portable "/bin/echo -ne" 2018-07-21 12:08:23 +02:00
b3fb2de5f6
src/img2pdf.py: fix spelling in --help output 2018-07-19 12:05:27 +02:00
4c5faf408f
README.md: restructure 2018-07-18 15:52:54 +02:00
2ad0c036b2
src/tests/input/mono.tif: set dpi to 72 2018-07-18 14:08:04 +02:00
1aa71e3746
release version 0.3.0 2018-07-18 11:30:43 +02:00
75c43de09d
Create a new PIL image before saving as TIFF to prevent libtiff errors
closes: 
2018-07-18 01:58:03 +02:00
eec1a25a92
add test case for ccittfaxdecode 2018-07-18 01:55:48 +02:00
7de174f4b5
CCITTFaxDecode filter must be in an array 2018-07-18 01:55:48 +02:00
48d5b4d7af
CCITTGroup4 must return empty palette 2018-07-18 00:57:43 +02:00
ce68cb21d7
Add LICENSE file
closes: 
2018-07-18 00:07:30 +02:00
d931f02709
src/tests/__init__.py: handle PNG input images properly 2018-07-18 00:05:15 +02:00
b99fae1380
src/tests/__init__.py: don't test for color space now that we support palette PNGs 2018-07-18 00:04:30 +02:00
ac9ba1d0f6
handle monochrome png properly 2018-07-18 00:03:31 +02:00
eaf9de23c9
Filter must not be array for PNG output, so never make it an array 2018-07-17 23:59:55 +02:00
2a33ff275d
adapt test output to new handling of png input 2018-07-17 23:56:01 +02:00
ebd8d911bb
add test grayscale input png src/tests/input/gray.png 2018-07-17 16:57:44 +02:00
e7755f0a87
src/tests/input/normal.png: remove alpha channel 2018-07-17 16:57:44 +02:00
47a1e0c81e
src/img2pdf.py: do not use PNG encoding for CMYK input 2018-07-17 16:57:44 +02:00
3142824bab
src/tests/__init__.py: do not test animation.gif with pdfrw because it doesn't support hexadecimal palette data 2018-07-17 16:57:44 +02:00
84ce9bbd9c
src/img2pdf.py: do not allow transparency 2018-07-17 16:57:44 +02:00
90293204b2
src/img2pdf.py: use logging.warning because warn() is deprecated 2018-07-17 16:07:01 +02:00
d09aa76c9e
README.md: remove ambiguous section about lossless compression 2018-06-16 16:56:01 +02:00
ade5768d72
src/img2pdf.py: expand error message about alpha channel 2018-03-27 10:39:24 +02:00
920506b867
Reflect the fact that we now use PNG compression in the --help output 2018-03-24 19:53:41 +01:00
e78dd80451
Add option --pillow-limit-break to force Pillow to accept large input images 2018-03-24 19:47:03 +01:00
d9a6c9db03
Interlaced PNGs cannot be directly embedded but have to be re-encoded 2018-03-24 18:59:02 +01:00
7244d2c6ed
Disallow input images with alpha channel as it cannot be preserved in the PDF and would thus make img2pdf not lossless 2018-03-24 18:55:14 +01:00
1d9a25dfd2
Proof of concept of using PDF DecodeParms for storing pixel data with PNG compression 2018-03-15 11:31:36 +01:00
9836b976d3 src/img2pdf.py: handle Pillow returning a DPI of zero (closes: ) 2017-10-18 10:34:59 +02:00
James R. Barlow
a8269391e9 Implement /UserUnit scaling to support oversized PDFs
The original PDF specification supported a maximum of 200x200" pages or
14400 PDF units. In PDF 1.6 rather than remove this limitation, Adobe
added the /UserUnit field for pages, which allows one to specify the
scaling that should be applied for user-facing numbers, while keeping
the internal limit of 14400 units.

Many real-world designs are larger than 200" in one direction. One
example is tractor feed or rolled paper which may be easier to scan in
one continuous run rather than segment into pages.

/UserUnit is independent of the pixel size and resolution of the image.

/UserUnit can also indicate very small page sizes but this is not
implemented here.
2017-08-20 21:19:23 +09:00
James R. Barlow
b54617de19 Fix rawdata[:12] bytes/str comparison 2017-08-20 21:11:32 +09:00
Adam Groszer
0e76a5bd97 fixing a slew of python 2 issues 2017-08-07 11:21:42 +02:00
725462462b release version 0.2.4 2017-05-23 17:53:19 +02:00
Stephan Richter
f626d7860f Produce port that is fully API compatible. 2017-05-23 17:51:17 +02:00
Stephan Richter
06560cd0d4 It is easy enough to create a separate class for the Py2 version. 2017-05-22 19:34:39 +02:00
Stephan Richter
719928f5f5 Declare file encoding, since it contains some non-ascii chars. 2017-05-22 19:34:39 +02:00
Stephan Richter
b25098c3f0 Python 2 does not automatically convert integers to floats, so we have to be a bit more explicit. 2017-05-22 19:34:34 +02:00
Stephan Richter
18d637ea34 Make sure we are using the correct IO class in Python 2. 2017-05-22 12:31:52 -04:00
Stephan Richter
162721b88e Properly setup additional Python 2.7 and PyPy support. Note that Py2 does not have enum, but thankfully somone ported it already. 2017-05-22 12:30:40 -04:00
Stephan Richter
2455ba6e62 Test all Python versions using tox. 2017-05-22 12:29:53 -04:00
45cea44fac README.md: use pip3 instead of pip 2017-01-28 19:09:03 +01:00
fe86469c6c README.md: Improve wording 2017-01-20 09:43:42 +01:00
16993964ce restore pep8 compliance 2017-01-20 09:19:30 +01:00
7df29a9acc Allow to call get_layout_fun with less parameters by specifying defaults 2017-01-20 09:15:02 +01:00
44ce296581 README.md: add more examples of how to use the library 2017-01-20 09:14:36 +01:00
e1b2adf70a MANIFEST.in: also distribute tif and gif test cases 2017-01-20 08:49:18 +01:00
b9d95bed14 MANIFEST.in: also distribute CHANGES.rst 2017-01-20 08:48:51 +01:00
60b2e2206a Add multiframe gif (animated gif) test case 2017-01-20 08:48:16 +01:00
8476e333cd src/tests/__init__.py: add support for multiframe gif tests 2017-01-20 08:47:58 +01:00
e20fe4944c release version 0.2.3 2017-01-20 05:47:06 +01:00
1513ab3421 release version 0.2.2 2017-01-20 05:23:40 +01:00
68d67508b4 setup.py: bump python version to 3.5 2017-01-20 05:21:27 +01:00
4040e907a3 Make --help output more friendly to help2man 2017-01-20 05:17:23 +01:00
ef36e86a46 Fix default of --producer argument in --help output to include version 2017-01-20 04:46:26 +01:00
e4b6854eb5 src/tests/output/mono.png.pdf: remove the /Producer field 2017-01-20 04:45:30 +01:00
fbeecd2301 Allow to not add the /Producer field to the metadata 2017-01-20 04:45:14 +01:00
f94684c8ea If input to convert() doesn't have a write() member, then it must be str or bytes type 2017-01-19 20:32:46 +01:00
7c90b2c451 restore backwards compatibility and allow passing images as a list 2017-01-19 20:32:16 +01:00
962d4dbdfb Make pep8 compliant again 2017-01-19 11:22:08 +01:00
c2f43a579e src/tests/output/mono.png.pdf: remove timestamps 2017-01-19 09:02:16 +01:00
James R. Barlow
eedf73e73e Add test cases for CCITT G4 2016-07-22 07:09:35 +02:00
James R. Barlow
c27505486e Break out TIFF G4 transcoding into its own function 2016-07-22 07:09:24 +02:00
James R. Barlow
95f84f97bc Implement automatic monochrome CCITT Group4 encoding via Pillow/libtiff 2016-07-22 07:09:07 +02:00
Jakub Wilk
0fe30a62d9 fix typos 2016-06-10 17:02:38 +01:00
e05d2992e0 src/tests/__init__.py: adapt testcases for last commit 2016-05-26 18:11:42 +02:00
d78b2cbdbc Fix bug where horizontal and vertical borders where switched when page size was also given. Thanks Erik./a.out| sox -c 1 -e unsigned-integer -r 8000 -t u8 - -d --buffer 32 Closes 2016-05-26 08:19:34 +02:00
josch
c29df64cd8 Merge branch 'update-version' into 'master'
update version according to release



See merge request 
2016-05-25 17:43:32 +02:00
Andrey Gursky
c704e657cc update version according to release 2016-05-11 21:39:21 +02:00
c3e1662649 release version 0.2.1 2016-05-04 23:00:08 +02:00
c158585618 only use jp2 to parse jpeg2000 if PIL doesn't support jpeg2000 2016-02-17 20:31:46 +01:00
e5b0ffee6c read horizontal and vertical dpi from jpeg2000 files 2016-02-17 20:30:58 +01:00
ef9eac7326 instead of storing a limited list of supported colorspaces and imageformats, store the items requiring special treatment but do not abort if a value other than the listed ones is supported by PIL 2016-02-17 18:47:02 +01:00
00c73a6691 make the separating x optional when specifying the width without the height 2016-02-17 18:16:17 +01:00
86552a78dc fix --fit=exact if one value is missing 2016-02-17 18:11:11 +01:00
ad528d4feb restore feature that it is allowed to only specify width or height 2016-02-17 17:51:57 +01:00
3e8894c7da change MonochromeJpegError to JpegColorspaceError and add check for jpegs with alpha channel 2016-02-17 00:37:51 +01:00
d9b2550ceb support for palette images for better GIF support 2016-02-17 00:37:51 +01:00
edba669000 support from multi-frame images like multipage TIFF and animated GIF 2016-02-17 00:37:46 +01:00
dbed55655a fix JPEG2000 handling after move to Python3 2016-02-17 00:28:19 +01:00
27015c1a6e fix testsuite after exception renaming 2016-02-17 00:26:45 +01:00
c6d69b231b when verbose logging is enabled, print a traceback of possibly raised exceptions 2016-02-14 18:55:27 +01:00
b6e969d267 input images can now be file like objects, binary strings with the image content or filenames 2016-02-14 18:54:59 +01:00
cc59fbc3e4 raise exceptions instead of doing an exit() 2016-02-13 09:31:33 +01:00
James R. Barlow
900346d430 src/img2pdf.py: add get_fixed_dpi_layout_fun 2016-02-13 09:01:08 +01:00
James R. Barlow
a7dba2894f Put version string in /Producer too 2016-02-13 09:01:03 +01:00
James R. Barlow
11affb6264 allow writing the pdf to a file-like object instead of storing everything in memory 2016-02-13 09:00:53 +01:00
31a2ce231f cap the maximum page size at 200 dpi which is the maximum supported by PDF 2016-02-10 09:44:12 +01:00
7160f1628d fix parsing of dpi and % units 2016-02-10 09:32:38 +01:00
917c173bfd set img2pdf as default /producer value 2016-02-10 07:25:55 +01:00
josch
251238b410 release version 0.2.0 - big rewrite and lots of new features
- now Python3 only
 - pep8 compliant code
 - update my email to josch@mister-muffin.de
 - move from github to gitlab.mister-muffin.de/josch/img2pdf
 - use logging module
 - add extensive test suite
 - ability to read from standard input
 - pdf writer:
      - make more compatible with the interface of pdfrw module
      - print floats which equal to their integer conversion as integer
      - do not print trailing zeroes for floating point numbers
      - print more linebreaks
      - add binary string at beginning of PDF to indicate that the PDF
        contains binary data
      - handle datetime and unicode strings by using utf-16-be encoding
 - new options (see --help for more details):
      - --without-pdfrw
      - --imgsize
      - --border
      - --fit
      - --auto-orient
      - --viewer-panes
      - --viewer-initial-page
      - --viewer-magnification
      - --viewer-page-layout
      - --viewer-fit-window
      - --viewer-center-window
      - --viewer-fullscreen
 - remove short command line options for metadata arguments
2016-02-02 19:51:27 +01:00
josch
02fad70e4b README.md: add another feature idea 2015-05-10 14:24:12 +02:00
33 changed files with 12770 additions and 851 deletions

3
.gitignore vendored

@ -1,3 +1,6 @@
*.pyc
build
src/*.egg-info
.eggs
.tox

3
.mailmap Normal file

@ -0,0 +1,3 @@
Johannes Schauer Marin Rodrigues <josch@mister-muffin.de>
Johannes Schauer Marin Rodrigues <josch@mister-muffin.de> <j.schauer@email.de>
Johannes Schauer Marin Rodrigues <josch@mister-muffin.de> <josch@pyneo.org>

42
.travis.yml Normal file

@ -0,0 +1,42 @@
language: python
matrix:
include:
- name: "Ubuntu Focal"
dist: focal
addons:
apt:
packages:
- imagemagick
- libtiff-tools
- libimage-exiftool-perl
- poppler-utils
- netpbm
- ghostscript
- mupdf-tools
- name: "python 3.9 Windows"
os: windows
language: shell # 'language: python' is an error on Travis CI Windows
before_install: choco install python imagemagick
env: PATH=/c/Python39:/c/Python39/Scripts:$PATH
- name: "python 3.7 MacOs"
os: osx
osx_image: xcode12.2 # pikepdf import fails with earlier versions
language: shell # 'language: python' is an error on Travis CI macOS
cache:
directories:
- "$HOME/Library/Caches/Homebrew"
- "$HOME/Library/Caches/pip"
addons:
homebrew:
#update: true
packages:
- python3
- imagemagick
before_install:
- python3 -m pip install --upgrade virtualenv
- virtualenv -p python3 --system-site-packages "$HOME/venv"
- source "$HOME/venv/bin/activate"
install: pip install tox
script:
- python --version
- python -m tox


@ -2,49 +2,223 @@
CHANGES
=======
0.1.6
-----
0.6.1 (2025-04-27)
------------------
- replace -x and -y option by combined option -s (or --pagesize) and use -S
for --subject
- testsuite fixes
0.6.0 (2025-02-15)
------------------
- Add support for JBIG2 (generic coding)
- Add convert_to_docobject() broken out from convert()
- Add pil_get_dpi() broken out from get_imgmetadata()
0.5.1 (2023-11-26)
------------------
- no default ICC profile location for PDF/A-1b on Windows
- workaround for PNG input without dpi units but non-square dpi aspect ratio
0.5.0 (2023-10-28)
------------------
- support MIFF for 16 bit CMYK input
- accept pathlib.Path objects as input
- don't store RGB ICC profiles from bilevel or grayscale TIFF, PNG and JPEG
- thumbnails are no longer included by default and --include-thumbnails has to
be used if you want them
- support for pikepdf (>= 6.2.0)
0.4.4 (2022-04-07)
------------------
- --viewer-page-layout support for twopageright and twopageleft
- Add B and JB paper sizes
- support for pikepdf (>= 5.0.0) and Pillow (>= 9.1.0)
0.4.3 (2021-10-24)
------------------
- fix --viewer-initial-page (broken in last release)
0.4.2 (2021-10-11)
------------------
- add --rotation
- allow palette PNG images with ICC profile
- sort globbing result on windows
- convert 8-bit PNG alpha channels to /SMasks in PDF
- remove pdfrw from tests
0.4.1 (2021-05-09)
------------------
- support wildcards in paths on windows
- support MPO images
- fix page border computation
- use "img2pdf" logger instead of "root" logger
- add --from-file
0.4.0 (2020-08-07)
------------------
- replace --without-pdfrw by --engine=internal or --engine=pdfrw
- add pikepdf as additional rendering engine and add --engine=pikepdf
- support for creating PDF/A-1b compliant PDF using the --pdfa option
(this also requires the presence of an ICC profile somewhere on the system)
- support for images with embedded ICC profile as input
- rewrite tests
* use pytest via tox
* use pikepdf instead of pdfrw
* use imagemagick json output instead of identify -verbose
- format all code with black
0.3.6 (2020-04-30)
------------------
- fix tests for Fedora on arm64
0.3.5 (2020-04-28)
------------------
- remove all Python 2 support
- disable pdfrw by default
0.3.4 (2020-04-05)
------------------
- test.sh: replace imagemagick with custom python script to produce bit-by-bit
identical results on all architectures
- add --crop-border, --bleed-border, --trim-border and --art-border options
- first draft of a rudimentary tkinter gui (run with --gui)
0.3.3 (2019-01-07)
------------------
- restore basic support for Python 2
- also ship test.sh
- add legal and tabloid paper formats
- respect exif rotation tag
0.3.2 (2018-11-20)
------------------
- support big endian TIFF with lsb-to-msb FillOrder
- support multipage CCITT Group 4 TIFF
- also reject palette images with transparency
- support PNG images with 1, 2, 4 or 16 bits per sample
- support multipage TIFF with differently encoded images
- support CCITT Group4 TIFF without rows-per-strip
- add extensive test suite
0.3.1 (2018-08-04)
------------------
- Directly copy data from CCITT Group 4 encoded TIFF images into the PDF
container without re-encoding
0.3.0 (2018-06-18)
------------------
- Store non-jpeg images using PNG compression
- Support arbitrarily large pages via PDF /UserUnit field
- Disallow input with alpha channel as it cannot be preserved
- Add option --pillow-limit-break to support very large input
0.2.4 (2017-05-23)
------------------
- Restore support for Python 2.7
- Add support for PyPy
- Add support for testing using tox
0.2.3 (2017-01-20)
------------------
- version number bump for botched pypi upload...
0.2.2 (2017-01-20)
------------------
- automatic monochrome CCITT Group4 encoding via Pillow/libtiff
0.2.1 (2016-05-04)
------------------
- set img2pdf as /producer value
- support multi-frame images like multipage TIFF and animated GIF
- support for palette images like GIF
- support all colorspaces and imageformats known by PIL
- read horizontal and vertical dpi from JPEG2000 files
0.2.0 (2015-05-10)
------------------
- now Python3 only
- pep8 compliant code
- update my email to josch@mister-muffin.de
- move from github to gitlab.mister-muffin.de/josch/img2pdf
- use logging module
- add extensive test suite
- ability to read from standard input
- pdf writer:
- make more compatible with the interface of pdfrw module
- print floats which equal to their integer conversion as integer
- do not print trailing zeroes for floating point numbers
- print more linebreaks
- add binary string at beginning of PDF to indicate that the PDF
contains binary data
- handle datetime and unicode strings by using utf-16-be encoding
- new options (see --help for more details):
- --without-pdfrw
- --imgsize
- --border
- --fit
- --auto-orient
- --viewer-panes
- --viewer-initial-page
- --viewer-magnification
- --viewer-page-layout
- --viewer-fit-window
- --viewer-center-window
- --viewer-fullscreen
- remove short options for metadata command line arguments
- correctly encode and escape non-ascii metadata
- explicitly store date in UTC and allow parsing all date formats understood
by dateutil and `date --date`
0.1.5
-----
0.1.5 (2015-02-16)
------------------
- Enable support for CMYK images
- Rework test suite
- support file objects as input
0.1.4
-----
0.1.4 (2015-01-21)
------------------
- add Python 3 support
- make output reproducible by sorting and --nodate option
0.1.3
-----
0.1.3 (2014-11-10)
------------------
- Avoid leaking file descriptors
- Convert unrecognized colorspaces to RGB
0.1.1
-----
0.1.1 (2014-09-07)
------------------
- allow running src/img2pdf.py standalone
- license change from GPL to LGPL
- Add pillow 2.4.0 support
- add options to specify pdf dimensions in points
0.1.0 (unreleased)
0.1.0 (2014-03-14, unreleased)
------------------
- Initial PyPI release.
- Modified code to create proper package.
- Added tests.
- Added console script entry point.

83
HACKING Normal file

@ -0,0 +1,83 @@
Running img2pdf from source
---------------------------
img2pdf can be run directly from the cloned git repository:
$ python3 src/img2pdf.py img.jpg -o out.pdf
Running the testsuite
---------------------
$ pytest
Making a new release
--------------------
- CHANGES.rst: Add a new entry
- setup.py: Bump VERSION
- src/img2pdf.py: Bump __version__
- Commit:
$ git add CHANGES.rst setup.py src/img2pdf.py
$ git commit -m "release version X.Y.Z"
- Add git tag:
$ git tag X.Y.Z -m X.Y.Z
- Build and upload to pypi:
$ rm -rf dist/*
$ python3 setup.py sdist
$ twine upload dist/*
- Push everything to git forge
$ git push
- Push to github
$ git push github
- Obtain img2pdf.exe from appveyor:
https://ci.appveyor.com/project/josch/img2pdf/
- Create new release:
https://gitlab.mister-muffin.de/josch/img2pdf/releases/new
Using debbisect to find regressions
-----------------------------------
$ debbisect --cache=./cache --depends="git,ca-certificates,python3,
ghostscript,imagemagick,mupdf-tools,poppler-utils,python3-pil,
python3-pytest,python3-numpy,python3-scipy,python3-pikepdf" \
--verbose 2023-09-16 2023-10-24 \
'chroot "$1" sh -c "
git clone https://gitlab.mister-muffin.de/josch/img2pdf.git
&& cd img2pdf
&& pytest src/img2pdf_test.py::test_jpg_2000_rgba8[internal]"'
Using debbisect cache
---------------------
$ mmdebstrap --variant=apt --aptopt='Acquire::Check-Valid-Until "false"' \
--include=git,ca-certificates,python3,ghostscript,imagemagick \
--include=mupdf-tools,poppler-utils,python3-pil,python3-pytest \
--include=python3-numpy,python3-scipy,python3-pikepdf \
--hook-dir=/usr/share/mmdebstrap/hooks/file-mirror-automount \
--setup-hook='mkdir -p "$1/home/josch/git/devscripts/cache/pool/"' \
--setup-hook='mount -o ro,bind /home/josch/git/devscripts/cache/pool/ "$1/home/josch/git/devscripts/cache/pool/"' \
--chrooted-customize-hook=bash \
unstable /dev/null \
file:///home/josch/git/devscripts/cache/archive/debian/20231022T090139Z/
Bisecting imagemagick
---------------------
$ git clean -fdx && git reset --hard
$ ./configure --prefix=$(pwd)/prefix
$ make -j$(nproc)
$ make install
$ LD_LIBRARY_PATH=$(pwd)/prefix/lib prefix/bin/compare ...

165
LICENSE Normal file

@ -0,0 +1,165 @@
GNU LESSER GENERAL PUBLIC LICENSE
Version 3, 29 June 2007
Copyright (C) 2007 Free Software Foundation, Inc. <https://fsf.org/>
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
This version of the GNU Lesser General Public License incorporates
the terms and conditions of version 3 of the GNU General Public
License, supplemented by the additional permissions listed below.
0. Additional Definitions.
As used herein, "this License" refers to version 3 of the GNU Lesser
General Public License, and the "GNU GPL" refers to version 3 of the GNU
General Public License.
"The Library" refers to a covered work governed by this License,
other than an Application or a Combined Work as defined below.
An "Application" is any work that makes use of an interface provided
by the Library, but which is not otherwise based on the Library.
Defining a subclass of a class defined by the Library is deemed a mode
of using an interface provided by the Library.
A "Combined Work" is a work produced by combining or linking an
Application with the Library. The particular version of the Library
with which the Combined Work was made is also called the "Linked
Version".
The "Minimal Corresponding Source" for a Combined Work means the
Corresponding Source for the Combined Work, excluding any source code
for portions of the Combined Work that, considered in isolation, are
based on the Application, and not on the Linked Version.
The "Corresponding Application Code" for a Combined Work means the
object code and/or source code for the Application, including any data
and utility programs needed for reproducing the Combined Work from the
Application, but excluding the System Libraries of the Combined Work.
1. Exception to Section 3 of the GNU GPL.
You may convey a covered work under sections 3 and 4 of this License
without being bound by section 3 of the GNU GPL.
2. Conveying Modified Versions.
If you modify a copy of the Library, and, in your modifications, a
facility refers to a function or data to be supplied by an Application
that uses the facility (other than as an argument passed when the
facility is invoked), then you may convey a copy of the modified
version:
a) under this License, provided that you make a good faith effort to
ensure that, in the event an Application does not supply the
function or data, the facility still operates, and performs
whatever part of its purpose remains meaningful, or
b) under the GNU GPL, with none of the additional permissions of
this License applicable to that copy.
3. Object Code Incorporating Material from Library Header Files.
The object code form of an Application may incorporate material from
a header file that is part of the Library. You may convey such object
code under terms of your choice, provided that, if the incorporated
material is not limited to numerical parameters, data structure
layouts and accessors, or small macros, inline functions and templates
(ten or fewer lines in length), you do both of the following:
a) Give prominent notice with each copy of the object code that the
Library is used in it and that the Library and its use are
covered by this License.
b) Accompany the object code with a copy of the GNU GPL and this license
document.
4. Combined Works.
You may convey a Combined Work under terms of your choice that,
taken together, effectively do not restrict modification of the
portions of the Library contained in the Combined Work and reverse
engineering for debugging such modifications, if you also do each of
the following:
a) Give prominent notice with each copy of the Combined Work that
the Library is used in it and that the Library and its use are
covered by this License.
b) Accompany the Combined Work with a copy of the GNU GPL and this license
document.
c) For a Combined Work that displays copyright notices during
execution, include the copyright notice for the Library among
these notices, as well as a reference directing the user to the
copies of the GNU GPL and this license document.
d) Do one of the following:
0) Convey the Minimal Corresponding Source under the terms of this
License, and the Corresponding Application Code in a form
suitable for, and under terms that permit, the user to
recombine or relink the Application with a modified version of
the Linked Version to produce a modified Combined Work, in the
manner specified by section 6 of the GNU GPL for conveying
Corresponding Source.
1) Use a suitable shared library mechanism for linking with the
Library. A suitable mechanism is one that (a) uses at run time
a copy of the Library already present on the user's computer
system, and (b) will operate properly with a modified version
of the Library that is interface-compatible with the Linked
Version.
e) Provide Installation Information, but only if you would otherwise
be required to provide such information under section 6 of the
GNU GPL, and only to the extent that such information is
necessary to install and execute a modified version of the
Combined Work produced by recombining or relinking the
Application with a modified version of the Linked Version. (If
you use option 4d0, the Installation Information must accompany
the Minimal Corresponding Source and Corresponding Application
Code. If you use option 4d1, you must provide the Installation
Information in the manner specified by section 6 of the GNU GPL
for conveying Corresponding Source.)
5. Combined Libraries.
You may place library facilities that are a work based on the
Library side by side in a single library together with other library
facilities that are not Applications and are not covered by this
License, and convey such a combined library under terms of your
choice, if you do both of the following:
a) Accompany the combined library with a copy of the same work based
on the Library, uncombined with any other library facilities,
conveyed under the terms of this License.
b) Give prominent notice with the combined library that part of it
is a work based on the Library, and explaining where to find the
accompanying uncombined form of the same work.
6. Revised Versions of the GNU Lesser General Public License.
The Free Software Foundation may publish revised and/or new versions
of the GNU Lesser General Public License from time to time. Such new
versions will be similar in spirit to the present version, but may
differ in detail to address new problems or concerns.
Each version is given a distinguishing version number. If the
Library as you received it specifies that a certain numbered version
of the GNU Lesser General Public License "or any later version"
applies to it, you have the option of following the terms and
conditions either of that published version or of any later version
published by the Free Software Foundation. If the Library as you
received it does not specify a version number of the GNU Lesser
General Public License, you may choose any version of the GNU Lesser
General Public License ever published by the Free Software Foundation.
If the Library as you received it specifies that a proxy can decide
whether future versions of the GNU Lesser General Public License shall
apply, that proxy's public statement of acceptance of any version is
permanent authorization for you to choose that version for the
Library.


@ -1,6 +1,12 @@
include README.md
include test_comp.sh
include test.sh
include magick.py
include CHANGES.rst
include LICENSE
recursive-include src *.jpg
recursive-include src *.pdf
recursive-include src *.png
recursive-include src *.tif
recursive-include src *.gif
recursive-include src *.py

414
README.md

@ -1,191 +1,329 @@
[![Travis Status](https://travis-ci.com/josch/img2pdf.svg?branch=main)](https://app.travis-ci.com/josch/img2pdf)
[![Appveyor Status](https://ci.appveyor.com/api/projects/status/2kws3wkqvi526llj/branch/main?svg=true)](https://ci.appveyor.com/project/josch/img2pdf/branch/main)
img2pdf
=======
Losslessly convert images to PDF without unnecessarily re-encoding JPEG and
JPEG2000 files. Image quality is retained without unnecessarily increasing
file size.
Lossless conversion of raster images to PDF. You should use img2pdf if your
priorities are (in this order):
Background
----------
1. **always lossless**: the image embedded in the PDF will always have the
exact same color information for every pixel as the input
2. **small**: if possible, the difference in filesize between the input image
and the output PDF will only be the overhead of the PDF container itself
3. **fast**: if possible, the input image is just pasted into the PDF document
as-is without any CPU hungry re-encoding of the pixel data
Quality loss can be avoided when converting JPEG and JPEG2000 images to
PDF by embedding them without re-encoding. I wrote this piece of Python code
because I was missing a tool to do this automatically.
Conventional conversion software (like ImageMagick) would either:
If you know how to embed JPEG and JPEG2000 images into a PDF container without
recompression, using existing tools, please contact me so that I can put this
code into the garbage bin :D
1. not be lossless because of lossy re-encoding to JPEG
2. not be small because of wasteful flate encoding of raw pixel data
3. not be fast because the input data gets re-encoded
Functionality
-------------
Another advantage of not having to re-encode the input (in most common
situations) is that img2pdf is able to handle much larger input than other
software, because the raw pixel data never has to be loaded into memory.
This program will take a list of images and produce a PDF file with the images
embedded in it. JPEG and JPEG2000 images will be included without
recompression. Images in other formats will be included with zip/flate
encoding, which usually increases the resulting size because formats like PNG
compress better than PDF, which merely zip/flate compresses the RGB data. As a
result, this tool is able to losslessly wrap images into a PDF container with a
quality-filesize ratio that is typically better than (in the case of JPEG and
JPEG2000 images) or equal to (in the case of other formats) that of existing
tools.
The following table shows how img2pdf handles different input depending on the
input file format and image color space.
For example, imagemagick will re-encode the input JPEG image (thus changing
its content):
| Format | Colorspace | Result |
| ------------------------------------- | ------------------------------------ | ------------- |
| JPEG | any | direct |
| JPEG2000 | any | direct |
| PNG (non-interlaced, no transparency) | any | direct |
| TIFF (CCITT Group 4) | 1-bit monochrome | direct |
| JBIG2 (single-page generic coding) | 1-bit monochrome | direct |
| any | any except CMYK and 1-bit monochrome | PNG Paeth |
| any | 1-bit monochrome | CCITT Group 4 |
| any | CMYK | flate |
$ convert img.jpg img.pdf
$ pdfimages img.pdf img.extr # not using -j to be extra sure there is no recompression
$ compare -metric AE img.jpg img.extr-000.ppm null:
1.6301e+06
If one wants to losslessly convert from any format to PDF with
imagemagick, one has to use zip compression:
$ convert img.jpg -compress Zip img.pdf
$ pdfimages img.pdf img.extr # not using -j to be extra sure there is no recompression
$ compare -metric AE img.jpg img.extr-000.ppm null:
0
However, this approach will result in PDF files that are a few times larger
than the input JPEG or JPEG2000 file.
img2pdf is able to losslessly embed JPEG and JPEG2000 files into a PDF
container without additional overhead (aside from the PDF structure itself),
save other graphics formats using lossless zip compression,
and produce multi-page PDF files when more than one input image is given.
Also, since JPEG and JPEG2000 images are not reencoded, conversion with
img2pdf is several times faster than with other tools.
For JPEG, JPEG2000, non-interlaced PNG, TIFF images with CCITT Group 4
encoded data, and JBIG2 with single-page generic coding (e.g. using `jbig2enc`),
img2pdf directly embeds the image data into the PDF without
re-encoding it. It thus treats the PDF format merely as a container format for
the image data. In these cases, img2pdf only increases the filesize by the size
of the PDF container (typically around 500 to 700 bytes). Since data is only
copied and not re-encoded, img2pdf is also typically faster than other
solutions for these input formats.
For all other input types, img2pdf first has to transform the pixel data to
make it compatible with PDF. In most cases, the PNG Paeth filter is applied to
the pixel data. For 1-bit monochrome input, CCITT Group 4 is used instead. CMYK
input is the only case in which no filter is applied before the final flate compression.
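
The table above can be read as a small lookup from input properties to an embedding strategy. The following is a minimal sketch of that reading, not img2pdf's actual code: the function name `choose_encoding` and all of its parameters are hypothetical and only mirror the rows of the table.

```python
def choose_encoding(fmt, colorspace, *, interlaced=False, transparent=False,
                    ccitt_group4=False):
    """Return the strategy named in the table for one input image.

    All names here are illustrative assumptions; img2pdf itself may
    structure this decision differently.
    """
    # Formats whose data can be copied into the PDF unchanged.
    if fmt in ("JPEG", "JPEG2000"):
        return "direct"
    if fmt == "PNG" and not interlaced and not transparent:
        return "direct"
    if fmt == "TIFF" and ccitt_group4 and colorspace == "1":
        return "direct"
    # Assumes single-page generic coding, as stated in the table.
    if fmt == "JBIG2" and colorspace == "1":
        return "direct"
    # Everything else has to be re-encoded losslessly.
    if colorspace == "1":
        return "CCITT Group 4"
    if colorspace == "CMYK":
        return "flate"
    return "PNG Paeth"
```

The order of the checks matters: the "direct" rows are tested first, so that re-encoding is only chosen when no direct embedding applies.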
Usage
-----
#### General Notes
The images must be provided as files because img2pdf needs to seek in the file
descriptor.
The images must be provided as files because img2pdf needs to seek
in the file descriptor. Input cannot be piped through stdin.
If no output file is specified with the `-o`/`--output` option, output will be
done to stdout. A typical invocation is:
If no output file is specified with the `-o`/`--output` option,
output will be to stdout.
$ img2pdf img1.png img2.jpg -o out.pdf
Descriptions of the options should be self-explanatory.
They are available by running:
The detailed documentation can be accessed by running:
img2pdf --help
$ img2pdf --help
With no command line arguments supplied, img2pdf will read a single image from
standard input and write the resulting PDF to standard output. Here is an
example for how to scan directly to PDF using scanimage(1) from SANE:
#### Controlling Page Size
The PDF page size can be manipulated. By default, the image will be sized "into" the given dimensions with the aspect ratio retained. For instance, to size an image into a page that is at most 500pt x 500pt, use:
img2pdf -s 500x500 -o output.pdf input.jpg
To "fill" out a page that is at least 500pt x 500pt, follow the dimensions with a `^`:
img2pdf -s 500x500^ -o output.pdf input.jpg
To output pages that are exactly 500pt x 500pt, follow the dimensions with an `!`:
img2pdf -s 500x500\! -o output.pdf input.jpg
Notice that the default unit is points. Units may also be specified and mixed:
img2pdf -s 8.5inx27.94cm -o output.pdf input.jpg
If either width or height is omitted, the other will be calculated
to preserve aspect ratio.
img2pdf -s x280mm -o output1.pdf input.jpg
img2pdf -s 280mmx -o output2.pdf input.jpg
Some standard page sizes are recognized:
img2pdf -s letter -o output1.pdf input.jpg
img2pdf -s a4 -o output2.pdf input.jpg
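The unit arithmetic behind these size specifications can be sketched as follows. This is a hedged illustration, not img2pdf's actual parser: `length_to_points` and `parse_size` are hypothetical helpers, and the sketch deliberately ignores the `^` and `!` suffixes as well as named sizes such as `letter` or `a4`. It only relies on the standard definition of one PDF point as 1/72 inch.

```python
import re

# One PDF point is 1/72 inch; one inch is 2.54 cm or 25.4 mm.
POINTS_PER_UNIT = {
    "pt": 1.0,
    "in": 72.0,
    "cm": 72.0 / 2.54,
    "mm": 72.0 / 25.4,
}


def length_to_points(text):
    """Convert a length such as '8.5in' or '500' (points) to points."""
    match = re.fullmatch(r"([0-9]*\.?[0-9]+)(pt|in|cm|mm)?", text)
    if match is None:
        raise ValueError("cannot parse length: %r" % text)
    number, unit = match.groups()
    return float(number) * POINTS_PER_UNIT[unit or "pt"]


def parse_size(text):
    """Split 'WxH' into (width, height) in points; an omitted side is None."""
    width, _, height = text.partition("x")
    return (
        length_to_points(width) if width else None,
        length_to_points(height) if height else None,
    )


letter = parse_size("8.5inx27.94cm")   # both sides given, mixed units
height_only = parse_size("x280mm")     # width left to be derived from aspect ratio
```

Under these assumptions `8.5inx27.94cm` comes out at roughly 612pt x 792pt, i.e. US letter, which shows that mixed units describe the same page as `-s letter`.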
#### Colorspace
Currently, the colorspace must be forced for JPEG 2000 images that are
not in the RGB colorspace. Available colorspace options are based on
Python Imaging Library (PIL) short handles.
* `RGB` = RGB color
* `L` = Grayscale
* `1` = Black and white (internally converted to grayscale)
* `CMYK` = CMYK color
* `CMYK;I` = CMYK color with inversion
For example, to encode a grayscale JPEG2000 image, use:
img2pdf -C L -o output.pdf input.jp2
$ scanimage --mode=Color --resolution=300 | pnmtojpeg -quality 90 | img2pdf > scan.pdf
Bugs
----
If you find a JPEG or JPEG2000 file that, when embedded, cannot be read
by the Adobe Acrobat Reader, please contact me.
- If you find a JPEG, JPEG2000, PNG or CCITT Group 4 encoded TIFF file that,
when embedded into the PDF, cannot be read by the Adobe Acrobat Reader,
please contact me.
For lossless conversion of formats other than JPEG or JPEG2000, zip/flate
encoding is used. This choice is based on tests I did with a number of images.
I converted them into PDF using the lossless variants of the compression
formats offered by imagemagick. In all my tests, zip/flate encoding performed
best. You can verify my findings using the test_comp.sh script with any input
image given as a commandline argument. If you find an input file that is
outperformed by another lossless compression method, contact me.
- An error is produced if the input image is broken. This commonly happens if
the input image has an invalid EXIF Orientation value of zero. Even though
only the eight values from 1 to 8 are permitted, Android phones and
Canon DSLR cameras produce JPEG images with the invalid value of zero.
Either fix your input images with `exiftool` or similar software before
passing the JPEG to `img2pdf`, or tell img2pdf to ignore invalid values:
pass `--rotation=ifvalid` on the command line, or pass
`rotation=img2pdf.Rotation.ifvalid` as an argument to `convert()` when using
img2pdf as a library.
I have not yet figured out how to determine the colorspace of JPEG2000 files.
Therefore JPEG2000 files use DeviceRGB by default. For JPEG2000 files with
other colorspaces, you must force it using the `--colorspace` option.
It might be possible to store transparency using masks but it is not clear
what the utility of such a functionality would be.
Most vector graphic formats can be losslessly turned into PDF (minus some of
the features unsupported by PDF) but img2pdf will currently turn vector
graphics into their lossy raster representations.
Acrobat is able to store a hint for the PDF reader of how to present the PDF
when opening it. Things like automatic fullscreen or the zoom level can be
configured.
It would be nice if a single input image could be read from standard input.
- img2pdf uses PIL (or Pillow) to obtain image meta data and to convert the
input if necessary. To prevent decompression bomb denial of service attacks,
Pillow limits the maximum number of pixels an input image is allowed to
have. If you are sure that you know what you are doing, then you can disable
this safeguard by passing the `--pillow-limit-break` option to img2pdf. This
allows one to process even very large input images.
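As a rough illustration of the EXIF Orientation check mentioned in the list above, the following sketch shows the decision that an "ifvalid" style option implies. It assumes the range of 1 through 8 defined by the EXIF specification. The name `effective_orientation` and its `ifvalid` flag are hypothetical and only mirror the idea behind `--rotation=ifvalid`; this is not img2pdf's own code.

```python
# The EXIF specification defines Orientation values 1 through 8.
VALID_EXIF_ORIENTATIONS = range(1, 9)


def effective_orientation(value, ifvalid=False):
    """Return the orientation to apply for a raw EXIF Orientation value.

    value   -- the raw tag value, or None if the tag is absent
    ifvalid -- if True, silently ignore invalid values instead of failing
    """
    if value is None:
        return 1  # no tag: treat the image as upright
    if value in VALID_EXIF_ORIENTATIONS:
        return value
    if ifvalid:
        return 1  # invalid value (for example 0): ignore it
    raise ValueError("invalid EXIF Orientation value: %r" % (value,))


# A camera that writes the invalid value 0 is tolerated only with ifvalid=True.
tolerated = effective_orientation(0, ifvalid=True)
```

Failing by default and tolerating invalid values only on request keeps broken input visible while still offering a way to process it.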
Installation
------------
On Debian- and Ubuntu-based systems, dependencies may be installed
with the following command:
On Debian- and Ubuntu-based systems, img2pdf can be installed from the
official repositories:
apt-get install python python-pil python-setuptools
$ apt install img2pdf
Or for Python 3:
If you want to install it using pip, you can run:
apt-get install python3 python3-pil python3-setuptools
You can then install the package using:
$ pip install img2pdf
$ pip3 install img2pdf
If you prefer to install from source code use:
$ cd img2pdf/
$ pip install .
$ pip3 install .
To test the console script without installing the package on your system,
use virtualenv:
$ cd img2pdf/
$ virtualenv ve
$ ve/bin/pip install .
$ ve/bin/pip3 install .
You can then test the converter using:
$ ve/bin/img2pdf -o test.pdf src/tests/test.jpg
If you don't want to set up Python on Windows, then head to the
[releases](https://gitlab.mister-muffin.de/josch/img2pdf/releases) section and download the latest
`img2pdf.exe`.
GUI
---
There exists an experimental GUI with all settings currently disabled. You can
directly convert images to PDF but you cannot set any options via the GUI yet.
If you are interested in adding more features to the PDF, please submit a merge
request. The GUI is based on tkinter and works on Linux, Windows and macOS.
![](screenshot.png)
Library
-------
The package can also be used as a library:
import img2pdf
pdf_bytes = img2pdf.convert(['test.jpg'])
file = open("name.pdf","wb")
file.write(pdf_bytes)
# opening from filename
with open("name.pdf","wb") as f:
    f.write(img2pdf.convert('test.jpg'))
# opening from file handle (binary mode, because the raw bytes are read)
with open("name.pdf","wb") as f1, open("test.jpg","rb") as f2:
    f1.write(img2pdf.convert(f2))
# opening using pathlib
import pathlib
with open("name.pdf","wb") as f:
    f.write(img2pdf.convert(pathlib.Path('test.jpg')))
# using in-memory image data (a bytes object)
with open("name.pdf","wb") as f:
    f.write(img2pdf.convert(b"\x89PNG..."))
# multiple inputs (variant 1)
with open("name.pdf","wb") as f:
    f.write(img2pdf.convert("test1.jpg", "test2.png"))
# multiple inputs (variant 2)
with open("name.pdf","wb") as f:
    f.write(img2pdf.convert(["test1.jpg", "test2.png"]))
# convert all files ending in .jpg inside a directory
import os
dirname = "/path/to/images"
imgs = []
for fname in os.listdir(dirname):
    if not fname.endswith(".jpg"):
        continue
    path = os.path.join(dirname, fname)
    if os.path.isdir(path):
        continue
    imgs.append(path)
with open("name.pdf","wb") as f:
    f.write(img2pdf.convert(imgs))
# convert all files ending in .jpg in a directory and its subdirectories
dirname = "/path/to/images"
imgs = []
for r, _, f in os.walk(dirname):
    for fname in f:
        if not fname.endswith(".jpg"):
            continue
        imgs.append(os.path.join(r, fname))
with open("name.pdf","wb") as f:
    f.write(img2pdf.convert(imgs))
# convert all files matching a glob
import glob
with open("name.pdf","wb") as f:
    f.write(img2pdf.convert(glob.glob("/path/to/*.jpg")))
# convert all files matching a glob using pathlib.Path
from pathlib import Path
with open("name.pdf","wb") as f:
    f.write(img2pdf.convert(*Path("/path").glob("**/*.jpg")))
# ignore invalid rotation values in the input images
with open("name.pdf","wb") as f:
    f.write(img2pdf.convert('test.jpg', rotation=img2pdf.Rotation.ifvalid))
# writing to file descriptor
with open("name.pdf","wb") as f1, open("test.jpg","rb") as f2:
    img2pdf.convert(f2, outputstream=f1)
# specify paper size (A4)
a4inpt = (img2pdf.mm_to_pt(210),img2pdf.mm_to_pt(297))
layout_fun = img2pdf.get_layout_fun(a4inpt)
with open("name.pdf","wb") as f:
    f.write(img2pdf.convert('test.jpg', layout_fun=layout_fun))
# use a fixed dpi of 300 instead of reading it from the image
dpix = dpiy = 300
layout_fun = img2pdf.get_fixed_dpi_layout_fun((dpix, dpiy))
with open("name.pdf","wb") as f:
    f.write(img2pdf.convert('test.jpg', layout_fun=layout_fun))
# create a PDF/A-1b compliant document by passing an ICC profile
with open("name.pdf","wb") as f:
    f.write(img2pdf.convert('test.jpg', pdfa="/usr/share/color/icc/sRGB.icc"))
Comparison to ImageMagick
-------------------------
Create a large test image:
$ convert logo: -resize 8000x original.jpg
Convert it into PDF using ImageMagick and img2pdf:
$ time img2pdf original.jpg -o img2pdf.pdf
$ time convert original.jpg imagemagick.pdf
Notice how ImageMagick took an order of magnitude longer to do the conversion
than img2pdf. It also used twice the memory.
Now extract the image data from both PDF documents and compare it to the
original:
$ pdfimages -all img2pdf.pdf tmp
$ compare -metric AE original.jpg tmp-000.jpg null:
0
$ pdfimages -all imagemagick.pdf tmp
$ compare -metric AE original.jpg tmp-000.jpg null:
118716
To get lossless output with ImageMagick we can use Zip compression but that
unnecessarily increases the size of the output:
$ convert original.jpg -compress Zip imagemagick.pdf
$ pdfimages -all imagemagick.pdf tmp
$ compare -metric AE original.jpg tmp-000.png null:
0
$ stat --format="%s %n" original.jpg img2pdf.pdf imagemagick.pdf
1535837 original.jpg
1536683 img2pdf.pdf
9397809 imagemagick.pdf
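The size figures above can be turned into an overhead comparison with a few lines of Python. This is just the arithmetic on the numbers printed by `stat`; the helper name `overhead_report` is made up for this sketch.

```python
# File sizes in bytes, copied from the stat output above.
sizes = {
    "original.jpg": 1535837,
    "img2pdf.pdf": 1536683,
    "imagemagick.pdf": 9397809,
}


def overhead_report(sizes, original="original.jpg"):
    """Map each output file to (extra bytes, size ratio) relative to the input."""
    base = sizes[original]
    return {
        name: (size - base, size / base)
        for name, size in sizes.items()
        if name != original
    }


report = overhead_report(sizes)
for name, (extra, ratio) in report.items():
    print("%s: +%d bytes, %.2fx the input size" % (name, extra, ratio))
```

For this test image the img2pdf output is 846 bytes larger than the JPEG, while the Zip-compressed ImageMagick output is roughly six times the size of the input.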
Comparison to pdfLaTeX
----------------------
pdfLaTeX performs a lossless conversion from included images to PDF by default.
If the input is a JPEG, then it simply embeds the JPEG into the PDF in the same
way as img2pdf does it. But for other image formats it uses flate compression
of the plain pixel data and thus needlessly increases the output file size:
$ convert logo: -resize 8000x original.png
$ cat << END > pdflatex.tex
\documentclass{article}
\usepackage{graphicx}
\begin{document}
\includegraphics{original.png}
\end{document}
END
$ pdflatex pdflatex.tex
$ stat --format="%s %n" original.png pdflatex.pdf
4500182 original.png
9318120 pdflatex.pdf
Comparison to podofoimg2pdf
---------------------------
Like pdfLaTeX, podofoimg2pdf is able to perform a lossless conversion from JPEG
to PDF by plainly embedding the JPEG data into the pdf container. But just like
pdfLaTeX it uses flate compression for all other file formats, thus sometimes
resulting in larger files than necessary.
$ convert logo: -resize 8000x original.png
$ podofoimg2pdf out.pdf original.png
$ stat --format="%s %n" original.png out.pdf
4500181 original.png
9335629 out.pdf
It also only supports JPEG, PNG and TIF as input and lacks many of the
convenience features of img2pdf like page sizes, borders, rotation and
metadata.
Comparison to Tesseract OCR
---------------------------
Tesseract OCR comes closest to the functionality img2pdf provides. It is able
to convert JPEG and PNG input to PDF without needlessly increasing the filesize
and is at the same time lossless. So if your input is JPEG and PNG images, then
you should safely be able to use Tesseract instead of img2pdf. For other input,
Tesseract might not do a lossless conversion. For example it converts CMYK
input to RGB and removes the alpha channel from images with transparency. For
multipage TIFF or animated GIF, it will only convert the first frame.
Comparison to econvert from ExactImage
--------------------------------------
Like pdfLaTeX and podofoimg2pdf, econvert is able to embed JPEG images into PDF
directly without re-encoding but when given other file formats, it stores them
just using flate compression, which unnecessarily increases the filesize.
Furthermore, it throws an error with CMYK TIF input. It also doesn't store CMYK
JPEG files as CMYK but converts them to RGB, so it's not lossless. When trying
to feed it 16bit files, it errors out with "Unhandled bps/spp combination". It
also seems to choose JPEG encoding when using it on some file types (like
palette images), making it not lossless for that input either.

33
appveyor.yml Normal file

@ -0,0 +1,33 @@
environment:
# For Python versions available on Appveyor, see
# https://www.appveyor.com/docs/windows-images-software/#python
matrix:
# - PYTHON: "C:\\Python27"
# - PYTHON: "C:\\Python33"
# - PYTHON: "C:\\Python34"
# - PYTHON: "C:\\Python35"
# - PYTHON: "C:\\Python36"
# - PYTHON: "C:\\Python37"
# - PYTHON: "C:\\Python27-x64"
# - PYTHON: "C:\\Python33-x64"
# - PYTHON: "C:\\Python34-x64"
# - PYTHON: "C:\\Python35-x64"
# - PYTHON: "C:\\Python36-x64"
- PYTHON: "C:\\Python37-x64"
install:
- "%PYTHON%\\python.exe -m pip install tox wheel pyinstaller Pillow"
build: off
# don't run tests on windows because we don't have imagemagick
#test_script:
# - "%PYTHON%\\python.exe -m tox"
after_test:
- "%PYTHON%\\python.exe setup.py bdist_wheel"
- "%PYTHON%\\python.exe -m PyInstaller --clean --onefile --console --nowindowed --name img2pdf src/img2pdf.py"
#- "%PYTHON%\\python.exe -m PyInstaller --clean --onefile --noconsole --windowed --name img2pdf_windowed src/img2pdf.py"
artifacts:
- path: dist\*

BIN
screenshot.png Normal file

Binary file not shown. (After: 6.7 KiB)

@ -1,2 +0,0 @@
[metadata]
description-file = README.md


@ -1,42 +1,51 @@
import sys
from setuptools import setup
VERSION="0.1.6~git"
VERSION = "0.6.1"
setup (
name='img2pdf',
INSTALL_REQUIRES = (
"Pillow",
"pikepdf",
)
setup(
name="img2pdf",
version=VERSION,
author = "Johannes 'josch' Schauer",
author_email = 'j.schauer@email.de',
description = "Convert images to PDF via direct JPEG inclusion.",
long_description = open('README.md').read(),
license = "LGPL",
keywords = "jpeg pdf converter",
classifiers = [
'Development Status :: 4 - Beta',
'Intended Audience :: Developers',
'Programming Language :: Python',
'Programming Language :: Python :: 2',
'Programming Language :: Python :: 2.6',
'Programming Language :: Python :: 2.7',
'Programming Language :: Python :: 3',
'Programming Language :: Python :: 3.4',
'Programming Language :: Python :: Implementation :: CPython',
'License :: OSI Approved :: GNU Lesser General Public License v3 (LGPLv3)',
'Programming Language :: Python',
'Natural Language :: English',
'Operating System :: OS Independent'],
url = 'https://github.com/josch/img2pdf',
download_url = 'https://github.com/josch/img2pdf/archive/'+VERSION+'.tar.gz',
author="Johannes Schauer Marin Rodrigues",
author_email="josch@mister-muffin.de",
description="Convert images to PDF via direct JPEG inclusion.",
long_description=open("README.md").read(),
long_description_content_type="text/markdown",
license="LGPL",
keywords="jpeg pdf converter",
classifiers=[
"Development Status :: 5 - Production/Stable",
"Intended Audience :: Developers",
"Intended Audience :: Other Audience",
"Environment :: Console",
"Programming Language :: Python",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.5",
"Programming Language :: Python :: Implementation :: CPython",
"Programming Language :: Python :: Implementation :: PyPy",
"License :: OSI Approved :: GNU Lesser General Public License v3 " "(LGPLv3)",
"Natural Language :: English",
"Operating System :: OS Independent",
],
url="https://gitlab.mister-muffin.de/josch/img2pdf",
download_url="https://gitlab.mister-muffin.de/josch/img2pdf/repository/"
"archive.tar.gz?ref=" + VERSION,
package_dir={"": "src"},
py_modules=['img2pdf', 'jp2'],
include_package_data = True,
test_suite = 'tests.test_suite',
zip_safe = True,
install_requires=(
'Pillow',
),
entry_points='''
[console_scripts]
img2pdf = img2pdf:main
''',
)
py_modules=["img2pdf", "jp2"],
include_package_data=True,
zip_safe=True,
install_requires=INSTALL_REQUIRES,
extras_require={
"gui": ("tkinter"),
},
entry_points={
"setuptools.installation": ["eggsecutable = img2pdf:main"],
"console_scripts": ["img2pdf = img2pdf:main"],
"gui_scripts": ["img2pdf-gui = img2pdf:gui"],
},
)

File diff suppressed because it is too large

7116
src/img2pdf_test.py Executable file

File diff suppressed because it is too large


@ -1,6 +1,6 @@
#!/usr/bin/env python
#
# Copyright (C) 2013 Johannes 'josch' Schauer <j.schauer at email.de>
# Copyright (C) 2013 Johannes Schauer Marin Rodrigues <j.schauer at email.de>
#
# this module is heavily based upon jpylyzer which is
# KB / National Library of the Netherlands, Open Planets Foundation
@ -21,23 +21,25 @@
import struct
def getBox(data, byteStart, noBytes):
boxLengthValue = struct.unpack(">I", data[byteStart:byteStart+4])[0]
boxType = data[byteStart+4:byteStart+8]
boxLengthValue = struct.unpack(">I", data[byteStart : byteStart + 4])[0]
boxType = data[byteStart + 4 : byteStart + 8]
contentsStartOffset = 8
if boxLengthValue == 1:
boxLengthValue = struct.unpack(">Q", data[byteStart+8:byteStart+16])[0]
boxLengthValue = struct.unpack(">Q", data[byteStart + 8 : byteStart + 16])[0]
contentsStartOffset = 16
if boxLengthValue == 0:
boxLengthValue = noBytes-byteStart
boxLengthValue = noBytes - byteStart
byteEnd = byteStart + boxLengthValue
boxContents = data[byteStart+contentsStartOffset:byteEnd]
boxContents = data[byteStart + contentsStartOffset : byteEnd]
return (boxLengthValue, boxType, byteEnd, boxContents)
def parse_ihdr(data):
height = struct.unpack(">I", data[0:4])[0]
width = struct.unpack(">I", data[4:8])[0]
return width, height
height, width, channels, bpp = struct.unpack(">IIHB", data[:11])
return width, height, channels, bpp + 1
def parse_colr(data):
meth = struct.unpack(">B", data[0:1])[0]
@ -49,30 +51,60 @@ def parse_colr(data):
elif enumCS == 17:
return "L"
else:
raise Exception("only sRGB and greyscale color space is supported, got %d"%enumCS)
raise Exception(
"only sRGB and greyscale color space is supported, " "got %d" % enumCS
)
def parse_resc(data):
hnum, hden, vnum, vden, hexp, vexp = struct.unpack(">HHHHBB", data)
hdpi = ((hnum / hden) * (10**hexp) * 100) / 2.54
vdpi = ((vnum / vden) * (10**vexp) * 100) / 2.54
return hdpi, vdpi
def parse_res(data):
hdpi, vdpi = None, None
noBytes = len(data)
byteStart = 0
boxLengthValue = 1 # dummy value for while loop condition
while byteStart < noBytes and boxLengthValue != 0:
boxLengthValue, boxType, byteEnd, boxContents = getBox(data, byteStart, noBytes)
if boxType == b"resc":
hdpi, vdpi = parse_resc(boxContents)
break
return hdpi, vdpi
def parse_jp2h(data):
width, height, colorspace = None, None, None
noBytes=len(data)
byteStart=0
boxLengthValue=1 # dummy value for while loop condition
width, height, colorspace, hdpi, vdpi = None, None, None, None, None
noBytes = len(data)
byteStart = 0
boxLengthValue = 1 # dummy value for while loop condition
while byteStart < noBytes and boxLengthValue != 0:
boxLengthValue, boxType, byteEnd, boxContents = getBox(data, byteStart, noBytes)
if boxType == 'ihdr':
width, height = parse_ihdr(boxContents)
elif boxType == 'colr':
if boxType == b"ihdr":
width, height, channels, bpp = parse_ihdr(boxContents)
elif boxType == b"colr":
colorspace = parse_colr(boxContents)
elif boxType == b"res ":
hdpi, vdpi = parse_res(boxContents)
byteStart = byteEnd
return (width, height, colorspace)
return (width, height, colorspace, hdpi, vdpi, channels, bpp)
def parsejp2(data):
noBytes=len(data)
byteStart=0
boxLengthValue=1 # dummy value for while loop condition
noBytes = len(data)
byteStart = 0
boxLengthValue = 1 # dummy value for while loop condition
width, height, colorspace, hdpi, vdpi = None, None, None, None, None
while byteStart < noBytes and boxLengthValue != 0:
boxLengthValue, boxType, byteEnd, boxContents = getBox(data, byteStart, noBytes)
if boxType == 'jp2h':
width, height, colorspace = parse_jp2h(boxContents)
if boxType == b"jp2h":
width, height, colorspace, hdpi, vdpi, channels, bpp = parse_jp2h(
boxContents
)
break
byteStart = byteEnd
if not width:
raise Exception("no width in jp2 header")
@ -80,11 +112,42 @@ def parsejp2(data):
raise Exception("no height in jp2 header")
if not colorspace:
raise Exception("no colorspace in jp2 header")
return (width, height, colorspace)
# retrieving the dpi is optional so we do not error out if not present
return (width, height, colorspace, hdpi, vdpi, channels, bpp)
def parsej2k(data):
lsiz, rsiz, xsiz, ysiz, xosiz, yosiz, _, _, _, _, csiz = struct.unpack(
">HHIIIIIIIIH", data[4:42]
)
ssiz = [None] * csiz
xrsiz = [None] * csiz
yrsiz = [None] * csiz
for i in range(csiz):
ssiz[i], xrsiz[i], yrsiz[i] = struct.unpack(
"BBB", data[42 + 3 * i : 42 + 3 * (i + 1)]
)
assert ssiz == [7, 7, 7]
return xsiz - xosiz, ysiz - yosiz, None, None, None, csiz, 8
def parse(data):
if data[:4] == b"\xff\x4f\xff\x51":
return parsej2k(data)
else:
return parsejp2(data)
if __name__ == "__main__":
import sys
width, height, colorspace = parsejp2(open(sys.argv[1]).read())
sys.stdout.write("width = %d"%width)
sys.stdout.write("height = %d"%height)
sys.stdout.write("colorspace = %s"%colorspace)
width, height, colorspace, hdpi, vdpi, channels, bpp = parse(
open(sys.argv[1], "rb").read()
)
print("width = %d" % width)
print("height = %d" % height)
print("colorspace = %s" % colorspace)
print("hdpi = %s" % hdpi)
print("vdpi = %s" % vdpi)
print("channels = %s" % channels)
print("bpp = %s" % bpp)


@ -1,109 +0,0 @@
import unittest
import os
import img2pdf
import zlib
from PIL import Image
HERE = os.path.dirname(__file__)
#convert +set date:create +set date:modify -define png:exclude-chunk=time
def test_suite():
class TestImg2Pdf(unittest.TestCase):
pass
for test_name in os.listdir(os.path.join(HERE, "input")):
inputf = os.path.join(HERE, "input", test_name)
if not os.path.isfile(inputf):
continue
outputf = os.path.join(HERE, "output", test_name+".pdf")
assert os.path.isfile(outputf)
def handle(self, f=inputf, out=outputf):
with open(f, "rb") as inf:
orig_imgdata = inf.read()
pdf = img2pdf.convert([f], nodate=True)
imgdata = b""
instream = False
imgobj = False
colorspace = None
imgfilter = None
width = None
height = None
length = None
# ugly workaround to parse the created pdf
for line in pdf.split(b'\n'):
if instream:
if line == b"endstream":
break
else:
imgdata += line + b'\n'
else:
if imgobj and line == b"stream":
instream = True
elif b"/Subtype /Image" in line:
imgobj = True
elif b"/Width" in line:
width = int(line.split()[-1])
elif b"/Height" in line:
height = int(line.split()[-1])
elif b"/Length" in line:
length = int(line.split()[-1])
elif b"/Filter" in line:
imgfilter = line.split()[-2]
elif b"/ColorSpace" in line:
colorspace = line.split()[-1]
# remove trailing \n
imgdata = imgdata[:-1]
# test if the length field is correct
self.assertEqual(len(imgdata), length)
# test if the filter is valid:
self.assertIn(imgfilter, [b"/DCTDecode", b"/JPXDecode", b"/FlateDecode"])
# test if the colorspace is valid
self.assertIn(colorspace, [b"/DeviceGray", b"/DeviceRGB", b"/DeviceCMYK"])
# test if the image has correct size
orig_img = Image.open(f)
self.assertEqual(width, orig_img.size[0])
self.assertEqual(height, orig_img.size[1])
# if the input file is a jpeg then it should've been copied
# verbatim into the PDF
if imgfilter in [b"/DCTDecode", b"/JPXDecode"]:
self.assertEqual(imgdata, orig_imgdata)
elif imgfilter == b"/FlateDecode":
# otherwise, the data is flate encoded and has to be equal to
# the pixel data of the input image
imgdata = zlib.decompress(imgdata)
if colorspace == b"/DeviceGray":
colorspace = 'L'
elif colorspace == b"/DeviceRGB":
colorspace = 'RGB'
elif colorspace == b"/DeviceCMYK":
colorspace = 'CMYK'
else:
raise Exception("invalid colorspace")
im = Image.frombytes(colorspace, (width, height), imgdata)
if orig_img.mode == '1':
orig_img = orig_img.convert("L")
elif orig_img.mode not in ("RGB", "L", "CMYK", "CMYK;I"):
orig_img = orig_img.convert("RGB")
self.assertEqual(im.tobytes(), orig_img.tobytes())
# the python-pil version 2.3.0-1ubuntu3 in Ubuntu does not have the close() method
try:
im.close()
except AttributeError:
pass
# lastly, make sure that the generated pdf matches bit by bit the
# expected pdf
with open(out, "rb") as outf:
out = outf.read()
self.assertEqual(pdf, out)
# the python-pil version 2.3.0-1ubuntu3 in Ubuntu does not have the close() method
try:
orig_img.close()
except AttributeError:
pass
setattr(TestImg2Pdf, "test_%s"%test_name, handle)
return unittest.TestSuite((
unittest.makeSuite(TestImg2Pdf),
))

Binary file not shown. (After: 1.9 KiB)

BIN
src/tests/input/gray.png Normal file

Binary file not shown. (After: 814 B)

BIN
src/tests/input/mono.jb2 Normal file

Binary file not shown.

BIN
src/tests/input/mono.png Normal file

Binary file not shown. (After: 444 B)

BIN
src/tests/input/mono.tif Normal file

Binary file not shown.

Binary file not shown. (Before: 1.1 KiB, After: 4.9 KiB)

Binary file not shown.

Binary file not shown.


@ -0,0 +1,138 @@
%PDF-1.3
%âãÏÓ
1 0 obj
<<
/Producer (img2pdf 0.4.3)
>>
endobj
2 0 obj
<<
/Pages 3 0 R
/Type /Catalog
>>
endobj
3 0 obj
<<
/Count 2
/Kids [ 4 0 R 7 0 R ]
/Type /Pages
>>
endobj
4 0 obj
<<
/Contents 5 0 R
/MediaBox [ 0 0 86.25 36 ]
/Parent 3 0 R
/Resources <<
/XObject <<
/Im0 6 0 R
>>
>>
/Type /Page
>>
endobj
5 0 obj
<<
/Length 48
>>
stream
q
86.2500 0 0 36.0000 0.0000 0.0000 cm
/Im0 Do
Q
endstream
endobj
6 0 obj
<<
/BitsPerComponent 8
/ColorSpace [ /Indexed /DeviceRGB 255 < 000000 010101 030303 040404 060606 070707 080808 090909 0a0a0a 0b0b0b 0c0c0c 131313 141414 161616 171717 181818 1b1b1b 202020 212121 2d2d2d 2f2f2f 303030 333333 3b3b3b 3d3d3d 3e3e3e 3f3f3f 474747 4c4c4c 505050 5e5e5e 5f5f5f 626262 6a6a6a 747474 757575 8a8a8a 8b8b8b 959595 9d9d9d a0a0a0 a1a1a1 afafaf b3b3b3 b8b8b8 c0c0c0 c1c1c1 c2c2c2 c4c4c4 cccccc cfcfcf d0d0d0 d2d2d2 dedede dfdfdf e4e4e4 e7e7e7 e8e8e8 e9e9e9 ebebeb ececec f3f3f3 f4f4f4 f5f5f5 f6f6f6 f7f7f7 f8f8f8 f9f9f9 fbfbfb fcfcfc fefefe ffffff 050505 0e0e0e 191919 1e1e1e 222222 252525 272727 2b2b2b 2e2e2e 414141 424242 454545 525252 555555 575757 595959 5b5b5b 5c5c5c 666666 707070 717171 727272 7d7d7d 818181 939393 979797 9e9e9e a2a2a2 a5a5a5 a7a7a7 ababab b2b2b2 b4b4b4 bfbfbf c5c5c5 cdcdcd d3d3d3 d5d5d5 d6d6d6 dcdcdc e0e0e0 e2e2e2 eaeaea efefef f0f0f0 f2f2f2 0d0d0d 0f0f0f 101010 151515 1d1d1d 1f1f1f 232323 292929 2a2a2a 2c2c2c 323232 3a3a3a 404040 4b4b4b 4d4d4d 545454 585858 5a5a5a 5d5d5d 616161 686868 6c6c6c 7e7e7e 828282 8d8d8d 8e8e8e 8f8f8f 999999 a3a3a3 a4a4a4 a6a6a6 a8a8a8 aaaaaa adadad bababa bdbdbd bebebe d1d1d1 d4d4d4 d8d8d8 dadada dddddd e1e1e1 e6e6e6 f1f1f1 fafafa 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 > ]
/DecodeParms <<
/BitsPerComponent 8
/Colors 1
/Columns 115
/Predictor 15
>>
/Filter /FlateDecode
/Height 48
/Length 392
/Subtype /Image
/Type /XObject
/Width 115
>>
stream
(392 bytes of binary FlateDecode image data, not shown)
endstream
endobj
7 0 obj
<<
/Contents 8 0 R
/MediaBox [ 0 0 86.25 36 ]
/Parent 3 0 R
/Resources <<
/XObject <<
/Im0 9 0 R
>>
>>
/Type /Page
>>
endobj
8 0 obj
<<
/Length 48
>>
stream
q
86.2500 0 0 36.0000 0.0000 0.0000 cm
/Im0 Do
Q
endstream
endobj
9 0 obj
<<
/BitsPerComponent 8
/ColorSpace [ /Indexed /DeviceRGB 255 < 000000 010101 030303 040404 060606 070707 080808 090909 0a0a0a 0b0b0b 0c0c0c 131313 141414 161616 171717 181818 1b1b1b 202020 212121 2d2d2d 2f2f2f 303030 333333 3b3b3b 3d3d3d 3e3e3e 3f3f3f 474747 4c4c4c 505050 5e5e5e 5f5f5f 626262 6a6a6a 747474 757575 8a8a8a 8b8b8b 959595 9d9d9d a0a0a0 a1a1a1 afafaf b3b3b3 b8b8b8 c0c0c0 c1c1c1 c2c2c2 c4c4c4 cccccc cfcfcf d0d0d0 d2d2d2 dedede dfdfdf e4e4e4 e7e7e7 e8e8e8 e9e9e9 ebebeb ececec f3f3f3 f4f4f4 f5f5f5 f6f6f6 f7f7f7 f8f8f8 f9f9f9 fbfbfb fcfcfc fefefe ffffff 050505 0e0e0e 191919 1e1e1e 222222 252525 272727 2b2b2b 2e2e2e 414141 424242 454545 525252 555555 575757 595959 5b5b5b 5c5c5c 666666 707070 717171 727272 7d7d7d 818181 939393 979797 9e9e9e a2a2a2 a5a5a5 a7a7a7 ababab b2b2b2 b4b4b4 bfbfbf c5c5c5 cdcdcd d3d3d3 d5d5d5 d6d6d6 dcdcdc e0e0e0 e2e2e2 eaeaea efefef f0f0f0 f2f2f2 0d0d0d 0f0f0f 101010 151515 1d1d1d 1f1f1f 232323 292929 2a2a2a 2c2c2c 323232 3a3a3a 404040 4b4b4b 4d4d4d 545454 585858 5a5a5a 5d5d5d 616161 686868 6c6c6c 7e7e7e 828282 8d8d8d 8e8e8e 8f8f8f 999999 a3a3a3 a4a4a4 a6a6a6 a8a8a8 aaaaaa adadad bababa bdbdbd bebebe d1d1d1 d4d4d4 d8d8d8 dadada dddddd e1e1e1 e6e6e6 f1f1f1 fafafa 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 > ]
/DecodeParms <<
/BitsPerComponent 8
/Colors 1
/Columns 115
/Predictor 15
>>
/Filter /FlateDecode
/Height 48
/Length 387
/Subtype /Image
/Type /XObject
/Width 115
>>
stream
(387 bytes of binary FlateDecode image data, not shown)
endstream
endobj
xref
0 10
0000000000 65535 f
0000000015 00000 n
0000000066 00000 n
0000000123 00000 n
0000000200 00000 n
0000000385 00000 n
0000000487 00000 n
0000003008 00000 n
0000003193 00000 n
0000003295 00000 n
trailer
<<
/Info 1 0 R
/Root 2 0 R
/Size 10
>>
startxref
5811
%%EOF

Binary file not shown.



@ -16,17 +16,17 @@ for a in `convert -list compress`; do
echo "encode:\t$a"
convert "$1" -compress $a "`basename $1 .jpg`.pdf"
pdfimages "`basename $1 .jpg`.pdf" "`basename $1 .jpg`"
/bin/echo -ne "diff:\t"
printf "diff:\t"
diff=`compare -metric AE "$1" "\`basename $1 .jpg\`-000.ppm" null: 2>&1`
if [ "$diff" != "0" ]; then
echo "lossy"
else
echo "lossless"
fi
/bin/echo -ne "size:\t"
printf "size:\t"
pdfsize=`stat -c "%s" "\`basename $1 .jpg\`.pdf"`
echo "scale=1;$pdfsize/$imsize" | bc
/bin/echo -ne "pdf:\t"
printf "pdf:\t"
grep --max-count=1 --text /Filter "`basename $1 .jpg`.pdf"
echo
done

18
tox.ini Normal file

@ -0,0 +1,18 @@
# tox (https://tox.readthedocs.io/) is a tool for running tests
# in multiple virtualenvs. This configuration file will run the
# test suite on all supported python versions. To use it, "pip install tox"
# and then run "tox" from this directory.
[tox]
envlist = py37, py38, py39, py310
skip_missing_interpreters = true
[testenv]
deps =
pdfrw
pytest
pikepdf
numpy
scipy
commands =
python -m pytest -vv