release version 0.6.1

src/img2pdf_test.py: do not unlink temporary files -- let pytest take care of that for us
This way, we can export all the artifacts for later retrieval when pytest runs on CI systems, making debugging of issues far easier as it avoids having to recreate the artifacts locally first.
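
A minimal sketch of the pattern described above, assuming pytest's built-in `tmp_path` fixture; the helper `write_artifact` and the test name are hypothetical and do not come from `src/img2pdf_test.py`. The test writes its output into the pytest-managed directory and never unlinks it, so pytest's own retention policy decides when the files disappear and a CI job can collect them in the meantime.

```python
import pathlib


def write_artifact(directory: pathlib.Path, name: str, data: bytes) -> pathlib.Path:
    """Write a test artifact into `directory` and leave it on disk."""
    path = directory / name
    path.write_bytes(data)
    # Deliberately no path.unlink() here: cleanup is left to pytest.
    return path


def test_artifact_is_kept(tmp_path):
    # tmp_path is a per-test temporary directory provided by pytest.
    out = write_artifact(tmp_path, "out.pdf", b"%PDF-1.3\n")
    assert out.exists()
```

On a CI system the base directory can be pinned with pytest's `--basetemp` option so that the whole tree can be exported as a build artifact.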
2025-04-27 18:54:38 +02:00 · 2025-04-27 18:50:11 +02:00 · 2025-04-27 18:48:50 +02:00 · 2025-03-26 03:54:38 +01:00 · 2025-03-23 15:37:32 +01:00 · 2025-03-23 15:37:32 +01:00
33 changed files with 12770 additions and 851 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -1,3 +1,6 @@
 *.pyc
 build
 src/*.egg-info
+
+.eggs
+.tox
--- a/.mailmap
+++ b/.mailmap
@@ -0,0 +1,3 @@
+Johannes Schauer Marin Rodrigues <josch@mister-muffin.de>
+Johannes Schauer Marin Rodrigues <josch@mister-muffin.de> <j.schauer@email.de>
+Johannes Schauer Marin Rodrigues <josch@mister-muffin.de> <josch@pyneo.org>
--- a/.travis.yml
+++ b/.travis.yml
@@ -0,0 +1,42 @@
+language: python
+matrix:
+  include:
+    - name: "Ubuntu Focal"
+      dist: focal
+      addons:
+        apt:
+          packages:
+            - imagemagick
+            - libtiff-tools
+            - libimage-exiftool-perl
+            - poppler-utils
+            - netpbm
+            - ghostscript
+            - mupdf-tools
+    - name: "python 3.9 Windows"
+      os: windows
+      language: shell       # 'language: python' is an error on Travis CI Windows
+      before_install: choco install python imagemagick
+      env: PATH=/c/Python39:/c/Python39/Scripts:$PATH
+    - name: "python 3.7 MacOs"
+      os: osx
+      osx_image: xcode12.2  # pikepdf import fails with earlier versions
+      language: shell       # 'language: python' is an error on Travis CI macOS
+      cache:
+        directories:
+          - "$HOME/Library/Caches/Homebrew"
+          - "$HOME/Library/Caches/pip"
+      addons:
+        homebrew:
+          #update: true
+          packages:
+            - python3
+            - imagemagick
+      before_install:
+        - python3 -m pip install --upgrade virtualenv
+        - virtualenv -p python3 --system-site-packages "$HOME/venv"
+        - source "$HOME/venv/bin/activate"
+install: pip install tox
+script:
+  - python --version
+  - python -m tox
--- a/CHANGES.rst
+++ b/CHANGES.rst
@@ -2,49 +2,223 @@
 CHANGES
 =======

-0.1.6
-----
+0.6.1 (2025-04-27)
+------------------

- - replace -x and -y option by combined option -s (or --pagesize) and use -S
-   for --subject
+ - testsuite fixes
+
+0.6.0 (2025-02-15)
+------------------
+
+ - Add support for JBIG2 (generic coding)
+ - Add convert_to_docobject() broken out from convert()
+ - Add pil_get_dpi() broken out from get_imgmetadata()
+
+0.5.1 (2023-11-26)
+------------------
+
+ - no default ICC profile location for PDF/A-1b on Windows
+ - workaround for PNG input without dpi units but non-square dpi aspect ratio
+
+0.5.0 (2023-10-28)
+------------------
+
+ - support MIFF for 16 bit CMYK input
+ - accept pathlib.Path objects as input
+ - don't store RGB ICC profiles from bilevel or grayscale TIFF, PNG and JPEG
+ - thumbnails are no longer included by default and --include-thumbnails has to
+   be used if you want them
+ - support for pikepdf (>= 6.2.0)
+
+0.4.4 (2022-04-07)
+------------------
+
+ - --viewer-page-layout support for twopageright and twopageleft
+ - Add B and JB paper sizes
+ - support for pikepdf (>= 5.0.0) and Pillow (>= 9.1.0)
+
+0.4.3 (2021-10-24)
+------------------
+
+ - fix --viewer-initial-page (broken in last release)
+
+0.4.2 (2021-10-11)
+------------------
+
+ - add --rotation
+ - allow palette PNG images with ICC profile
+ - sort globbing result on windows
+ - convert 8-bit PNG alpha channels to /SMasks in PDF
+ - remove pdfrw from tests
+
+0.4.1 (2021-05-09)
+------------------
+
+ - support wildcards in paths on windows
+ - support MPO images
+ - fix page border computation
+ - use "img2pdf" logger instead of "root" logger
+ - add --from-file
+
+0.4.0 (2020-08-07)
+------------------
+
+ - replace --without-pdfrw by --engine=internal or --engine=pdfrw
+ - add pikepdf as additional rendering engine and add --engine=pikepdf
+ - support for creating PDF/A-1b compliant PDF using the --pdfa option
+   (this also requires the presence of an ICC profile somewhere on the system)
+ - support for images with embedded ICC profile as input
+ - rewrite tests
+    * use pytest via tox
+    * use pikepdf instead of pdfrw
+    * use imagemagick json output instead of identify -verbose
+ - format all code with black
+
+0.3.6 (2020-04-30)
+------------------
+
+ - fix tests for Fedora on arm64
+
+0.3.5 (2020-04-28)
+------------------
+
+ - remove all Python 2 support
+ - disable pdfrw by default
+
+0.3.4 (2020-04-05)
+------------------
+
+ - test.sh: replace imagemagick with custom python script to produce bit-by-bit
+   identical results on all architectures
+ - add --crop-border, --bleed-border, --trim-border and --art-border options
+ - first draft of a rudimentary tkinter gui (run with --gui)
+
+0.3.3 (2019-01-07)
+------------------
+
+ - restore basic support for Python 2
+ - also ship test.sh
+ - add legal and tabloid paper formats
+ - respect exif rotation tag
+
+0.3.2 (2018-11-20)
+------------------
+
+ - support big endian TIFF with lsb-to-msb FillOrder
+ - support multipage CCITT Group 4 TIFF
+ - also reject palette images with transparency
+ - support PNG images with 1, 2, 4 or 16 bits per sample
+ - support multipage TIFF with differently encoded images
+ - support CCITT Group4 TIFF without rows-per-strip
+ - add extensive test suite
+
+0.3.1 (2018-08-04)
+------------------
+
+ - Directly copy data from CCITT Group 4 encoded TIFF images into the PDF
+   container without re-encoding
+
+0.3.0 (2018-06-18)
+------------------
+
+ - Store non-jpeg images using PNG compression
+ - Support arbitrarily large pages via PDF /UserUnit field
+ - Disallow input with alpha channel as it cannot be preserved
+ - Add option --pillow-limit-break to support very large input
+
+0.2.4 (2017-05-23)
+------------------
+
+ - Restore support for Python 2.7
+ - Add support for PyPy
+ - Add support for testing using tox
+
+0.2.3 (2017-01-20)
+------------------
+
+ - version number bump for botched pypi upload...
+
+0.2.2 (2017-01-20)
+------------------
+
+ - automatic monochrome CCITT Group4 encoding via Pillow/libtiff
+
+0.2.1 (2016-05-04)
+------------------
+
+ - set img2pdf as /producer value
+ - support multi-frame images like multipage TIFF and animated GIF
+ - support for palette images like GIF
+ - support all colorspaces and imageformats known by PIL
+ - read horizontal and vertical dpi from JPEG2000 files
+
+0.2.0 (2015-05-10)
+------------------
+
+ - now Python3 only
+ - pep8 compliant code
+ - update my email to josch@mister-muffin.de
+ - move from github to gitlab.mister-muffin.de/josch/img2pdf
+ - use logging module
+ - add extensive test suite
+ - ability to read from standard input
+ - pdf writer:
+      - make more compatible with the interface of pdfrw module
+      - print floats which equal to their integer conversion as integer
+      - do not print trailing zeroes for floating point numbers
+      - print more linebreaks
+      - add binary string at beginning of PDF to indicate that the PDF
+        contains binary data
+      - handle datetime and unicode strings by using utf-16-be encoding
+ - new options (see --help for more details):
+      - --without-pdfrw
+      - --imgsize
+      - --border
+      - --fit
+      - --auto-orient
+      - --viewer-panes
+      - --viewer-initial-page
+      - --viewer-magnification
+      - --viewer-page-layout
+      - --viewer-fit-window
+      - --viewer-center-window
+      - --viewer-fullscreen
+ - remove short options for metadata command line arguments
 - correctly encode and escape non-ascii metadata
 - explicitly store date in UTC and allow parsing all date formats understood
   by dateutil and `date --date`

-0.1.5
-----
+0.1.5 (2015-02-16)
+------------------

 - Enable support for CMYK images
 - Rework test suite
 - support file objects as input

-0.1.4
-----
+0.1.4 (2015-01-21)
+------------------

 - add Python 3 support
 - make output reproducible by sorting and --nodate option

-0.1.3
-----
+0.1.3 (2014-11-10)
+------------------

 - Avoid leaking file descriptors
 - Convert unrecognized colorspaces to RGB

-0.1.1
-----
+0.1.1 (2014-09-07)
+------------------

 - allow running src/img2pdf.py standalone
 - license change from GPL to LGPL
 - Add pillow 2.4.0 support
 - add options to specify pdf dimensions in points

-0.1.0 (unreleased)
+0.1.0 (2014-03-14, unreleased)
 ------------------

 - Initial PyPI release.
-
 - Modified code to create proper package.
-
 - Added tests.
-
 - Added console script entry point.
--- a/83
+++ b/83
@@ -0,0 +1,83 @@
+Running img2pdf from source
+---------------------------
+
+img2pdf can be run directly from the cloned git repository:
+
+    $ python3 src/img2pdf.py img.jpg -o out.pdf
+
+Running the testsuite
+---------------------
+
+    $ pytest
+
+Making a new release
+--------------------
+
+ - CHANGES.rst: Add a new entry
+ - setup.py: Bump VERSION
+ - src/img2pdf.py: Bump __version__
+ - Commit:
+
+    $ git add CHANGES.rst setup.py src/img2pdf.py
+    $ git commit -m "release version X.Y.Z"
+
+ - Add git tag:
+
+    $ git tag X.Y.Z -m X.Y.Z
+
+ - Build and upload to pypi:
+
+    $ rm -rf dist/*
+    $ python3 setup.py sdist
+    $ twine upload dist/*
+
+ - Push everything to git forge
+
+    $ git push
+
+ - Push to github
+
+    $ git push github
+
+ - Obtain img2pdf.exe from appveyor:
+
+    https://ci.appveyor.com/project/josch/img2pdf/
+
+ - Create new release:
+
+    https://gitlab.mister-muffin.de/josch/img2pdf/releases/new
+
+Using debbisect to find regressions
+-----------------------------------
+
+    $ debbisect --cache=./cache  --depends="git,ca-certificates,python3,
+           ghostscript,imagemagick,mupdf-tools,poppler-utils,python3-pil,
+           python3-pytest,python3-numpy,python3-scipy,python3-pikepdf" \
+      --verbose  2023-09-16 2023-10-24 \
+      'chroot "$1" sh -c "
+           git clone https://gitlab.mister-muffin.de/josch/img2pdf.git
+           && cd img2pdf
+           && pytest src/img2pdf_test.py::test_jpg_2000_rgba8[internal]"'
+
+Using debbisect cache
+---------------------
+
+    $ mmdebstrap --variant=apt --aptopt='Acquire::Check-Valid-Until "false"' \
+          --include=git,ca-certificates,python3,ghostscript,imagemagick \
+          --include=mupdf-tools,poppler-utils,python3-pil,python3-pytest \
+          --include=python3-numpy,python3-scipy,python3-pikepdf \
+          --hook-dir=/usr/share/mmdebstrap/hooks/file-mirror-automount \
+          --setup-hook='mkdir -p "$1/home/josch/git/devscripts/cache/pool/"' \
+          --setup-hook='mount -o ro,bind /home/josch/git/devscripts/cache/pool/ "$1/home/josch/git/devscripts/cache/pool/"' \
+          --chrooted-customize-hook=bash \
+          unstable /dev/null \
+          file:///home/josch/git/devscripts/cache/archive/debian/20231022T090139Z/
+
+Bisecting imagemagick
+---------------------
+
+    $ git clean -fdx && git reset --hard
+    $ ./configure --prefix=$(pwd)/prefix
+    $ make -j$(nproc)
+    $ make install
+    $ LD_LIBRARY_PATH=$(pwd)/prefix/lib prefix/bin/compare ...
--- a/165
+++ b/165
@@ -0,0 +1,165 @@
+                   GNU LESSER GENERAL PUBLIC LICENSE
+                       Version 3, 29 June 2007
+
+ Copyright (C) 2007 Free Software Foundation, Inc. <https://fsf.org/>
+ Everyone is permitted to copy and distribute verbatim copies
+ of this license document, but changing it is not allowed.
+
+
+  This version of the GNU Lesser General Public License incorporates
+the terms and conditions of version 3 of the GNU General Public
+License, supplemented by the additional permissions listed below.
+
+  0. Additional Definitions.
+
+  As used herein, "this License" refers to version 3 of the GNU Lesser
+General Public License, and the "GNU GPL" refers to version 3 of the GNU
+General Public License.
+
+  "The Library" refers to a covered work governed by this License,
+other than an Application or a Combined Work as defined below.
+
+  An "Application" is any work that makes use of an interface provided
+by the Library, but which is not otherwise based on the Library.
+Defining a subclass of a class defined by the Library is deemed a mode
+of using an interface provided by the Library.
+
+  A "Combined Work" is a work produced by combining or linking an
+Application with the Library.  The particular version of the Library
+with which the Combined Work was made is also called the "Linked
+Version".
+
+  The "Minimal Corresponding Source" for a Combined Work means the
+Corresponding Source for the Combined Work, excluding any source code
+for portions of the Combined Work that, considered in isolation, are
+based on the Application, and not on the Linked Version.
+
+  The "Corresponding Application Code" for a Combined Work means the
+object code and/or source code for the Application, including any data
+and utility programs needed for reproducing the Combined Work from the
+Application, but excluding the System Libraries of the Combined Work.
+
+  1. Exception to Section 3 of the GNU GPL.
+
+  You may convey a covered work under sections 3 and 4 of this License
+without being bound by section 3 of the GNU GPL.
+
+  2. Conveying Modified Versions.
+
+  If you modify a copy of the Library, and, in your modifications, a
+facility refers to a function or data to be supplied by an Application
+that uses the facility (other than as an argument passed when the
+facility is invoked), then you may convey a copy of the modified
+version:
+
+   a) under this License, provided that you make a good faith effort to
+   ensure that, in the event an Application does not supply the
+   function or data, the facility still operates, and performs
+   whatever part of its purpose remains meaningful, or
+
+   b) under the GNU GPL, with none of the additional permissions of
+   this License applicable to that copy.
+
+  3. Object Code Incorporating Material from Library Header Files.
+
+  The object code form of an Application may incorporate material from
+a header file that is part of the Library.  You may convey such object
+code under terms of your choice, provided that, if the incorporated
+material is not limited to numerical parameters, data structure
+layouts and accessors, or small macros, inline functions and templates
+(ten or fewer lines in length), you do both of the following:
+
+   a) Give prominent notice with each copy of the object code that the
+   Library is used in it and that the Library and its use are
+   covered by this License.
+
+   b) Accompany the object code with a copy of the GNU GPL and this license
+   document.
+
+  4. Combined Works.
+
+  You may convey a Combined Work under terms of your choice that,
+taken together, effectively do not restrict modification of the
+portions of the Library contained in the Combined Work and reverse
+engineering for debugging such modifications, if you also do each of
+the following:
+
+   a) Give prominent notice with each copy of the Combined Work that
+   the Library is used in it and that the Library and its use are
+   covered by this License.
+
+   b) Accompany the Combined Work with a copy of the GNU GPL and this license
+   document.
+
+   c) For a Combined Work that displays copyright notices during
+   execution, include the copyright notice for the Library among
+   these notices, as well as a reference directing the user to the
+   copies of the GNU GPL and this license document.
+
+   d) Do one of the following:
+
+       0) Convey the Minimal Corresponding Source under the terms of this
+       License, and the Corresponding Application Code in a form
+       suitable for, and under terms that permit, the user to
+       recombine or relink the Application with a modified version of
+       the Linked Version to produce a modified Combined Work, in the
+       manner specified by section 6 of the GNU GPL for conveying
+       Corresponding Source.
+
+       1) Use a suitable shared library mechanism for linking with the
+       Library.  A suitable mechanism is one that (a) uses at run time
+       a copy of the Library already present on the user's computer
+       system, and (b) will operate properly with a modified version
+       of the Library that is interface-compatible with the Linked
+       Version.
+
+   e) Provide Installation Information, but only if you would otherwise
+   be required to provide such information under section 6 of the
+   GNU GPL, and only to the extent that such information is
+   necessary to install and execute a modified version of the
+   Combined Work produced by recombining or relinking the
+   Application with a modified version of the Linked Version. (If
+   you use option 4d0, the Installation Information must accompany
+   the Minimal Corresponding Source and Corresponding Application
+   Code. If you use option 4d1, you must provide the Installation
+   Information in the manner specified by section 6 of the GNU GPL
+   for conveying Corresponding Source.)
+
+  5. Combined Libraries.
+
+  You may place library facilities that are a work based on the
+Library side by side in a single library together with other library
+facilities that are not Applications and are not covered by this
+License, and convey such a combined library under terms of your
+choice, if you do both of the following:
+
+   a) Accompany the combined library with a copy of the same work based
+   on the Library, uncombined with any other library facilities,
+   conveyed under the terms of this License.
+
+   b) Give prominent notice with the combined library that part of it
+   is a work based on the Library, and explaining where to find the
+   accompanying uncombined form of the same work.
+
+  6. Revised Versions of the GNU Lesser General Public License.
+
+  The Free Software Foundation may publish revised and/or new versions
+of the GNU Lesser General Public License from time to time. Such new
+versions will be similar in spirit to the present version, but may
+differ in detail to address new problems or concerns.
+
+  Each version is given a distinguishing version number. If the
+Library as you received it specifies that a certain numbered version
+of the GNU Lesser General Public License "or any later version"
+applies to it, you have the option of following the terms and
+conditions either of that published version or of any later version
+published by the Free Software Foundation. If the Library as you
+received it does not specify a version number of the GNU Lesser
+General Public License, you may choose any version of the GNU Lesser
+General Public License ever published by the Free Software Foundation.
+
+  If the Library as you received it specifies that a proxy can decide
+whether future versions of the GNU Lesser General Public License shall
+apply, that proxy's public statement of acceptance of any version is
+permanent authorization for you to choose that version for the
+Library.
--- a/MANIFEST.in
+++ b/MANIFEST.in
@@ -1,6 +1,12 @@
 include README.md
 include test_comp.sh
+include test.sh
+include magick.py
+include CHANGES.rst
+include LICENSE
 recursive-include src *.jpg
 recursive-include src *.pdf
 recursive-include src *.png
+recursive-include src *.tif
+recursive-include src *.gif
 recursive-include src *.py
--- a/README.md
+++ b/README.md
@@ -1,191 +1,329 @@
+[![Travis Status](https://travis-ci.com/josch/img2pdf.svg?branch=main)](https://app.travis-ci.com/josch/img2pdf)
+[![Appveyor Status](https://ci.appveyor.com/api/projects/status/2kws3wkqvi526llj/branch/main?svg=true)](https://ci.appveyor.com/project/josch/img2pdf/branch/main)
+
 img2pdf
 =======

-Losslessly convert images to PDF without unnecessarily re-encoding JPEG and
-JPEG2000 files.  Image quality is retained without unnecessarily increasing
-file size.
+Lossless conversion of raster images to PDF. You should use img2pdf if your
+priorities are (in this order):

-Background
----------
+ 1. **always lossless**: the image embedded in the PDF will always have the
+    exact same color information for every pixel as the input
+ 2. **small**: if possible, the difference in filesize between the input image
+    and the output PDF will only be the overhead of the PDF container itself
+ 3. **fast**: if possible, the input image is just pasted into the PDF document
+    as-is without any CPU hungry re-encoding of the pixel data

-Quality loss can be avoided when converting JPEG and JPEG2000 images to
-PDF by embedding them without re-encoding.  I wrote this piece of python code.
-because I was missing a tool to do this automatically.
+Conventional conversion software (like ImageMagick) would either:

-If you know how to embed JPEG and JPEG2000 images into a PDF container without
-recompression, using existing tools, please contact me so that I can put this
-code into the garbage bin :D
+ 1. not be lossless because of lossy re-encoding to JPEG
+ 2. not be small because of wasteful flate encoding of raw pixel data
+ 3. not be fast because the input data gets re-encoded

-Functionality
-------------
+Another advantage of not having to re-encode the input (in most common
+situations) is that img2pdf is able to handle much larger input than other
+software, because the raw pixel data never has to be loaded into memory.

-This program will take a list of images and produce a PDF file with the images
-embedded in it.  JPEG and JPEG2000 images will be included without
-recompression.  Images in other formats will be included with zip/flate
-encoding which usually leads to an increase in the resulting size because
-formats like png compress better than PDF which just zip/flate compresses the
-RGB data.  As a result, this tool is able to losslessly wrap images into a PDF
-container with a quality-filesize ratio that is typically better (in case of
-JPEG and JPEG2000 images) or equal (in case of other formats) than that of
-existing tools.
+The following table shows how img2pdf handles different input depending on the
+input file format and image color space.

-For example, imagemagick will re-encode the input JPEG image (thus changing
-its content):
+| Format                                | Colorspace                           | Result        |
+| ------------------------------------- | ------------------------------------ | ------------- |
+| JPEG                                  | any                                  | direct        |
+| JPEG2000                              | any                                  | direct        |
+| PNG (non-interlaced, no transparency) | any                                  | direct        |
+| TIFF (CCITT Group 4)                  | 1-bit monochrome                     | direct        |
+| JBIG2 (single-page generic coding)    | 1-bit monochrome                     | direct        |
+| any                                   | any except CMYK and 1-bit monochrome | PNG Paeth     |
+| any                                   | 1-bit monochrome                     | CCITT Group 4 |
+| any                                   | CMYK                                 | flate         |

-	$ convert img.jpg img.pdf
-	$ pdfimages img.pdf img.extr # not using -j to be extra sure there is no recompression
-	$ compare -metric AE img.jpg img.extr-000.ppm null:
-	1.6301e+06
-
-If one wants to losslessly convert from any format to PDF with
-imagemagick, one has to use zip compression:
-
-	$ convert input.jpg -compress Zip output.pdf
-	$ pdfimages img.pdf img.extr # not using -j to be extra sure there is no recompression
-	$ compare -metric AE img.jpg img.extr-000.ppm null:
-	0
-
-However, this approach will result in PDF files that are a few times larger
-than the input JPEG or JPEG2000 file.
-
-img2pdf is able to losslessly embed JPEG and JPEG2000 files into a PDF
-container without additional overhead (aside from the PDF structure itself),
-save other graphics formats using lossless zip compression,
-and produce multi-page PDF files when more than one input image is given.
-
-Also, since JPEG and JPEG2000 images are not reencoded, conversion  with
-img2pdf is several times faster than with other tools.
+For JPEG, JPEG2000, non-interlaced PNG, TIFF images with CCITT Group 4
+encoded data, and JBIG2 with single-page generic coding (e.g. using `jbig2enc`),
+img2pdf directly embeds the image data into the PDF without
+re-encoding it. It thus treats the PDF format merely as a container format for
+the image data. In these cases, img2pdf only increases the filesize by the size
+of the PDF container (typically around 500 to 700 bytes). Since data is only
+copied and not re-encoded, img2pdf is also typically faster than other
+solutions for these input formats.

+For all other input types, img2pdf first has to transform the pixel data to
+make it compatible with PDF. In most cases, the PNG Paeth filter is applied to
+the pixel data. For 1-bit monochrome input, CCITT Group 4 is used instead. Only for
+CMYK input is no filter applied before the final flate compression.

 Usage
 -----

-#### General Notes
+The images must be provided as files because img2pdf needs to seek in the file
+descriptor.

-The images must be provided as files because img2pdf needs to seek
-in the file descriptor.  Input cannot be piped through stdin.
+If no output file is specified with the `-o`/`--output` option, output is
+written to stdout. A typical invocation is:

-If no output file is specified with the `-o`/`--output` option,
-output will be to stdout.
+	$ img2pdf img1.png img2.jpg -o out.pdf

-Descriptions of the options should be self explanatory.
-They are available by running:
+The detailed documentation can be accessed by running:

-	img2pdf --help
+	$ img2pdf --help

+With no command line arguments supplied, img2pdf will read a single image from
+standard input and write the resulting PDF to standard output. Here is an
+example of how to scan directly to PDF using scanimage(1) from SANE:

-#### Controlling Page Size
-
-The PDF page size can be manipulated.  By default, the image will be sized "into" the given dimensions with the aspect ratio retained.  For instance, to size an image into a page that is at most 500pt x 500pt, use:
-
-	img2pdf -s 500x500 -o output.pdf input.jpg
-
-To "fill" out a page that is at least 500pt x 500pt, follow the dimensions with a `^`:
-
-	img2pdf -s 500x500^ -o output.pdf input.jpg
-
-To output pages that are exactly 500pt x 500pt, follow the dimensions with an `!`:
-
-	img2pdf -s 500x500\! -o output.pdf input.jpg
-
-Notice that the default unit is points.  Units may be also be specified and mixed:
-
-	img2pdf -s 8.5inx27.94cm -o output.pdf input.jpg
-
-If either width or height is omitted, the other will be calculated
-to preserve aspect ratio.
-
-	img2pdf -s x280mm -o output1.pdf input.jpg
-	img2pdf -s 280mmx -o output2.pdf input.jpg
-
-Some standard page sizes are recognized:
-
-	img2pdf -s letter -o output1.pdf input.jpg
-	img2pdf -s a4 -o output2.pdf input.jpg
-
-#### Colorspace
-
-Currently, the colorspace must be forced for JPEG 2000 images that are
-not in the RGB colorspace.  Available colorspace options are based on
-Python Imaging Library (PIL) short handles.
-
- * `RGB` = RGB color
- * `L` = Grayscale
- * `1` = Black and white (internally converted to grayscale)
- * `CMYK` = CMYK color
- * `CMYK;I` = CMYK color with inversion
-
-For example, to encode a grayscale JPEG2000 image, use:
-
-	img2pdf -C L -o output.pdf input.jp2
+	$ scanimage --mode=Color --resolution=300 | pnmtojpeg -quality 90 | img2pdf > scan.pdf

 Bugs
 ----

-If you find a JPEG or JPEG2000 file that, when embedded cannot be read
-by the Adobe Acrobat Reader, please contact me.
+ - If you find a JPEG, JPEG2000, PNG or CCITT Group 4 encoded TIFF file that,
+   when embedded into the PDF cannot be read by the Adobe Acrobat Reader,
+   please contact me.

-For lossless conversion of formats other than JPEG or JPEG2000, zip/flate
-encoding is used.  This choice is based on tests I did with a number of images.
-I converted them into PDF using the lossless variants of the compression
-formats offered by imagemagick.  In all my tests, zip/flate encoding performed
-best.  You can verify my findings using the test_comp.sh script with any input
-image given as a commandline argument.  If you find an input file that is
-outperformed by another lossless compression method, contact me.
+ - An error is produced if the input image is broken. This commonly happens if
+   the input image has an invalid EXIF Orientation value of zero. Even though
+   only eight different values from 1 to 8 are permitted, Android phones and
+   Canon DSLR cameras produce JPEG images with the invalid value of zero.
+   Either fix your input images with `exiftool` or similar software before
+   passing the JPEG to `img2pdf`, or run `img2pdf` with `--rotation=ifvalid`
+   (on the command line), or pass `rotation=img2pdf.Rotation.ifvalid` as an
+   argument to `convert()` when using img2pdf as a library.

-I have not yet figured out how to determine the colorspace of JPEG2000 files.
-Therefore JPEG2000 files use DeviceRGB by default. For JPEG2000 files with
-other colorspaces, you must force it using the `--colorspace` option.
-
-It might be possible to store transparency using masks but it is not clear
-what the utility of such a functionality would be.
-
-Most vector graphic formats can be losslessly turned into PDF (minus some of
-the features unsupported by PDF) but img2pdf will currently turn vector
-graphics into their lossy raster representations.
-
-Acrobat is able to store a hint for the PDF reader of how to present the PDF
-when opening it. Things like automatic fullscreen or the zoom level can be
-configured.
-
-It would be nice if a single input image could be read from standard input.
+ - img2pdf uses PIL (or Pillow) to obtain image meta data and to convert the
+   input if necessary. To prevent decompression bomb denial of service attacks,
+   Pillow limits the maximum number of pixels an input image is allowed to
+   have. If you are sure that you know what you are doing, then you can disable
+   this safeguard by passing the `--pillow-limit-break` option to img2pdf. This
+   allows one to process even very large input images.

 Installation
 ------------

-On a Debian- and Ubuntu-based systems, dependencies may be installed
-with the following command:
+On Debian- and Ubuntu-based systems, img2pdf can be installed from the
+official repositories:

-	apt-get install python python-pil python-setuptools
+	$ apt install img2pdf

-Or for Python 3:
+If you want to install it using pip, you can run:

-	apt-get install python3 python3-pil python3-setuptools
-
-You can then install the package using:
-
-	$ pip install img2pdf
+	$ pip3 install img2pdf

 If you prefer to install from source code use:

 	$ cd img2pdf/
-	$ pip install .
+	$ pip3 install .

 To test the console script without installing the package on your system,
 use virtualenv:

 	$ cd img2pdf/
 	$ virtualenv ve
-	$ ve/bin/pip install .
+	$ ve/bin/pip3 install .

 You can then test the converter using:

 	$ ve/bin/img2pdf -o test.pdf src/tests/test.jpg

+If you don't want to set up Python on Windows, then head to the
+[releases](https://gitlab.mister-muffin.de/josch/img2pdf/releases) section and download the latest
+`img2pdf.exe`.
+
+GUI
+---
+
+There exists an experimental GUI with all settings currently disabled. You can
+directly convert images to PDF but you cannot set any options via the GUI yet.
+If you are interested in adding more features to the PDF, please submit a merge
+request. The GUI is based on tkinter and works on Linux, Windows and MacOS.
+
+![](screenshot.png)
+
+Library
+-------
+
 The package can also be used as a library:

 	import img2pdf
-	pdf_bytes = img2pdf.convert(['test.jpg'])

-	file = open("name.pdf","wb")
-	file.write(pdf_bytes)
+	# opening from filename
+	with open("name.pdf","wb") as f:
+		f.write(img2pdf.convert('test.jpg'))
+
+	# opening from file handle (the image must be opened in binary mode)
+	with open("name.pdf","wb") as f1, open("test.jpg","rb") as f2:
+		f1.write(img2pdf.convert(f2))
+
+	# opening using pathlib
+	import pathlib
+	with open("name.pdf","wb") as f:
+		f.write(img2pdf.convert(pathlib.Path('test.jpg')))
+
+	# using in-memory image data (a bytes object)
+	with open("name.pdf","wb") as f:
+		f.write(img2pdf.convert(b"\x89PNG..."))
+
+	# multiple inputs (variant 1)
+	with open("name.pdf","wb") as f:
+		f.write(img2pdf.convert("test1.jpg", "test2.png"))
+
+	# multiple inputs (variant 2)
+	with open("name.pdf","wb") as f:
+		f.write(img2pdf.convert(["test1.jpg", "test2.png"]))
+
+	# convert all files ending in .jpg inside a directory
+	import os
+	dirname = "/path/to/images"
+	imgs = []
+	for fname in os.listdir(dirname):
+		if not fname.endswith(".jpg"):
+			continue
+		path = os.path.join(dirname, fname)
+		if os.path.isdir(path):
+			continue
+		imgs.append(path)
+	with open("name.pdf","wb") as f:
+		f.write(img2pdf.convert(imgs))
+
+	# convert all files ending in .jpg in a directory and its subdirectories
+	dirname = "/path/to/images"
+	imgs = []
+	for r, _, f in os.walk(dirname):
+		for fname in f:
+			if not fname.endswith(".jpg"):
+				continue
+			imgs.append(os.path.join(r, fname))
+	with open("name.pdf","wb") as f:
+		f.write(img2pdf.convert(imgs))
+
+
+	# convert all files matching a glob
+	import glob
+	with open("name.pdf","wb") as f:
+		f.write(img2pdf.convert(glob.glob("/path/to/*.jpg")))
+
+	# convert all files matching a glob using pathlib.Path
+	from pathlib import Path
+	with open("name.pdf","wb") as f:
+		f.write(img2pdf.convert(*Path("/path").glob("**/*.jpg")))
+
+	# ignore invalid rotation values in the input images
+	with open("name.pdf","wb") as f:
+		f.write(img2pdf.convert('test.jpg', rotation=img2pdf.Rotation.ifvalid))
+
+	# writing to file descriptor
+	with open("name.pdf","wb") as f1, open("test.jpg","rb") as f2:
+		img2pdf.convert(f2, outputstream=f1)
+
+	# specify paper size (A4)
+	a4inpt = (img2pdf.mm_to_pt(210),img2pdf.mm_to_pt(297))
+	layout_fun = img2pdf.get_layout_fun(a4inpt)
+	with open("name.pdf","wb") as f:
+		f.write(img2pdf.convert('test.jpg', layout_fun=layout_fun))
+
+	# use a fixed dpi of 300 instead of reading it from the image
+	dpix = dpiy = 300
+	layout_fun = img2pdf.get_fixed_dpi_layout_fun((dpix, dpiy))
+	with open("name.pdf","wb") as f:
+		f.write(img2pdf.convert('test.jpg', layout_fun=layout_fun))
+
+	# create a PDF/A-1b compliant document by passing an ICC profile
+	with open("name.pdf","wb") as f:
+		f.write(img2pdf.convert('test.jpg', pdfa="/usr/share/color/icc/sRGB.icc"))
+
+Comparison to ImageMagick
+-------------------------
+
+Create a large test image:
+
+	$ convert logo: -resize 8000x original.jpg
+
+Convert it into PDF using ImageMagick and img2pdf:
+
+	$ time img2pdf original.jpg -o img2pdf.pdf
+	$ time convert original.jpg imagemagick.pdf
+
+Notice how ImageMagick took an order of magnitude longer to do the conversion
+than img2pdf. It also used twice the memory.
+
+Now extract the image data from both PDF documents and compare it to the
+original:
+
+	$ pdfimages -all img2pdf.pdf tmp
+	$ compare -metric AE original.jpg tmp-000.jpg null:
+	0
+	$ pdfimages -all imagemagick.pdf tmp
+	$ compare -metric AE original.jpg tmp-000.jpg null:
+	118716
+
+To get lossless output with ImageMagick we can use Zip compression but that
+unnecessarily increases the size of the output:
+
+	$ convert original.jpg -compress Zip imagemagick.pdf
+	$ pdfimages -all imagemagick.pdf tmp
+	$ compare -metric AE original.jpg tmp-000.png null:
+	0
+	$ stat --format="%s %n" original.jpg img2pdf.pdf imagemagick.pdf
+	1535837 original.jpg
+	1536683 img2pdf.pdf
+	9397809 imagemagick.pdf
+
+Comparison to pdfLaTeX
+----------------------
+
+pdfLaTeX performs a lossless conversion from included images to PDF by default.
+If the input is a JPEG, then it simply embeds the JPEG into the PDF in the same
+way as img2pdf does it. But for other image formats it uses flate compression
+of the plain pixel data and thus needlessly increases the output file size:
+
+	$ convert logo: -resize 8000x original.png
+	$ cat << END > pdflatex.tex
+	\documentclass{article}
+	\usepackage{graphicx}
+	\begin{document}
+	\includegraphics{original.png}
+	\end{document}
+	END
+	$ pdflatex pdflatex.tex
+	$ stat --format="%s %n" original.png pdflatex.pdf
+	4500182 original.png
+	9318120 pdflatex.pdf
+
+Comparison to podofoimg2pdf
+---------------------------
+
+Like pdfLaTeX, podofoimg2pdf is able to perform a lossless conversion from JPEG
+to PDF by plainly embedding the JPEG data into the pdf container. But just like
+pdfLaTeX it uses flate compression for all other file formats, thus sometimes
+resulting in larger files than necessary.
+
+	$ convert logo: -resize 8000x original.png
+	$ podofoimg2pdf out.pdf original.png
+	$ stat --format="%s %n" original.png out.pdf
+	4500181 original.png
+	9335629 out.pdf
+
+It also only supports JPEG, PNG and TIF as input and lacks many of the
+convenience features of img2pdf like page sizes, borders, rotation and
+metadata.
+
+Comparison to Tesseract OCR
+---------------------------
+
+Tesseract OCR comes closest to the functionality img2pdf provides. It is able
+to convert JPEG and PNG input to PDF without needlessly increasing the filesize
+and is at the same time lossless. So if your input is JPEG or PNG images, then
+you should safely be able to use Tesseract instead of img2pdf. For other input,
+Tesseract might not do a lossless conversion. For example it converts CMYK
+input to RGB and removes the alpha channel from images with transparency. For
+multipage TIFF or animated GIF, it will only convert the first frame.
+
+Comparison to econvert from ExactImage
+--------------------------------------
+
+Like pdfLaTeX and podofoimg2pdf, econvert is able to embed JPEG images into PDF
+directly without re-encoding but when given other file formats, it stores them
+just using flate compression, which unnecessarily increases the filesize.
+Furthermore, it throws an error with CMYK TIF input. It also doesn't store CMYK
+JPEG files as CMYK but converts them to RGB, so it's not lossless. When trying
+to feed it 16-bit files, it errors out with `Unhandled bps/spp combination`. It
+also seems to choose JPEG encoding when using it on some file types (like
+palette images), making it lossy for that input as well.
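
Note on the README's library examples: `os.listdir()`, `os.walk()` and `glob.glob()` return entries in arbitrary, filesystem-dependent order, so the page order of the resulting PDF is not guaranteed by those snippets. A minimal sketch of collecting the inputs in sorted order follows. `collect_images` is a hypothetical helper that is not part of img2pdf, and the final `img2pdf.convert()` call is shown only as a comment.

```python
from pathlib import Path


def collect_images(dirname, suffix=".jpg", recursive=False):
    """Return sorted paths (as strings) of regular files ending in `suffix`.

    Hypothetical helper for illustration; not part of img2pdf.
    """
    root = Path(dirname)
    # "**/*" descends into subdirectories, "*" stays in the top-level directory
    pattern = "**/*" if recursive else "*"
    paths = [
        p for p in root.glob(pattern)
        if p.is_file() and p.name.endswith(suffix)
    ]
    # sort explicitly so that the page order is deterministic
    return [str(p) for p in sorted(paths)]


# Assumed usage with the convert() call from the README:
#   with open("name.pdf", "wb") as f:
#       f.write(img2pdf.convert(collect_images("/path/to/images")))
```

Sorting by path is only one possible choice; a natural sort by page number may be more appropriate for scans named `1.jpg`, `2.jpg`, ..., `10.jpg`.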
--- a/appveyor.yml
+++ b/appveyor.yml
@ -0,0 +1,33 @@
+environment:
+  # For Python versions available on Appveyor, see
+  # https://www.appveyor.com/docs/windows-images-software/#python
+  matrix:
+#    - PYTHON: "C:\\Python27"
+#    - PYTHON: "C:\\Python33"
+#    - PYTHON: "C:\\Python34"
+#    - PYTHON: "C:\\Python35"
+#    - PYTHON: "C:\\Python36"
+#    - PYTHON: "C:\\Python37"
+#    - PYTHON: "C:\\Python27-x64"
+#    - PYTHON: "C:\\Python33-x64"
+#    - PYTHON: "C:\\Python34-x64"
+#    - PYTHON: "C:\\Python35-x64"
+#    - PYTHON: "C:\\Python36-x64"
+    - PYTHON: "C:\\Python37-x64"
+
+install:
+  - "%PYTHON%\\python.exe -m pip install tox wheel pyinstaller Pillow"
+
+build: off
+
+# don't run tests on windows because we don't have imagemagick
+#test_script:
+#  - "%PYTHON%\\python.exe -m tox"
+
+after_test:
+  - "%PYTHON%\\python.exe setup.py bdist_wheel"
+  - "%PYTHON%\\python.exe -m PyInstaller --clean --onefile --console --nowindowed --name img2pdf src/img2pdf.py"
+  #- "%PYTHON%\\python.exe -m PyInstaller --clean --onefile --noconsole --windowed --name img2pdf_windowed src/img2pdf.py"
+
+artifacts:
+  - path: dist\*
--- a/screenshot.png
+++ b/screenshot.png
--- a/setup.cfg
+++ b/setup.cfg
@ -1,2 +0,0 @@
-[metadata]
-description-file = README.md
--- a/setup.py
+++ b/setup.py
@ -1,42 +1,51 @@
+import sys
 from setuptools import setup

-VERSION="0.1.6~git"
+VERSION = "0.6.1"

-setup (
-    name='img2pdf',
+INSTALL_REQUIRES = (
+    "Pillow",
+    "pikepdf",
+)
+
+setup(
+    name="img2pdf",
    version=VERSION,
-    author = "Johannes 'josch' Schauer",
-    author_email = 'j.schauer@email.de',
-    description = "Convert images to PDF via direct JPEG inclusion.",
-    long_description = open('README.md').read(),
-    license = "LGPL",
-    keywords = "jpeg pdf converter",
-    classifiers = [
-        'Development Status :: 4 - Beta',
-        'Intended Audience :: Developers',
-        'Programming Language :: Python',
-        'Programming Language :: Python :: 2',
-        'Programming Language :: Python :: 2.6',
-        'Programming Language :: Python :: 2.7',
-        'Programming Language :: Python :: 3',
-        'Programming Language :: Python :: 3.4',
-        'Programming Language :: Python :: Implementation :: CPython',
-        'License :: OSI Approved :: GNU Lesser General Public License v3 (LGPLv3)',
-        'Programming Language :: Python',
-        'Natural Language :: English',
-        'Operating System :: OS Independent'],
-    url = 'https://github.com/josch/img2pdf',
-    download_url = 'https://github.com/josch/img2pdf/archive/'+VERSION+'.tar.gz',
+    author="Johannes Schauer Marin Rodrigues",
+    author_email="josch@mister-muffin.de",
+    description="Convert images to PDF via direct JPEG inclusion.",
+    long_description=open("README.md").read(),
+    long_description_content_type="text/markdown",
+    license="LGPL",
+    keywords="jpeg pdf converter",
+    classifiers=[
+        "Development Status :: 5 - Production/Stable",
+        "Intended Audience :: Developers",
+        "Intended Audience :: Other Audience",
+        "Environment :: Console",
+        "Programming Language :: Python",
+        "Programming Language :: Python :: 3",
+        "Programming Language :: Python :: 3.5",
+        "Programming Language :: Python :: Implementation :: CPython",
+        "Programming Language :: Python :: Implementation :: PyPy",
+        "License :: OSI Approved :: GNU Lesser General Public License v3 " "(LGPLv3)",
+        "Natural Language :: English",
+        "Operating System :: OS Independent",
+    ],
+    url="https://gitlab.mister-muffin.de/josch/img2pdf",
+    download_url="https://gitlab.mister-muffin.de/josch/img2pdf/repository/"
+    "archive.tar.gz?ref=" + VERSION,
    package_dir={"": "src"},
-    py_modules=['img2pdf', 'jp2'],
-    include_package_data = True,
-    test_suite = 'tests.test_suite',
-    zip_safe = True,
-    install_requires=(
-        'Pillow',
-    ),
-    entry_points='''
-    [console_scripts]
-    img2pdf = img2pdf:main
-    ''',
-    )
+    py_modules=["img2pdf", "jp2"],
+    include_package_data=True,
+    zip_safe=True,
+    install_requires=INSTALL_REQUIRES,
+    extras_require={
+        "gui": ("tkinter"),
+    },
+    entry_points={
+        "setuptools.installation": ["eggsecutable = img2pdf:main"],
+        "console_scripts": ["img2pdf = img2pdf:main"],
+        "gui_scripts": ["img2pdf-gui = img2pdf:gui"],
+    },
+)
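
Note on the `entry_points` above: the `console_scripts` and `gui_scripts` entries use the `name = module:attribute` form, so the installed `img2pdf` command imports the `img2pdf` module and calls its `main` attribute. A rough sketch of that resolution step follows. `resolve_entry_point` is a hypothetical helper that only illustrates the string format and is not how setuptools implements it internally; it is demonstrated on a standard-library function so that it runs without img2pdf installed.

```python
import importlib


def resolve_entry_point(spec):
    """Split 'name = module:attr' and return (name, resolved object).

    Hypothetical helper for illustration only.
    """
    name, _, target = spec.partition("=")
    module_name, _, attr_path = target.strip().partition(":")
    obj = importlib.import_module(module_name.strip())
    # the attribute part may be dotted, e.g. "Class.method"
    for part in attr_path.strip().split("."):
        obj = getattr(obj, part)
    return name.strip(), obj


# stand-in for "img2pdf = img2pdf:main", using a stdlib module instead
name, func = resolve_entry_point("dumps-demo = json:dumps")
print(name, func([1, 2]))
```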
--- a/src/img2pdf.py
+++ b/src/img2pdf.py
--- a/src/img2pdf_test.py
+++ b/src/img2pdf_test.py
--- a/src/jp2.py
+++ b/src/jp2.py
@ -1,6 +1,6 @@
 #!/usr/bin/env python
 #
-# Copyright (C) 2013 Johannes 'josch' Schauer <j.schauer at email.de>
+# Copyright (C) 2013 Johannes Schauer Marin Rodrigues <j.schauer at email.de>
 #
 # this module is heavily based upon jpylyzer which is
 # KB / National Library of the Netherlands, Open Planets Foundation
@ -21,23 +21,25 @@

 import struct

+
 def getBox(data, byteStart, noBytes):
-    boxLengthValue = struct.unpack(">I", data[byteStart:byteStart+4])[0]
-    boxType = data[byteStart+4:byteStart+8]
+    boxLengthValue = struct.unpack(">I", data[byteStart : byteStart + 4])[0]
+    boxType = data[byteStart + 4 : byteStart + 8]
    contentsStartOffset = 8
    if boxLengthValue == 1:
-        boxLengthValue = struct.unpack(">Q", data[byteStart+8:byteStart+16])[0]
+        boxLengthValue = struct.unpack(">Q", data[byteStart + 8 : byteStart + 16])[0]
        contentsStartOffset = 16
    if boxLengthValue == 0:
-        boxLengthValue = noBytes-byteStart
+        boxLengthValue = noBytes - byteStart
    byteEnd = byteStart + boxLengthValue
-    boxContents = data[byteStart+contentsStartOffset:byteEnd]
+    boxContents = data[byteStart + contentsStartOffset : byteEnd]
    return (boxLengthValue, boxType, byteEnd, boxContents)

+
 def parse_ihdr(data):
-    height = struct.unpack(">I", data[0:4])[0]
-    width = struct.unpack(">I", data[4:8])[0]
-    return width, height
+    height, width, channels, bpp = struct.unpack(">IIHB", data[:11])
+    return width, height, channels, bpp + 1
+

 def parse_colr(data):
    meth = struct.unpack(">B", data[0:1])[0]
@ -49,30 +51,60 @@ def parse_colr(data):
    elif enumCS == 17:
        return "L"
    else:
-        raise Exception("only sRGB and greyscale color space is supported, got %d"%enumCS)
+        raise Exception(
+            "only sRGB and greyscale color space is supported, " "got %d" % enumCS
+        )
+
+
+def parse_resc(data):
+    hnum, hden, vnum, vden, hexp, vexp = struct.unpack(">HHHHBB", data)
+    hdpi = ((hnum / hden) * (10**hexp) * 100) / 2.54
+    vdpi = ((vnum / vden) * (10**vexp) * 100) / 2.54
+    return hdpi, vdpi
+
+
+def parse_res(data):
+    hdpi, vdpi = None, None
+    noBytes = len(data)
+    byteStart = 0
+    boxLengthValue = 1  # dummy value for while loop condition
+    while byteStart < noBytes and boxLengthValue != 0:
+        boxLengthValue, boxType, byteEnd, boxContents = getBox(data, byteStart, noBytes)
+        if boxType == b"resc":
+            hdpi, vdpi = parse_resc(boxContents)
+            break
+        byteStart = byteEnd
+    return hdpi, vdpi
+

 def parse_jp2h(data):
-    width, height, colorspace = None, None, None
-    noBytes=len(data)
-    byteStart=0
-    boxLengthValue=1 # dummy value for while loop condition
+    width, height, colorspace, hdpi, vdpi = None, None, None, None, None
+    channels, bpp = None, None
+    noBytes = len(data)
+    byteStart = 0
+    boxLengthValue = 1  # dummy value for while loop condition
    while byteStart < noBytes and boxLengthValue != 0:
        boxLengthValue, boxType, byteEnd, boxContents = getBox(data, byteStart, noBytes)
-        if boxType == 'ihdr':
-            width, height = parse_ihdr(boxContents)
-        elif boxType == 'colr':
+        if boxType == b"ihdr":
+            width, height, channels, bpp = parse_ihdr(boxContents)
+        elif boxType == b"colr":
            colorspace = parse_colr(boxContents)
+        elif boxType == b"res ":
+            hdpi, vdpi = parse_res(boxContents)
        byteStart = byteEnd
-    return (width, height, colorspace)
+    return (width, height, colorspace, hdpi, vdpi, channels, bpp)
+

 def parsejp2(data):
-    noBytes=len(data)
-    byteStart=0
-    boxLengthValue=1 # dummy value for while loop condition
+    noBytes = len(data)
+    byteStart = 0
+    boxLengthValue = 1  # dummy value for while loop condition
+    width, height, colorspace, hdpi, vdpi = None, None, None, None, None
    while byteStart < noBytes and boxLengthValue != 0:
        boxLengthValue, boxType, byteEnd, boxContents = getBox(data, byteStart, noBytes)
-        if boxType == 'jp2h':
-            width, height, colorspace = parse_jp2h(boxContents)
+        if boxType == b"jp2h":
+            width, height, colorspace, hdpi, vdpi, channels, bpp = parse_jp2h(
+                boxContents
+            )
+            break
        byteStart = byteEnd
    if not width:
        raise Exception("no width in jp2 header")
@ -80,11 +112,42 @@ def parsejp2(data):
        raise Exception("no height in jp2 header")
    if not colorspace:
        raise Exception("no colorspace in jp2 header")
-    return (width, height, colorspace)
+    # retrieving the dpi is optional so we do not error out if not present
+    return (width, height, colorspace, hdpi, vdpi, channels, bpp)
+
+
+def parsej2k(data):
+    lsiz, rsiz, xsiz, ysiz, xosiz, yosiz, _, _, _, _, csiz = struct.unpack(
+        ">HHIIIIIIIIH", data[4:42]
+    )
+    ssiz = [None] * csiz
+    xrsiz = [None] * csiz
+    yrsiz = [None] * csiz
+    for i in range(csiz):
+        ssiz[i], xrsiz[i], yrsiz[i] = struct.unpack(
+            "BBB", data[42 + 3 * i : 42 + 3 * (i + 1)]
+        )
+    assert ssiz == [7, 7, 7]
+    return xsiz - xosiz, ysiz - yosiz, None, None, None, csiz, 8
+
+
+def parse(data):
+    if data[:4] == b"\xff\x4f\xff\x51":
+        return parsej2k(data)
+    else:
+        return parsejp2(data)
+

 if __name__ == "__main__":
    import sys
-    width, height, colorspace = parsejp2(open(sys.argv[1]).read())
-    sys.stdout.write("width = %d"%width)
-    sys.stdout.write("height = %d"%height)
-    sys.stdout.write("colorspace = %s"%colorspace)
+
+    width, height, colorspace, hdpi, vdpi, channels, bpp = parse(
+        open(sys.argv[1], "rb").read()
+    )
+    print("width = %d" % width)
+    print("height = %d" % height)
+    print("colorspace = %s" % colorspace)
+    print("hdpi = %s" % hdpi)
+    print("vdpi = %s" % vdpi)
+    print("channels = %s" % channels)
+    print("bpp = %s" % bpp)
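
Note on the box-walking loops: `parse_jp2h` and `parsejp2` rely on `byteStart = byteEnd` to advance to the next box. A self-contained sketch of that iteration pattern on synthetic data follows. `iter_boxes` and `make_box` are illustrative names that do not exist in `jp2.py`, and the payloads are made up rather than valid JPEG 2000 content.

```python
import struct


def make_box(box_type, payload):
    """Build a box: 4-byte big-endian length, 4-byte type, then the payload."""
    return struct.pack(">I", 8 + len(payload)) + box_type + payload


def iter_boxes(data):
    """Yield (box_type, contents) for each box in data."""
    offset = 0
    while offset + 8 <= len(data):
        (length,) = struct.unpack(">I", data[offset : offset + 4])
        box_type = data[offset + 4 : offset + 8]
        header = 8
        if length == 1:
            if offset + 16 > len(data):
                break  # truncated extended-length header
            # extended length stored as a 64-bit integer after the type
            (length,) = struct.unpack(">Q", data[offset + 8 : offset + 16])
            header = 16
        elif length == 0:
            # a length of zero means the box runs to the end of the data
            length = len(data) - offset
        if length < header:
            break  # malformed box; stop instead of looping forever
        yield box_type, data[offset + header : offset + length]
        offset += length  # advance to the next box


# ihdr payload layout as unpacked by parse_ihdr: height, width, channels, bpp - 1
ihdr = make_box(b"ihdr", struct.pack(">IIHB", 48, 115, 3, 7))
data = ihdr + make_box(b"colr", b"\x01\x00\x00")
for box_type, contents in iter_boxes(data):
    print(box_type, len(contents))
```

Writing the walk as a generator keeps the offset bookkeeping in one place, so a caller cannot forget to advance to the next box.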
--- a/src/tests/init.py
+++ b/src/tests/init.py
@ -1,109 +0,0 @@
-import unittest
-
-import os
-import img2pdf
-import zlib
-from PIL import Image
-
-HERE = os.path.dirname(__file__)
-
-#convert +set date:create +set date:modify -define png:exclude-chunk=time
-
-def test_suite():
-    class TestImg2Pdf(unittest.TestCase):
-        pass
-
-    for test_name in os.listdir(os.path.join(HERE, "input")):
-        inputf = os.path.join(HERE, "input", test_name)
-        if not os.path.isfile(inputf):
-            continue
-        outputf = os.path.join(HERE, "output", test_name+".pdf")
-        assert os.path.isfile(outputf)
-        def handle(self, f=inputf, out=outputf):
-            with open(f, "rb") as inf:
-                orig_imgdata = inf.read()
-            pdf = img2pdf.convert([f], nodate=True)
-            imgdata = b""
-            instream = False
-            imgobj = False
-            colorspace = None
-            imgfilter = None
-            width = None
-            height = None
-            length = None
-            # ugly workaround to parse the created pdf
-            for line in pdf.split(b'\n'):
-                if instream:
-                    if line == b"endstream":
-                        break
-                    else:
-                        imgdata += line + b'\n'
-                else:
-                    if imgobj and line == b"stream":
-                        instream = True
-                    elif b"/Subtype /Image" in line:
-                        imgobj = True
-                    elif b"/Width" in line:
-                        width = int(line.split()[-1])
-                    elif b"/Height" in line:
-                        height = int(line.split()[-1])
-                    elif b"/Length" in line:
-                        length = int(line.split()[-1])
-                    elif b"/Filter" in line:
-                        imgfilter = line.split()[-2]
-                    elif b"/ColorSpace" in line:
-                        colorspace = line.split()[-1]
-            # remove trailing \n
-            imgdata = imgdata[:-1]
-            # test if the length field is correct
-            self.assertEqual(len(imgdata), length)
-            # test if the filter is valid:
-            self.assertIn(imgfilter, [b"/DCTDecode", b"/JPXDecode", b"/FlateDecode"])
-            # test if the colorspace is valid
-            self.assertIn(colorspace, [b"/DeviceGray", b"/DeviceRGB", b"/DeviceCMYK"])
-            # test if the image has correct size
-            orig_img = Image.open(f)
-            self.assertEqual(width, orig_img.size[0])
-            self.assertEqual(height, orig_img.size[1])
-            # if the input file is a jpeg then it should've been copied
-            # verbatim into the PDF
-            if imgfilter in [b"/DCTDecode", b"/JPXDecode"]:
-                self.assertEqual(imgdata, orig_imgdata)
-            elif imgfilter == b"/FlateDecode":
-                # otherwise, the data is flate encoded and has to be equal to
-                # the pixel data of the input image
-                imgdata = zlib.decompress(imgdata)
-                if colorspace == b"/DeviceGray":
-                    colorspace = 'L'
-                elif colorspace == b"/DeviceRGB":
-                    colorspace = 'RGB'
-                elif colorspace == b"/DeviceCMYK":
-                    colorspace = 'CMYK'
-                else:
-                    raise Exception("invalid colorspace")
-                im = Image.frombytes(colorspace, (width, height), imgdata)
-                if orig_img.mode == '1':
-                    orig_img = orig_img.convert("L")
-                elif orig_img.mode not in ("RGB", "L", "CMYK", "CMYK;I"):
-                    orig_img = orig_img.convert("RGB")
-                self.assertEqual(im.tobytes(), orig_img.tobytes())
-                # the python-pil version 2.3.0-1ubuntu3 in Ubuntu does not have the close() method
-                try:
-                    im.close()
-                except AttributeError:
-                    pass
-            # lastly, make sure that the generated pdf matches bit by bit the
-            # expected pdf
-            with open(out, "rb") as outf:
-                out = outf.read()
-            self.assertEqual(pdf, out)
-            # the python-pil version 2.3.0-1ubuntu3 in Ubuntu does not have the close() method
-            try:
-                orig_img.close()
-            except AttributeError:
-                pass
-        setattr(TestImg2Pdf, "test_%s"%test_name, handle)
-
-    return unittest.TestSuite((
-            unittest.makeSuite(TestImg2Pdf),
-            ))
--- a/src/tests/input/animation.gif
+++ b/src/tests/input/animation.gif
--- a/src/tests/input/gray.png
+++ b/src/tests/input/gray.png
--- a/src/tests/input/mono.jb2
+++ b/src/tests/input/mono.jb2
--- a/src/tests/input/mono.png
+++ b/src/tests/input/mono.png
--- a/src/tests/input/mono.tif
+++ b/src/tests/input/mono.tif
--- a/src/tests/input/normal.png
+++ b/src/tests/input/normal.png
--- a/src/tests/output/CMYK.jpg.pdf
+++ b/src/tests/output/CMYK.jpg.pdf
--- a/src/tests/output/CMYK.tif.pdf
+++ b/src/tests/output/CMYK.tif.pdf
--- a/src/tests/output/animation.gif.pdf
+++ b/src/tests/output/animation.gif.pdf
@ -0,0 +1,138 @@
+%PDF-1.3
+%âãÏÓ
+1 0 obj
+<<
+    /Producer (img2pdf 0.4.3)
+>>
+endobj
+2 0 obj
+<<
+    /Pages 3 0 R
+    /Type /Catalog
+>>
+endobj
+3 0 obj
+<<
+    /Count 2
+    /Kids [ 4 0 R 7 0 R ]
+    /Type /Pages
+>>
+endobj
+4 0 obj
+<<
+    /Contents 5 0 R
+    /MediaBox [ 0 0 86.25 36 ]
+    /Parent 3 0 R
+    /Resources <<
+        /XObject <<
+            /Im0 6 0 R
+        >>
+    >>
+    /Type /Page
+>>
+endobj
+5 0 obj
+<<
+    /Length 48
+>>
+stream
+q
+86.2500 0 0 36.0000 0.0000 0.0000 cm
+/Im0 Do
+Q
+endstream
+endobj
+6 0 obj
+<<
+    /BitsPerComponent 8
+    /ColorSpace [ /Indexed /DeviceRGB 255 < 000000 010101 030303 040404 060606 070707 080808 090909 0a0a0a 0b0b0b 0c0c0c 131313 141414 161616 171717 181818 1b1b1b 202020 212121 2d2d2d 2f2f2f 303030 333333 3b3b3b 3d3d3d 3e3e3e 3f3f3f 474747 4c4c4c 505050 5e5e5e 5f5f5f 626262 6a6a6a 747474 757575 8a8a8a 8b8b8b 959595 9d9d9d a0a0a0 a1a1a1 afafaf b3b3b3 b8b8b8 c0c0c0 c1c1c1 c2c2c2 c4c4c4 cccccc cfcfcf d0d0d0 d2d2d2 dedede dfdfdf e4e4e4 e7e7e7 e8e8e8 e9e9e9 ebebeb ececec f3f3f3 f4f4f4 f5f5f5 f6f6f6 f7f7f7 f8f8f8 f9f9f9 fbfbfb fcfcfc fefefe ffffff 050505 0e0e0e 191919 1e1e1e 222222 252525 272727 2b2b2b 2e2e2e 414141 424242 454545 525252 555555 575757 595959 5b5b5b 5c5c5c 666666 707070 717171 727272 7d7d7d 818181 939393 979797 9e9e9e a2a2a2 a5a5a5 a7a7a7 ababab b2b2b2 b4b4b4 bfbfbf c5c5c5 cdcdcd d3d3d3 d5d5d5 d6d6d6 dcdcdc e0e0e0 e2e2e2 eaeaea efefef f0f0f0 f2f2f2 0d0d0d 0f0f0f 101010 151515 1d1d1d 1f1f1f 232323 292929 2a2a2a 2c2c2c 323232 3a3a3a 404040 4b4b4b 4d4d4d 545454 585858 5a5a5a 5d5d5d 616161 686868 6c6c6c 7e7e7e 828282 8d8d8d 8e8e8e 8f8f8f 999999 a3a3a3 a4a4a4 a6a6a6 a8a8a8 aaaaaa adadad bababa bdbdbd bebebe d1d1d1 d4d4d4 d8d8d8 dadada dddddd e1e1e1 e6e6e6 f1f1f1 fafafa 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 > ]
+    /DecodeParms <<
+        /BitsPerComponent 8
+        /Colors 1
+        /Columns 115
+        /Predictor 15
+    >>
+    /Filter /FlateDecode
+    /Height 48
+    /Length 392
+    /Subtype /Image
+    /Type /XObject
+    /Width 115
+>>
+stream
+[binary FlateDecode image data omitted]
+endstream
+endobj
+7 0 obj
+<<
+    /Contents 8 0 R
+    /MediaBox [ 0 0 86.25 36 ]
+    /Parent 3 0 R
+    /Resources <<
+        /XObject <<
+            /Im0 9 0 R
+        >>
+    >>
+    /Type /Page
+>>
+endobj
+8 0 obj
+<<
+    /Length 48
+>>
+stream
+q
+86.2500 0 0 36.0000 0.0000 0.0000 cm
+/Im0 Do
+Q
+endstream
+endobj
+9 0 obj
+<<
+    /BitsPerComponent 8
+    /ColorSpace [ /Indexed /DeviceRGB 255 < 000000 010101 030303 040404 060606 070707 080808 090909 0a0a0a 0b0b0b 0c0c0c 131313 141414 161616 171717 181818 1b1b1b 202020 212121 2d2d2d 2f2f2f 303030 333333 3b3b3b 3d3d3d 3e3e3e 3f3f3f 474747 4c4c4c 505050 5e5e5e 5f5f5f 626262 6a6a6a 747474 757575 8a8a8a 8b8b8b 959595 9d9d9d a0a0a0 a1a1a1 afafaf b3b3b3 b8b8b8 c0c0c0 c1c1c1 c2c2c2 c4c4c4 cccccc cfcfcf d0d0d0 d2d2d2 dedede dfdfdf e4e4e4 e7e7e7 e8e8e8 e9e9e9 ebebeb ececec f3f3f3 f4f4f4 f5f5f5 f6f6f6 f7f7f7 f8f8f8 f9f9f9 fbfbfb fcfcfc fefefe ffffff 050505 0e0e0e 191919 1e1e1e 222222 252525 272727 2b2b2b 2e2e2e 414141 424242 454545 525252 555555 575757 595959 5b5b5b 5c5c5c 666666 707070 717171 727272 7d7d7d 818181 939393 979797 9e9e9e a2a2a2 a5a5a5 a7a7a7 ababab b2b2b2 b4b4b4 bfbfbf c5c5c5 cdcdcd d3d3d3 d5d5d5 d6d6d6 dcdcdc e0e0e0 e2e2e2 eaeaea efefef f0f0f0 f2f2f2 0d0d0d 0f0f0f 101010 151515 1d1d1d 1f1f1f 232323 292929 2a2a2a 2c2c2c 323232 3a3a3a 404040 4b4b4b 4d4d4d 545454 585858 5a5a5a 5d5d5d 616161 686868 6c6c6c 7e7e7e 828282 8d8d8d 8e8e8e 8f8f8f 999999 a3a3a3 a4a4a4 a6a6a6 a8a8a8 aaaaaa adadad bababa bdbdbd bebebe d1d1d1 d4d4d4 d8d8d8 dadada dddddd e1e1e1 e6e6e6 f1f1f1 fafafa 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 > ]
+    /DecodeParms <<
+        /BitsPerComponent 8
+        /Colors 1
+        /Columns 115
+        /Predictor 15
+    >>
+    /Filter /FlateDecode
+    /Height 48
+    /Length 387
+    /Subtype /Image
+    /Type /XObject
+    /Width 115
+>>
+stream
+[binary FlateDecode image data omitted]
+endstream
+endobj
+xref
+0 10
+0000000000 65535 f 
+0000000015 00000 n 
+0000000066 00000 n 
+0000000123 00000 n 
+0000000200 00000 n 
+0000000385 00000 n 
+0000000487 00000 n 
+0000003008 00000 n 
+0000003193 00000 n 
+0000003295 00000 n 
+trailer
+<<
+    /Info 1 0 R
+    /Root 2 0 R
+    /Size 10
+>>
+startxref
+5811
+%%EOF
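
Note on the page size in this expected output: the `/MediaBox [ 0 0 86.25 36 ]` follows from the image dimensions (`/Width 115`, `/Height 48`). PDF user space units are 1/72 inch, so a page is `pixels * 72 / dpi` points wide. The numbers are consistent with a resolution of 96 dpi, which is an inference from the values rather than something stated in this file. A small worked check, with `px_to_pt` as a hypothetical helper:

```python
def px_to_pt(pixels, dpi):
    """Convert a pixel count to PDF points (1 pt = 1/72 inch)."""
    return pixels * 72 / dpi


# image dimensions taken from /Width and /Height, dpi assumed to be 96
width_pt = px_to_pt(115, 96)
height_pt = px_to_pt(48, 96)
print(width_pt, height_pt)
```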
--- a/src/tests/output/gray.png.pdf
+++ b/src/tests/output/gray.png.pdf
--- a/src/tests/output/mono.jb2.pdf
+++ b/src/tests/output/mono.jb2.pdf
--- a/src/tests/output/mono.png.pdf
+++ b/src/tests/output/mono.png.pdf
--- a/src/tests/output/mono.tif.pdf
+++ b/src/tests/output/mono.tif.pdf
--- a/src/tests/output/normal.jpg.pdf
+++ b/src/tests/output/normal.jpg.pdf
--- a/src/tests/output/normal.png.pdf
+++ b/src/tests/output/normal.png.pdf
--- a/test_comp.sh
+++ b/test_comp.sh
@ -16,17 +16,17 @@ for a in `convert -list compress`; do
 	echo "encode:\t$a"
 	convert "$1" -compress $a "`basename $1 .jpg`.pdf"
 	pdfimages "`basename $1 .jpg`.pdf" "`basename $1 .jpg`"
-	/bin/echo -ne "diff:\t"
+	printf "diff:\t"
 	diff=`compare -metric AE "$1" "\`basename $1 .jpg\`-000.ppm" null: 2>&1`
 	if [ "$diff" != "0" ]; then
 		echo "lossy"
 	else
 		echo "lossless"
 	fi
-	/bin/echo -ne "size:\t"
+	printf "size:\t"
 	pdfsize=`stat -c "%s" "\`basename $1 .jpg\`.pdf"`
 	echo "scale=1;$pdfsize/$imsize" | bc
-	/bin/echo -ne "pdf:\t"
+	printf "pdf:\t"
 	grep --max-count=1 --text /Filter "`basename $1 .jpg`.pdf"
 	echo
 done
--- a/tox.ini
+++ b/tox.ini
@ -0,0 +1,18 @@
+# tox (https://tox.readthedocs.io/) is a tool for running tests
+# in multiple virtualenvs. This configuration file will run the
+# test suite on all supported python versions. To use it, "pip install tox"
+# and then run "tox" from this directory.
+
+[tox]
+envlist = py37, py38, py39, py310
+skip_missing_interpreters = true
+
+[testenv]
+deps =
+    pdfrw
+    pytest
+    pikepdf
+    numpy
+    scipy
+commands =
+    python -m pytest -vv