@@ -1,5 +1,5 @@
-[![Travis Status](https://travis-ci.com/josch/img2pdf.svg?branch=main)](https://app.travis-ci.com/josch/img2pdf)
-[![Appveyor Status](https://ci.appveyor.com/api/projects/status/2kws3wkqvi526llj/branch/main?svg=true)](https://ci.appveyor.com/project/josch/img2pdf/branch/main)
+[![Travis Status](https://travis-ci.org/josch/img2pdf.svg?branch=master)](https://travis-ci.org/josch/img2pdf)
+[![Appveyor Status](https://ci.appveyor.com/api/projects/status/2kws3wkqvi526llj/branch/master?svg=true)](https://ci.appveyor.com/project/josch/img2pdf/branch/master)
 
 img2pdf
 =======
@@ -27,15 +27,15 @@ software, because the raw pixel data never has to be loaded into memory.
 The following table shows how img2pdf handles different input depending on the
 input file format and image color space.
 
-| Format                                | Colorspace                     | Result        |
-| ------------------------------------- | ------------------------------ | ------------- |
-| JPEG                                  | any                            | direct        |
-| JPEG2000                              | any                            | direct        |
-| PNG (non-interlaced, no transparency) | any                            | direct        |
-| TIFF (CCITT Group 4)                  | monochrome                     | direct        |
-| any                                   | any except CMYK and monochrome | PNG Paeth     |
-| any                                   | monochrome                     | CCITT Group 4 |
-| any                                   | CMYK                           | flate         |
+| Format               | Colorspace                     | Result        |
+| -------------------- | ------------------------------ | ------------- |
+| JPEG                 | any                            | direct        |
+| JPEG2000             | any                            | direct        |
+| PNG (non-interlaced) | any                            | direct        |
+| TIFF (CCITT Group 4) | monochrome                     | direct        |
+| any                  | any except CMYK and monochrome | PNG Paeth     |
+| any                  | monochrome                     | CCITT Group 4 |
+| any                  | CMYK                           | flate         |
 
 For JPEG, JPEG2000, non-interlaced PNG and TIFF images with CCITT Group 4
 encoded data, img2pdf directly embeds the image data into the PDF without
@@ -72,15 +72,25 @@ Bugs
   when embedded into the PDF cannot be read by the Adobe Acrobat Reader,
   please contact me.
 
+- I have not yet figured out how to determine the colorspace of JPEG2000
+  files. Therefore JPEG2000 files use DeviceRGB by default. For JPEG2000
+  files with other colorspaces, you must explicitly specify it using the
+  `--colorspace` option.
+
+- Input images with alpha channels are not allowed. PDF only supports
+  transparency using binary masks but is unable to store 8-bit transparency
+  information as part of the image itself. Because img2pdf is always lossless,
+  input images must not carry transparency information. You can
+  remove the alpha channel with, for example, ImageMagick:
+
+      convert input.png -background white -alpha remove -alpha off output.png
+
 - An error is produced if the input image is broken. This commonly happens if
   the input image has an invalid EXIF Orientation value of zero. Even though
   only eight different values from 1 to 8 are permitted, Android phones and
   Canon DSLR cameras produce JPEG images with the invalid value of zero.
   Either fix your input images with `exiftool` or similar software before
-  passing the JPEG to `img2pdf` or run `img2pdf` with `--rotation=ifvalid`
-  (if you run img2pdf from the command line) or pass
-  `rotation=img2pdf.Rotation.ifvalid` as an argument to `convert()` when using
-  img2pdf as a library.
+  passing the JPEG to `img2pdf` or run `img2pdf` with `--rotation=ifvalid`.
 
 - img2pdf uses PIL (or Pillow) to obtain image metadata and to convert the
   input if necessary. To prevent decompression bomb denial of service attacks,
@@ -117,9 +127,10 @@ You can then test the converter using:
 
     $ ve/bin/img2pdf -o test.pdf src/tests/test.jpg
 
-If you don't want to set up Python on Windows, then head to the
-[releases](/josch/img2pdf/releases) section and download the latest
-`img2pdf.exe`.
+For Microsoft Windows users, PyInstaller-based .exe files are produced by
+AppVeyor. If you don't want to install Python before using img2pdf, you can head
+to AppVeyor and click on "Artifacts" to download the latest version:
+https://ci.appveyor.com/project/josch/img2pdf
 
 GUI
 ---
@@ -146,10 +157,6 @@ The package can also be used as a library:
     with open("name.pdf","wb") as f1, open("test.jpg","rb") as f2:
         f1.write(img2pdf.convert(f2))
 
-    # opening using pathlib
-    with open("name.pdf","wb") as f:
-        f.write(img2pdf.convert(pathlib.Path('test.jpg')))
-
     # using in-memory image data (raw bytes, not a text string)
     with open("name.pdf","wb") as f:
         f.write(img2pdf.convert(b"\x89PNG..."))
@@ -192,15 +199,6 @@ The package can also be used as a library:
     with open("name.pdf","wb") as f:
         f.write(img2pdf.convert(glob.glob("/path/to/*.jpg")))
 
-    # convert all files matching a glob using pathlib.Path
-    from pathlib import Path
-    with open("name.pdf","wb") as f:
-        f.write(img2pdf.convert(*Path("/path").glob("**/*.jpg")))
-
-    # ignore invalid rotation values in the input images
-    with open("name.pdf","wb") as f:
-        f.write(img2pdf.convert('test.jpg', rotation=img2pdf.Rotation.ifvalid))
-
     # writing to file descriptor
     with open("name.pdf","wb") as f1, open("test.jpg","rb") as f2:
         img2pdf.convert(f2, outputstream=f1)
@@ -308,14 +306,3 @@ Tesseract might not do a lossless conversion. For example it converts CMYK
 input to RGB and removes the alpha channel from images with transparency. For
 multipage TIFF or animated GIF, it will only convert the first frame.
 
-Comparison to econvert from ExactImage
---------------------------------------
-
-Like pdflatex and podofoimg2pdf, econvert is able to embed JPEG images into PDF
-directly without re-encoding but when given other file formats, it stores them
-just using flate compression, which unnecessarily increases the file size.
-Furthermore, it throws an error with CMYK TIFF input. It also doesn't store CMYK
-JPEG files as CMYK but converts them to RGB, so it's not lossless. When trying
-to feed it 16-bit files, it errors out with `Unhandled bps/spp combination`. It
-also seems to choose JPEG encoding when using it on some file types (like
-palette images), making it not lossless for that input as well.
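
As a reading aid for the format table in the second hunk, its rows can be treated as a first-match lookup from top to bottom. The sketch below only illustrates that reading. `embedding_for` and its parameters are hypothetical names, not part of img2pdf's API, and it follows the table variant without the transparency condition.

```python
def embedding_for(fmt, colorspace, interlaced=False, ccitt_g4=False):
    """Return the 'Result' column for an input, reading the table top to bottom.

    Hypothetical helper: models the README table, not img2pdf's real code.
    """
    # JPEG and JPEG2000 are embedded as-is for any colorspace.
    if fmt in ("JPEG", "JPEG2000"):
        return "direct"
    # Non-interlaced PNG is embedded as-is for any colorspace.
    if fmt == "PNG" and not interlaced:
        return "direct"
    # TIFF is only embedded directly when it is CCITT Group 4 monochrome.
    if fmt == "TIFF" and ccitt_g4 and colorspace == "monochrome":
        return "direct"
    # Everything else is re-encoded losslessly, chosen by colorspace.
    if colorspace == "monochrome":
        return "CCITT Group 4"
    if colorspace == "CMYK":
        return "flate"
    return "PNG Paeth"
```

Under this reading an interlaced PNG falls through to the colorspace rows, so an interlaced RGB PNG would be stored with PNG Paeth rather than embedded directly.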
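
The table variant with "no transparency" and the alpha-channel bullet both depend on two PNG properties: interlacing and an alpha channel. The following is a minimal, standard-library-only sketch of how a caller might inspect them before passing a file to `img2pdf.convert()`. It assumes that only the IHDR chunk is consulted, so palette transparency stored in a `tRNS` chunk is not detected. `png_flags` is a hypothetical helper, not an img2pdf function.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_flags(data):
    """Return (has_alpha_channel, is_interlaced) from a PNG's IHDR chunk."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    # IHDR must be the first chunk: 4-byte length, 4-byte type, 13-byte body.
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("malformed IHDR chunk")
    (_width, _height, _depth, color_type,
     _compression, _filter, interlace) = struct.unpack(">IIBBBBB", data[16:29])
    # Color types 4 (gray + alpha) and 6 (RGBA) carry a full alpha channel.
    return color_type in (4, 6), interlace != 0
```

A caller could read the first 29 bytes of a file opened in `"rb"` mode and pass them to `png_flags` to decide whether the alpha channel needs to be removed first.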