forked from josch/img2pdf
general editing
This commit is contained in:
parent
592cdc1cdb
commit
be21c4bbf3
1 changed files with 44 additions and 46 deletions
90
README.md
90
README.md
|
@ -1,15 +1,16 @@
|
|||
img2pdf
|
||||
=======
|
||||
|
||||
Lossless conversion of images to PDF without unnecessarily re-encoding JPEG and
|
||||
JPEG2000 files. Thus, no loss of quality and no unnecessary large output file.
|
||||
Losslessly convert images to PDF without unnecessarily re-encoding JPEG and
|
||||
JPEG2000 files. Image quality is retained without unnecessarily increasing
|
||||
file size.
|
||||
|
||||
Background
|
||||
----------
|
||||
|
||||
PDF is able to embed JPEG and JPEG2000 images as they are without re-encoding
|
||||
them (and hence losing quality) but I was missing a tool to do this
|
||||
automatically, thus I wrote this piece of python code.
|
||||
Quality loss can be avoided when converting JPEG and JPEG2000 images to
|
||||
PDF by embedding them without re-encoding. I wrote this piece of python code.
|
||||
because I was missing a tool to do this automatically.
|
||||
|
||||
If you know how to embed JPEG and JPEG2000 images into a PDF container without
|
||||
recompression, using existing tools, please contact me so that I can put this
|
||||
|
@ -18,43 +19,41 @@ code into the garbage bin :D
|
|||
Functionality
|
||||
-------------
|
||||
|
||||
The program will take image filenames from commandline arguments and output a
|
||||
PDF file with them embedded into it. If the input image is a JPEG or JPEG2000
|
||||
file, it will be included as-is without any processing. If it is in any other
|
||||
format, the image will be included as zip-encoded RGB. As a result, this tool
|
||||
will be able to lossless wrap any image into a PDF container while performing
|
||||
better (in terms of quality/filesize ratio) than existing tools in case the
|
||||
input image is a JPEG or JPEG2000 file.
|
||||
This program will take a list of images and produce a PDF file with the
|
||||
images embedded in it. JPEG and JPEG2000 images will be included without
|
||||
recompression. Images in other formats will be included with zip/flate
|
||||
encoding. As a result, this tool is able to losslessly wrap any image
|
||||
into a PDF container with a quality-filesize ratio that is typically better
|
||||
than that of existing tools.
|
||||
|
||||
For example, imagemagick will re-encode the input JPEG image and thus change
|
||||
its content:
|
||||
For example, imagemagick will re-encode the input JPEG image (thus changing
|
||||
its content):
|
||||
|
||||
$ convert img.jpg img.pdf
|
||||
$ pdfimages img.pdf img.extr # not using -j to be extra sure there is no recompression
|
||||
$ compare -metric AE img.jpg img.extr-000.ppm null:
|
||||
1.6301e+06
|
||||
|
||||
If one wants to do a lossless conversion from any format to PDF with
|
||||
imagemagick then one has to use zip-encoding:
|
||||
If one wants to losslessly convert from any format to PDF with
|
||||
imagemagick, one has to use zip compression:
|
||||
|
||||
$ convert input.jpg -compress Zip output.pdf
|
||||
$ pdfimages img.pdf img.extr # not using -j to be extra sure there is no recompression
|
||||
$ compare -metric AE img.jpg img.extr-000.ppm null:
|
||||
0
|
||||
|
||||
The downside is, that using imagemagick like this will make the resulting PDF
|
||||
files a few times bigger than the input JPEG or JPEG2000 file and can also not
|
||||
output a multipage PDF.
|
||||
However, this approach will result in PDF files that are a few times larger
|
||||
than the input JPEG or JPEG2000 file.
|
||||
|
||||
img2pdf is able to output a PDF with multiple pages if more than one input
|
||||
image is given, losslessly embed JPEG and JPEG2000 files into a PDF container
|
||||
without adding more overhead than the PDF structure itself and will save all
|
||||
other graphics formats using lossless zip-compression.
|
||||
img2pdf is able to losslessly embed JPEG and JPEG2000 files into a PDF
|
||||
container without additional overhead (aside from the PDF structure itself),
|
||||
save other graphics formats using lossless zip compression,
|
||||
and produce multi-page PDF files when more than one input image is given.
|
||||
|
||||
Another nifty advantage: Since no re-encoding is done in case of JPEG images,
|
||||
the conversion is many (ten to hundred) times faster with img2pdf compared to
|
||||
imagemagick. While a run of above convert command with a 2.8MB JPEG takes 27
|
||||
seconds (on average) on my machine, conversion using img2pdf takes just a
|
||||
Also, since JPEG and JPEG2000 images are not reencoded, conversion with
|
||||
img2pdf is several (ten to hundred) times faster than with imagemagick.
|
||||
While the above convert command with a 2.8MB JPEG took 27 seconds
|
||||
(on average) on my machine, conversion using img2pdf took just a
|
||||
fraction of a second.
|
||||
|
||||
Commandline Arguments
|
||||
|
@ -81,27 +80,26 @@ More help is available with the -h or --help option.
|
|||
Bugs
|
||||
----
|
||||
|
||||
If you find a JPEG or JPEG2000 file that, when embedded can not be read by the
|
||||
Adobe Acrobat Reader, please contact me.
|
||||
If you find a JPEG or JPEG2000 file that, when embedded cannot be read
|
||||
by the Adobe Acrobat Reader, please contact me.
|
||||
|
||||
For lossless conversion of other formats than JPEG or JPEG2000 files, zip/flate
|
||||
encoding is used. This choice is based on a number of tests I did on images.
|
||||
I converted them into PDF using imagemagick and all compressions it has to
|
||||
offer and then compared the output size of the lossless variants. In all my
|
||||
tests, zip/flate encoding performed best. You can verify my findings using the
|
||||
test_comp.sh script with any input image given as a commandline argument. If
|
||||
you find an input file that is outperformed by another lossless compression,
|
||||
contact me.
|
||||
For lossless conversion of formats other than JPEG or JPEG2000, zip/flate
|
||||
encoding is used. This choice is based on tests I did with a number of images.
|
||||
I converted them into PDF using the lossless variants of the compression
|
||||
formats offered by imagemagick. In all my tests, zip/flate encoding performed
|
||||
best. You can verify my findings using the test_comp.sh script with any input
|
||||
image given as a commandline argument. If you find an input file that is
|
||||
outperformed by another lossless compression method, contact me.
|
||||
|
||||
I have not yet figured out how to read the colorspace from jpeg2000 files.
|
||||
Therefor jpeg2000 files use DeviceRGB per default. If your jpeg2000 files are
|
||||
of any other colorspace you must force it using the --colorspace option.
|
||||
Like -C L for DeviceGray.
|
||||
I have not yet figured out how to determine the colorspace of JPEG2000 files.
|
||||
Therefore JPEG2000 files use DeviceRGB by default. For JPEG2000 files with
|
||||
other colorspaces, you must force it using the `--colorspace` option.
|
||||
|
||||
Installation
|
||||
------------
|
||||
|
||||
On a Debian/Ubuntu based OS, the following dependencies are needed:
|
||||
On a Debian- and Ubuntu-based systems, dependencies may be installed
|
||||
with the following command:
|
||||
|
||||
apt-get install python python-pil python-setuptools
|
||||
|
||||
|
@ -109,17 +107,17 @@ Or for Python 3:
|
|||
|
||||
apt-get install python3 python3-pil python3-setuptools
|
||||
|
||||
You can install the package using:
|
||||
You can then install the package using:
|
||||
|
||||
$ pip install img2pdf
|
||||
|
||||
If you want to install from source code simply use:
|
||||
If you prefer to install from source code use:
|
||||
|
||||
$ cd img2pdf/
|
||||
$ pip install .
|
||||
|
||||
To test the console script without installing the package on your system,
|
||||
simply use virtualenv:
|
||||
use virtualenv:
|
||||
|
||||
$ cd img2pdf/
|
||||
$ virtualenv ve
|
||||
|
@ -129,7 +127,7 @@ You can then test the converter using:
|
|||
|
||||
$ ve/bin/img2pdf -o test.pdf src/tests/test.jpg
|
||||
|
||||
Note that the package can also be used as a library as follows:
|
||||
The package can also be used as a library:
|
||||
|
||||
import img2pdf
|
||||
pdf_bytes = img2pdf.convert(['test.jpg'])
|
||||
|
|
Loading…
Reference in a new issue