general editing
This commit is contained in:
parent
592cdc1cdb
commit
be21c4bbf3
1 changed files with 44 additions and 46 deletions
90
README.md
|
@ -1,15 +1,16 @@
|
||||||
img2pdf
|
img2pdf
|
||||||
=======
|
=======
|
||||||
|
|
||||||
Lossless conversion of images to PDF without unnecessarily re-encoding JPEG and
|
Losslessly convert images to PDF without unnecessarily re-encoding JPEG and
|
||||||
JPEG2000 files. Thus, no loss of quality and no unnecessary large output file.
|
JPEG2000 files. Image quality is retained without unnecessarily increasing
|
||||||
|
file size.
|
||||||
|
|
||||||
Background
|
Background
|
||||||
----------
|
----------
|
||||||
|
|
||||||
PDF is able to embed JPEG and JPEG2000 images as they are without re-encoding
|
Quality loss can be avoided when converting JPEG and JPEG2000 images to
|
||||||
them (and hence losing quality) but I was missing a tool to do this
|
PDF by embedding them without re-encoding. I wrote this piece of Python code
|
||||||
automatically, thus I wrote this piece of python code.
|
because I was missing a tool to do this automatically.
|
||||||
|
|
||||||
If you know how to embed JPEG and JPEG2000 images into a PDF container without
|
If you know how to embed JPEG and JPEG2000 images into a PDF container without
|
||||||
recompression, using existing tools, please contact me so that I can put this
|
recompression, using existing tools, please contact me so that I can put this
|
||||||
|
@ -18,43 +19,41 @@ code into the garbage bin :D
|
||||||
Functionality
|
Functionality
|
||||||
-------------
|
-------------
|
||||||
|
|
||||||
The program will take image filenames from commandline arguments and output a
|
This program will take a list of images and produce a PDF file with the
|
||||||
PDF file with them embedded into it. If the input image is a JPEG or JPEG2000
|
images embedded in it. JPEG and JPEG2000 images will be included without
|
||||||
file, it will be included as-is without any processing. If it is in any other
|
recompression. Images in other formats will be included with zip/flate
|
||||||
format, the image will be included as zip-encoded RGB. As a result, this tool
|
encoding. As a result, this tool is able to losslessly wrap any image
|
||||||
will be able to lossless wrap any image into a PDF container while performing
|
into a PDF container with a quality-filesize ratio that is typically better
|
||||||
better (in terms of quality/filesize ratio) than existing tools in case the
|
than that of existing tools.
|
||||||
input image is a JPEG or JPEG2000 file.
|
|
||||||
|
|
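The dispatch described in the Functionality paragraph (JPEG and JPEG2000 pass through untouched, everything else is zip/flate encoded) can be sketched as follows. This is only an illustration under assumptions: the helper names `detect_format` and `embedding_strategy` are hypothetical and not part of img2pdf, and the check relies on well-known file signatures rather than on whatever detection img2pdf actually performs.

```python
# Illustrative sketch only; these names are hypothetical and not part of img2pdf.
JPEG_MAGIC = b"\xff\xd8\xff"                   # JPEG start-of-image marker
JP2_MAGIC = b"\x00\x00\x00\x0cjP  \r\n\x87\n"  # JP2 signature box
J2K_MAGIC = b"\xff\x4f\xff\x51"                # raw JPEG2000 codestream (SOC + SIZ)


def detect_format(header: bytes) -> str:
    """Classify an image by the first bytes of its file."""
    if header.startswith(JPEG_MAGIC):
        return "JPEG"
    if header.startswith(JP2_MAGIC) or header.startswith(J2K_MAGIC):
        return "JPEG2000"
    return "other"


def embedding_strategy(header: bytes) -> str:
    """Return how the README says such an image is put into the PDF."""
    if detect_format(header) in ("JPEG", "JPEG2000"):
        return "embed as-is"   # no recompression, so no quality loss
    return "zip/flate"         # lossless compression of the decoded pixels


if __name__ == "__main__":
    png_header = b"\x89PNG\r\n\x1a\n"
    print(embedding_strategy(JPEG_MAGIC + b"\xe0"))
    print(embedding_strategy(png_header))
```

In practice one would read the first 12 bytes of each input file and pass them to `embedding_strategy`.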
||||||
For example, imagemagick will re-encode the input JPEG image and thus change
|
For example, imagemagick will re-encode the input JPEG image (thus changing
|
||||||
its content:
|
its content):
|
||||||
|
|
||||||
$ convert img.jpg img.pdf
|
$ convert img.jpg img.pdf
|
||||||
$ pdfimages img.pdf img.extr # not using -j to be extra sure there is no recompression
|
$ pdfimages img.pdf img.extr # not using -j to be extra sure there is no recompression
|
||||||
$ compare -metric AE img.jpg img.extr-000.ppm null:
|
$ compare -metric AE img.jpg img.extr-000.ppm null:
|
||||||
1.6301e+06
|
1.6301e+06
|
||||||
|
|
||||||
If one wants to do a lossless conversion from any format to PDF with
|
If one wants to losslessly convert from any format to PDF with
|
||||||
imagemagick then one has to use zip-encoding:
|
imagemagick, one has to use zip compression:
|
||||||
|
|
||||||
$ convert input.jpg -compress Zip output.pdf
|
$ convert input.jpg -compress Zip output.pdf
|
||||||
$ pdfimages img.pdf img.extr # not using -j to be extra sure there is no recompression
|
$ pdfimages img.pdf img.extr # not using -j to be extra sure there is no recompression
|
||||||
$ compare -metric AE img.jpg img.extr-000.ppm null:
|
$ compare -metric AE img.jpg img.extr-000.ppm null:
|
||||||
0
|
0
|
||||||
|
|
||||||
The downside is, that using imagemagick like this will make the resulting PDF
|
However, this approach will result in PDF files that are a few times larger
|
||||||
files a few times bigger than the input JPEG or JPEG2000 file and can also not
|
than the input JPEG or JPEG2000 file.
|
||||||
output a multipage PDF.
|
|
||||||
|
|
||||||
img2pdf is able to output a PDF with multiple pages if more than one input
|
img2pdf is able to losslessly embed JPEG and JPEG2000 files into a PDF
|
||||||
image is given, losslessly embed JPEG and JPEG2000 files into a PDF container
|
container without additional overhead (aside from the PDF structure itself),
|
||||||
without adding more overhead than the PDF structure itself and will save all
|
save other graphics formats using lossless zip compression,
|
||||||
other graphics formats using lossless zip-compression.
|
and produce multi-page PDF files when more than one input image is given.
|
||||||
|
|
||||||
Another nifty advantage: Since no re-encoding is done in case of JPEG images,
|
Also, since JPEG and JPEG2000 images are not reencoded, conversion with
|
||||||
the conversion is many (ten to hundred) times faster with img2pdf compared to
|
img2pdf is many (ten to a hundred) times faster than with imagemagick.
|
||||||
imagemagick. While a run of above convert command with a 2.8MB JPEG takes 27
|
While the above convert command with a 2.8MB JPEG took 27 seconds
|
||||||
seconds (on average) on my machine, conversion using img2pdf takes just a
|
(on average) on my machine, conversion using img2pdf took just a
|
||||||
fraction of a second.
|
fraction of a second.
|
||||||
|
|
||||||
Commandline Arguments
|
Commandline Arguments
|
||||||
|
@ -81,27 +80,26 @@ More help is available with the -h or --help option.
|
||||||
Bugs
|
Bugs
|
||||||
----
|
----
|
||||||
|
|
||||||
If you find a JPEG or JPEG2000 file that, when embedded can not be read by the
|
If you find a JPEG or JPEG2000 file that, when embedded, cannot be read
|
||||||
Adobe Acrobat Reader, please contact me.
|
by the Adobe Acrobat Reader, please contact me.
|
||||||
|
|
||||||
For lossless conversion of other formats than JPEG or JPEG2000 files, zip/flate
|
For lossless conversion of formats other than JPEG or JPEG2000, zip/flate
|
||||||
encoding is used. This choice is based on a number of tests I did on images.
|
encoding is used. This choice is based on tests I did with a number of images.
|
||||||
I converted them into PDF using imagemagick and all compressions it has to
|
I converted them into PDF using the lossless variants of the compression
|
||||||
offer and then compared the output size of the lossless variants. In all my
|
formats offered by imagemagick. In all my tests, zip/flate encoding performed
|
||||||
tests, zip/flate encoding performed best. You can verify my findings using the
|
best. You can verify my findings using the test_comp.sh script with any input
|
||||||
test_comp.sh script with any input image given as a commandline argument. If
|
image given as a commandline argument. If you find an input file that is
|
||||||
you find an input file that is outperformed by another lossless compression,
|
outperformed by another lossless compression method, contact me.
|
||||||
contact me.
|
|
||||||
|
|
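To see why zip/flate encoding counts as lossless, the round trip can be checked with Python's standard `zlib` module, which implements the same deflate compression. This is a minimal sketch under assumptions: the synthetic pixel data and the helper `make_test_pixels` are made up for illustration, and the code does not reproduce how img2pdf or test_comp.sh work.

```python
import zlib


def make_test_pixels(width: int, height: int) -> bytes:
    """Build synthetic raw RGB data (hypothetical test input): 8x8 flat color blocks."""
    pixels = bytearray()
    for y in range(height):
        for x in range(width):
            pixels += bytes(((x // 8) * 32 % 256, (y // 8) * 32 % 256, 128))
    return bytes(pixels)


raw = make_test_pixels(64, 64)       # 64 * 64 pixels, 3 bytes each
encoded = zlib.compress(raw, 9)      # deflate at maximum compression level
decoded = zlib.decompress(encoded)   # what a PDF reader does when decoding

print("raw bytes:    ", len(raw))
print("encoded bytes:", len(encoded))
print("lossless:     ", decoded == raw)
```

Every byte survives the round trip, which is the property the comparison with `compare -metric AE` checks at the image level. How much the data shrinks depends entirely on the image content.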
||||||
I have not yet figured out how to read the colorspace from jpeg2000 files.
|
I have not yet figured out how to determine the colorspace of JPEG2000 files.
|
||||||
Therefor jpeg2000 files use DeviceRGB per default. If your jpeg2000 files are
|
Therefore, JPEG2000 files use DeviceRGB by default. For JPEG2000 files with
|
||||||
of any other colorspace you must force it using the --colorspace option.
|
other colorspaces, you must force the colorspace using the `--colorspace` option.
|
||||||
Like -C L for DeviceGray.
|
|
||||||
|
|
||||||
Installation
|
Installation
|
||||||
------------
|
------------
|
||||||
|
|
||||||
On a Debian/Ubuntu based OS, the following dependencies are needed:
|
On Debian- and Ubuntu-based systems, dependencies may be installed
|
||||||
|
with the following command:
|
||||||
|
|
||||||
apt-get install python python-pil python-setuptools
|
apt-get install python python-pil python-setuptools
|
||||||
|
|
||||||
|
@ -109,17 +107,17 @@ Or for Python 3:
|
||||||
|
|
||||||
apt-get install python3 python3-pil python3-setuptools
|
apt-get install python3 python3-pil python3-setuptools
|
||||||
|
|
||||||
You can install the package using:
|
You can then install the package using:
|
||||||
|
|
||||||
$ pip install img2pdf
|
$ pip install img2pdf
|
||||||
|
|
||||||
If you want to install from source code simply use:
|
If you prefer to install from source code, use:
|
||||||
|
|
||||||
$ cd img2pdf/
|
$ cd img2pdf/
|
||||||
$ pip install .
|
$ pip install .
|
||||||
|
|
||||||
To test the console script without installing the package on your system,
|
To test the console script without installing the package on your system,
|
||||||
simply use virtualenv:
|
use virtualenv:
|
||||||
|
|
||||||
$ cd img2pdf/
|
$ cd img2pdf/
|
||||||
$ virtualenv ve
|
$ virtualenv ve
|
||||||
|
@ -129,7 +127,7 @@ You can then test the converter using:
|
||||||
|
|
||||||
$ ve/bin/img2pdf -o test.pdf src/tests/test.jpg
|
$ ve/bin/img2pdf -o test.pdf src/tests/test.jpg
|
||||||
|
|
||||||
Note that the package can also be used as a library as follows:
|
The package can also be used as a library:
|
||||||
|
|
||||||
import img2pdf
|
import img2pdf
|
||||||
pdf_bytes = img2pdf.convert(['test.jpg'])
|
pdf_bytes = img2pdf.convert(['test.jpg'])
|
||||||
|
|