diff --git a/README.md b/README.md index d658d5e..ac5b762 100644 --- a/README.md +++ b/README.md @@ -1,15 +1,16 @@ img2pdf ======= -Lossless conversion of images to PDF without unnecessarily re-encoding JPEG and -JPEG2000 files. Thus, no loss of quality and no unnecessary large output file. +Losslessly convert images to PDF without unnecessarily re-encoding JPEG and +JPEG2000 files. Image quality is retained without unnecessarily increasing +file size. Background ---------- -PDF is able to embed JPEG and JPEG2000 images as they are without re-encoding -them (and hence losing quality) but I was missing a tool to do this -automatically, thus I wrote this piece of python code. +Quality loss can be avoided when converting JPEG and JPEG2000 images to +PDF by embedding them without re-encoding. I wrote this piece of python code. +because I was missing a tool to do this automatically. If you know how to embed JPEG and JPEG2000 images into a PDF container without recompression, using existing tools, please contact me so that I can put this @@ -18,43 +19,41 @@ code into the garbage bin :D Functionality ------------- -The program will take image filenames from commandline arguments and output a -PDF file with them embedded into it. If the input image is a JPEG or JPEG2000 -file, it will be included as-is without any processing. If it is in any other -format, the image will be included as zip-encoded RGB. As a result, this tool -will be able to lossless wrap any image into a PDF container while performing -better (in terms of quality/filesize ratio) than existing tools in case the -input image is a JPEG or JPEG2000 file. +This program will take a list of images and produce a PDF file with the +images embedded in it. JPEG and JPEG2000 images will be included without +recompression. Images in other formats will be included with zip/flate +encoding. As a result, this tool is able to losslessly wrap any image +into a PDF container with a quality-filesize ratio that is typically better +than that of existing tools. -For example, imagemagick will re-encode the input JPEG image and thus change -its content: +For example, imagemagick will re-encode the input JPEG image (thus changing +its content): $ convert img.jpg img.pdf $ pdfimages img.pdf img.extr # not using -j to be extra sure there is no recompression $ compare -metric AE img.jpg img.extr-000.ppm null: 1.6301e+06 -If one wants to do a lossless conversion from any format to PDF with -imagemagick then one has to use zip-encoding: +If one wants to losslessly convert from any format to PDF with +imagemagick, one has to use zip compression: $ convert input.jpg -compress Zip output.pdf $ pdfimages img.pdf img.extr # not using -j to be extra sure there is no recompression $ compare -metric AE img.jpg img.extr-000.ppm null: 0 -The downside is, that using imagemagick like this will make the resulting PDF -files a few times bigger than the input JPEG or JPEG2000 file and can also not -output a multipage PDF. +However, this approach will result in PDF files that are a few times larger +than the input JPEG or JPEG2000 file. -img2pdf is able to output a PDF with multiple pages if more than one input -image is given, losslessly embed JPEG and JPEG2000 files into a PDF container -without adding more overhead than the PDF structure itself and will save all -other graphics formats using lossless zip-compression. +img2pdf is able to losslessly embed JPEG and JPEG2000 files into a PDF +container without additional overhead (aside from the PDF structure itself), +save other graphics formats using lossless zip compression, +and produce multi-page PDF files when more than one input image is given. -Another nifty advantage: Since no re-encoding is done in case of JPEG images, -the conversion is many (ten to hundred) times faster with img2pdf compared to -imagemagick. While a run of above convert command with a 2.8MB JPEG takes 27 -seconds (on average) on my machine, conversion using img2pdf takes just a +Also, since JPEG and JPEG2000 images are not reencoded, conversion with +img2pdf is several (ten to hundred) times faster than with imagemagick. +While the above convert command with a 2.8MB JPEG took 27 seconds +(on average) on my machine, conversion using img2pdf took just a fraction of a second. Commandline Arguments @@ -81,27 +80,26 @@ More help is available with the -h or --help option. Bugs ---- -If you find a JPEG or JPEG2000 file that, when embedded can not be read by the -Adobe Acrobat Reader, please contact me. +If you find a JPEG or JPEG2000 file that, when embedded cannot be read +by the Adobe Acrobat Reader, please contact me. -For lossless conversion of other formats than JPEG or JPEG2000 files, zip/flate -encoding is used. This choice is based on a number of tests I did on images. -I converted them into PDF using imagemagick and all compressions it has to -offer and then compared the output size of the lossless variants. In all my -tests, zip/flate encoding performed best. You can verify my findings using the -test_comp.sh script with any input image given as a commandline argument. If -you find an input file that is outperformed by another lossless compression, -contact me. +For lossless conversion of formats other than JPEG or JPEG2000, zip/flate +encoding is used. This choice is based on tests I did with a number of images. +I converted them into PDF using the lossless variants of the compression +formats offered by imagemagick. In all my tests, zip/flate encoding performed +best. You can verify my findings using the test_comp.sh script with any input +image given as a commandline argument. If you find an input file that is +outperformed by another lossless compression method, contact me. -I have not yet figured out how to read the colorspace from jpeg2000 files. -Therefor jpeg2000 files use DeviceRGB per default. If your jpeg2000 files are -of any other colorspace you must force it using the --colorspace option. -Like -C L for DeviceGray. +I have not yet figured out how to determine the colorspace of JPEG2000 files. +Therefore JPEG2000 files use DeviceRGB by default. For JPEG2000 files with +other colorspaces, you must force it using the `--colorspace` option. Installation ------------ -On a Debian/Ubuntu based OS, the following dependencies are needed: +On a Debian- and Ubuntu-based systems, dependencies may be installed +with the following command: apt-get install python python-pil python-setuptools @@ -109,17 +107,17 @@ Or for Python 3: apt-get install python3 python3-pil python3-setuptools -You can install the package using: +You can then install the package using: $ pip install img2pdf -If you want to install from source code simply use: +If you prefer to install from source code use: $ cd img2pdf/ $ pip install . To test the console script without installing the package on your system, -simply use virtualenv: +use virtualenv: $ cd img2pdf/ $ virtualenv ve @@ -129,7 +127,7 @@ You can then test the converter using: $ ve/bin/img2pdf -o test.pdf src/tests/test.jpg -Note that the package can also be used as a library as follows: +The package can also be used as a library: import img2pdf pdf_bytes = img2pdf.convert(['test.jpg'])