general editing

2015-03-20 11:30:19 -07:00 · 2015-03-20 11:30:19 -07:00 · be21c4bbf3
commit be21c4bbf3
parent 592cdc1cdb
1 changed files with 44 additions and 46 deletions
--- a/README.md
+++ b/README.md
@ -1,15 +1,16 @@
 img2pdf
 =======

-Lossless conversion of images to PDF without unnecessarily re-encoding JPEG and
-JPEG2000 files. Thus, no loss of quality and no unnecessary large output file.
+Losslessly convert images to PDF without unnecessarily re-encoding JPEG and
+JPEG2000 files.  Image quality is retained without unnecessarily increasing
+file size.

 Background
 ----------

-PDF is able to embed JPEG and JPEG2000 images as they are without re-encoding
-them (and hence losing quality) but I was missing a tool to do this
-automatically, thus I wrote this piece of python code.
+Quality loss can be avoided when converting JPEG and JPEG2000 images to
+PDF by embedding them without re-encoding.  I wrote this piece of python code.
+because I was missing a tool to do this automatically.

 If you know how to embed JPEG and JPEG2000 images into a PDF container without
 recompression, using existing tools, please contact me so that I can put this
@ -18,43 +19,41 @@ code into the garbage bin :D
 Functionality
 -------------

-The program will take image filenames from commandline arguments and output a
-PDF file with them embedded into it. If the input image is a JPEG or JPEG2000
-file, it will be included as-is without any processing. If it is in any other
-format, the image will be included as zip-encoded RGB. As a result, this tool
-will be able to lossless wrap any image into a PDF container while performing
-better (in terms of quality/filesize ratio) than existing tools in case the
-input image is a JPEG or JPEG2000 file.
+This program will take a list of images and produce a PDF file with the
+images embedded in it.  JPEG and JPEG2000 images will be included without
+recompression.  Images in other formats will be included with zip/flate
+encoding.  As a result, this tool is able to losslessly wrap any image
+into a PDF container with a quality-filesize ratio that is typically better
+than that of existing tools.

-For example, imagemagick will re-encode the input JPEG image and thus change
-its content:
+For example, imagemagick will re-encode the input JPEG image (thus changing
+its content):

 	$ convert img.jpg img.pdf
 	$ pdfimages img.pdf img.extr # not using -j to be extra sure there is no recompression
 	$ compare -metric AE img.jpg img.extr-000.ppm null:
 	1.6301e+06

-If one wants to do a lossless conversion from any format to PDF with
-imagemagick then one has to use zip-encoding:
+If one wants to losslessly convert from any format to PDF with
+imagemagick, one has to use zip compression:

 	$ convert input.jpg -compress Zip output.pdf
 	$ pdfimages img.pdf img.extr # not using -j to be extra sure there is no recompression
 	$ compare -metric AE img.jpg img.extr-000.ppm null:
 	0

-The downside is, that using imagemagick like this will make the resulting PDF
-files a few times bigger than the input JPEG or JPEG2000 file and can also not
-output a multipage PDF.
+However, this approach will result in PDF files that are a few times larger
+than the input JPEG or JPEG2000 file.

-img2pdf is able to output a PDF with multiple pages if more than one input
-image is given, losslessly embed JPEG and JPEG2000 files into a PDF container
-without adding more overhead than the PDF structure itself and will save all
-other graphics formats using lossless zip-compression.
+img2pdf is able to losslessly embed JPEG and JPEG2000 files into a PDF
+container without additional overhead (aside from the PDF structure itself),
+save other graphics formats using lossless zip compression,
+and produce multi-page PDF files when more than one input image is given.

-Another nifty advantage: Since no re-encoding is done in case of JPEG images,
-the conversion is many (ten to hundred) times faster with img2pdf compared to
-imagemagick. While a run of above convert command with a 2.8MB JPEG takes 27
-seconds (on average) on my machine, conversion using img2pdf takes just a
+Also, since JPEG and JPEG2000 images are not reencoded, conversion  with
+img2pdf is several (ten to hundred) times faster than with imagemagick.
+While the above convert command with a 2.8MB JPEG took 27 seconds
+(on average) on my machine, conversion using img2pdf took just a
 fraction of a second.

 Commandline Arguments
@ -81,27 +80,26 @@ More help is available with the -h or --help option.
 Bugs
 ----

-If you find a JPEG or JPEG2000 file that, when embedded can not be read by the
-Adobe Acrobat Reader, please contact me.
+If you find a JPEG or JPEG2000 file that, when embedded cannot be read
+by the Adobe Acrobat Reader, please contact me.

-For lossless conversion of other formats than JPEG or JPEG2000 files, zip/flate
-encoding is used.  This choice is based on a number of tests I did on images.
-I converted them into PDF using imagemagick and all compressions it has to
-offer and then compared the output size of the lossless variants. In all my
-tests, zip/flate encoding performed best. You can verify my findings using the
-test_comp.sh script with any input image given as a commandline argument. If
-you find an input file that is outperformed by another lossless compression,
-contact me.
+For lossless conversion of formats other than JPEG or JPEG2000, zip/flate
+encoding is used.  This choice is based on tests I did with a number of images.
+I converted them into PDF using the lossless variants of the compression
+formats offered by imagemagick.  In all my tests, zip/flate encoding performed
+best.  You can verify my findings using the test_comp.sh script with any input
+image given as a commandline argument.  If you find an input file that is
+outperformed by another lossless compression method, contact me.

-I have not yet figured out how to read the colorspace from jpeg2000 files.
-Therefor jpeg2000 files use DeviceRGB per default. If your jpeg2000 files are
-of any other colorspace you must force it using the --colorspace option.
-Like -C L for DeviceGray.
+I have not yet figured out how to determine the colorspace of JPEG2000 files.
+Therefore JPEG2000 files use DeviceRGB by default. For JPEG2000 files with
+other colorspaces, you must force it using the `--colorspace` option.

 Installation
 ------------

-On a Debian/Ubuntu based OS, the following dependencies are needed:
+On a Debian- and Ubuntu-based systems, dependencies may be installed
+with the following command:

 	apt-get install python python-pil python-setuptools

@ -109,17 +107,17 @@ Or for Python 3:

 	apt-get install python3 python3-pil python3-setuptools

-You can install the package using:
+You can then install the package using:

 	$ pip install img2pdf

-If you want to install from source code simply use:
+If you prefer to install from source code use:

 	$ cd img2pdf/
 	$ pip install .

 To test the console script without installing the package on your system,
-simply use virtualenv:
+use virtualenv:

 	$ cd img2pdf/
 	$ virtualenv ve
@ -129,7 +127,7 @@ You can then test the converter using:

 	$ ve/bin/img2pdf -o test.pdf src/tests/test.jpg

-Note that the package can also be used as a library as follows:
+The package can also be used as a library:

 	import img2pdf
 	pdf_bytes = img2pdf.convert(['test.jpg'])