forked from josch/img2pdf
Merge branch 'master' of http://gitlab.mister-muffin.de/josch/img2pdf
# Conflicts:
#	src/img2pdf.py
commit
1261741136
16 changed files with 711 additions and 217 deletions
38
CHANGES.rst
|
@ -2,7 +2,43 @@
|
|||
CHANGES
|
||||
=======
|
||||
|
||||
1.0.0 (unreleased)
|
||||
0.1.6
|
||||
-----
|
||||
|
||||
- replace -x and -y option by combined option -s (or --pagesize) and use -S
|
||||
for --subject
|
||||
- correctly encode and escape non-ascii metadata
|
||||
- explicitly store date in UTC and allow parsing all date formats understood
|
||||
by dateutil and `date --date`
|
||||
|
||||
0.1.5
|
||||
-----
|
||||
|
||||
- Enable support for CMYK images
|
||||
- Rework test suite
|
||||
- support file objects as input
|
||||
|
||||
0.1.4
|
||||
-----
|
||||
|
||||
- add Python 3 support
|
||||
- make output reproducible by sorting and --nodate option
|
||||
|
||||
0.1.3
|
||||
-----
|
||||
|
||||
- Avoid leaking file descriptors
|
||||
- Convert unrecognized colorspaces to RGB
|
||||
|
||||
0.1.1
|
||||
-----
|
||||
|
||||
- allow running src/img2pdf.py standalone
|
||||
- license change from GPL to LGPL
|
||||
- Add pillow 2.4.0 support
|
||||
- add options to specify pdf dimensions in points
|
||||
|
||||
0.1.0 (unreleased)
|
||||
------------------
|
||||
|
||||
- Initial PyPI release.
|
||||
|
|
186
README.md
|
@ -1,15 +1,16 @@
|
|||
img2pdf
|
||||
=======
|
||||
|
||||
Lossless conversion of images to PDF without unnecessarily re-encoding JPEG and
|
||||
JPEG2000 files. Thus, no loss of quality and no unnecessarily large output file.
|
||||
Losslessly convert images to PDF without unnecessarily re-encoding JPEG and
|
||||
JPEG2000 files. Image quality is retained without unnecessarily increasing
|
||||
file size.
|
||||
|
||||
Background
|
||||
----------
|
||||
|
||||
PDF is able to embed JPEG and JPEG2000 images as they are without re-encoding
|
||||
them (and hence losing quality) but I was missing a tool to do this
|
||||
automatically, thus I wrote this piece of python code.
|
||||
Quality loss can be avoided when converting JPEG and JPEG2000 images to
|
||||
PDF by embedding them without re-encoding. I wrote this piece of Python code
|
||||
because I was missing a tool to do this automatically.
|
||||
|
||||
If you know how to embed JPEG and JPEG2000 images into a PDF container without
|
||||
recompression, using existing tools, please contact me so that I can put this
|
||||
|
@ -18,100 +19,160 @@ code into the garbage bin :D
|
|||
Functionality
|
||||
-------------
|
||||
|
||||
The program will take image filenames from commandline arguments and output a
|
||||
PDF file with them embedded into it. If the input image is a JPEG or JPEG2000
|
||||
file, it will be included as-is without any processing. If it is in any other
|
||||
format, the image will be included as zip-encoded RGB. As a result, this tool
|
||||
will be able to lossless wrap any image into a PDF container while performing
|
||||
better (in terms of quality/filesize ratio) than existing tools in case the
|
||||
input image is a JPEG or JPEG2000 file.
|
||||
This program will take a list of images and produce a PDF file with the images
|
||||
embedded in it. JPEG and JPEG2000 images will be included without
|
||||
recompression. Images in other formats will be included with zip/flate
|
||||
encoding which usually leads to an increase in the resulting size because
|
||||
formats like png compress better than PDF which just zip/flate compresses the
|
||||
RGB data. As a result, this tool is able to losslessly wrap images into a PDF
|
||||
container with a quality-filesize ratio that is typically better (in case of
|
||||
JPEG and JPEG2000 images) than, or equal to (in case of other formats), that of
|
||||
existing tools.
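
As a rough illustration of why zip/flate encoding is lossless, the sketch below compresses raw RGB pixel bytes with Python's standard `zlib` module and checks that decompression returns the identical bytes. The function name `flate_encode` and the sample pixel data are made up for this example and are not part of img2pdf.

```python
import zlib


def flate_encode(raw_pixels):
    """Deflate raw pixel bytes, as a PDF /FlateDecode stream would store them."""
    return zlib.compress(raw_pixels)


# Hypothetical 2x2 RGB image: red, green, blue and white pixels, 3 bytes each.
raw = bytes(bytearray([255, 0, 0, 0, 255, 0, 0, 0, 255, 255, 255, 255]))
encoded = flate_encode(raw)

# Decompressing gives back exactly the same bytes, so no quality is lost.
assert zlib.decompress(encoded) == raw
```

For tiny inputs like this one the compressed stream can be larger than the raw data; the size benefit only shows up on realistic images.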
|
||||
|
||||
For example, imagemagick will re-encode the input JPEG image and thus change
|
||||
its content:
|
||||
For example, imagemagick will re-encode the input JPEG image (thus changing
|
||||
its content):
|
||||
|
||||
$ convert img.jpg img.pdf
|
||||
$ pdfimages img.pdf img.extr # not using -j to be extra sure there is no recompression
|
||||
$ compare -metric AE img.jpg img.extr-000.ppm null:
|
||||
1.6301e+06
|
||||
|
||||
If one wants to do a lossless conversion from any format to PDF with
|
||||
imagemagick then one has to use zip-encoding:
|
||||
If one wants to losslessly convert from any format to PDF with
|
||||
imagemagick, one has to use zip compression:
|
||||
|
||||
$ convert input.jpg -compress Zip output.pdf
|
||||
$ pdfimages img.pdf img.extr # not using -j to be extra sure there is no recompression
|
||||
$ compare -metric AE img.jpg img.extr-000.ppm null:
|
||||
0
|
||||
|
||||
The downside is, that using imagemagick like this will make the resulting PDF
|
||||
files a few times bigger than the input JPEG or JPEG2000 file and can also not
|
||||
output a multipage PDF.
|
||||
However, this approach will result in PDF files that are a few times larger
|
||||
than the input JPEG or JPEG2000 file.
|
||||
|
||||
img2pdf is able to output a PDF with multiple pages if more than one input
|
||||
image is given, losslessly embed JPEG and JPEG2000 files into a PDF container
|
||||
without adding more overhead than the PDF structure itself and will save all
|
||||
other graphics formats using lossless zip-compression.
|
||||
img2pdf is able to losslessly embed JPEG and JPEG2000 files into a PDF
|
||||
container without additional overhead (aside from the PDF structure itself),
|
||||
save other graphics formats using lossless zip compression,
|
||||
and produce multi-page PDF files when more than one input image is given.
|
||||
|
||||
Another nifty advantage: Since no re-encoding is done in case of JPEG images,
|
||||
the conversion is many (ten to hundred) times faster with img2pdf compared to
|
||||
imagemagick. While a run of above convert command with a 2.8MB JPEG takes 27
|
||||
seconds (on average) on my machine, conversion using img2pdf takes just a
|
||||
fraction of a second.
|
||||
Also, since JPEG and JPEG2000 images are not re-encoded, conversion with
|
||||
img2pdf is several times faster than with other tools.
|
||||
|
||||
Commandline Arguments
|
||||
---------------------
|
||||
|
||||
At least one input file argument must be given as img2pdf needs to seek in the
|
||||
file descriptor which would not be possible for stdin.
|
||||
Usage
|
||||
-----
|
||||
|
||||
Specify the dpi with the -d or --dpi options instead of reading it from the
|
||||
image or falling back to 96.0.
|
||||
#### General Notes
|
||||
|
||||
Specify the output file with -o or --output. By default output will be done to
|
||||
stdout.
|
||||
The images must be provided as files because img2pdf needs to seek
|
||||
in the file descriptor. Input cannot be piped through stdin.
|
||||
|
||||
Specify metadata using the --title, --author, --creator, --producer,
|
||||
--creationdate, --moddate, --subject and --keywords options (or their short
|
||||
forms).
|
||||
If no output file is specified with the `-o`/`--output` option,
|
||||
output will be to stdout.
|
||||
|
||||
Specify -C or --colorspace to force a colorspace using PIL short handles like
|
||||
'RGB', 'L' or '1'.
|
||||
Descriptions of the options should be self-explanatory.
|
||||
They are available by running:
|
||||
|
||||
More help is available with the -h or --help option.
|
||||
img2pdf --help
|
||||
|
||||
|
||||
#### Controlling Page Size
|
||||
|
||||
The PDF page size can be manipulated. By default, the image will be sized "into" the given dimensions with the aspect ratio retained. For instance, to size an image into a page that is at most 500pt x 500pt, use:
|
||||
|
||||
img2pdf -s 500x500 -o output.pdf input.jpg
|
||||
|
||||
To "fill" out a page that is at least 500pt x 500pt, follow the dimensions with a `^`:
|
||||
|
||||
img2pdf -s 500x500^ -o output.pdf input.jpg
|
||||
|
||||
To output pages that are exactly 500pt x 500pt, follow the dimensions with an `!`:
|
||||
|
||||
img2pdf -s 500x500\! -o output.pdf input.jpg
|
||||
|
||||
Notice that the default unit is points. Units may also be specified and mixed:
|
||||
|
||||
img2pdf -s 8.5inx27.94cm -o output.pdf input.jpg
|
||||
|
||||
If either width or height is omitted, the other will be calculated
|
||||
to preserve aspect ratio.
|
||||
|
||||
img2pdf -s x280mm -o output1.pdf input.jpg
|
||||
img2pdf -s 280mmx -o output2.pdf input.jpg
|
||||
|
||||
Some standard page sizes are recognized:
|
||||
|
||||
img2pdf -s letter -o output1.pdf input.jpg
|
||||
img2pdf -s a4 -o output2.pdf input.jpg
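
The size syntax described above can be sketched as a small parser. This is only an illustration under stated assumptions: `parse_pagesize` is a hypothetical helper, not the function img2pdf uses internally, it handles only the `in`, `cm` and `mm` suffixes shown in the examples, and it does not resolve named sizes such as `letter` or `a4`.

```python
import re

# Conversion factors to PDF points (1 point = 1/72 inch); no suffix means points.
POINTS_PER_UNIT = {None: 1.0, "in": 72.0, "cm": 72.0 / 2.54, "mm": 72.0 / 25.4}

# One optional dimension: a number followed by an optional unit suffix.
_DIM = r"(?:(\d+(?:\.\d+)?)(in|cm|mm)?)?"
_SIZE_RE = re.compile("^" + _DIM + "x" + _DIM + r"([\^!])?$")
_MODES = {None: "into", "^": "fill", "!": "exact"}


def _to_points(number, unit):
    """Convert a parsed number and unit to points, or return None if omitted."""
    if number is None:
        return None
    return float(number) * POINTS_PER_UNIT[unit]


def parse_pagesize(spec):
    """Return (width_pt, height_pt, mode) for specs like '500x500^' or 'x280mm'."""
    match = _SIZE_RE.match(spec)
    if match is None or (match.group(1) is None and match.group(3) is None):
        raise ValueError("cannot parse page size: %r" % (spec,))
    w_num, w_unit, h_num, h_unit, flag = match.groups()
    return _to_points(w_num, w_unit), _to_points(h_num, h_unit), _MODES[flag]
```

An omitted width or height comes back as `None`, leaving it to the caller to derive the missing side from the image's aspect ratio.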
|
||||
|
||||
#### Colorspace
|
||||
|
||||
Currently, the colorspace must be forced for JPEG 2000 images that are
|
||||
not in the RGB colorspace. Available colorspace options are based on
|
||||
Python Imaging Library (PIL) short handles.
|
||||
|
||||
* `RGB` = RGB color
|
||||
* `L` = Grayscale
|
||||
* `1` = Black and white (internally converted to grayscale)
|
||||
* `CMYK` = CMYK color
|
||||
* `CMYK;I` = CMYK color with inversion
|
||||
|
||||
For example, to encode a grayscale JPEG2000 image, use:
|
||||
|
||||
img2pdf -C L -o output.pdf input.jp2
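
The relationship between the PIL short handles listed above and the PDF device colorspaces can be sketched as a lookup table. The mapping follows the colorspace names that appear in this commit's `addimage` code, but `pdf_colorspace` itself is a hypothetical helper written for illustration only.

```python
# PIL mode handle -> PDF device colorspace name.
PDF_COLORSPACES = {
    "L": b"/DeviceGray",
    "RGB": b"/DeviceRGB",
    "CMYK": b"/DeviceCMYK",
    "CMYK;I": b"/DeviceCMYK",  # inverted CMYK uses the same space plus a /Decode array
}


def pdf_colorspace(mode):
    """Return the PDF colorspace name for a PIL mode handle."""
    if mode == "1":
        # Black and white is converted to grayscale before embedding.
        mode = "L"
    try:
        return PDF_COLORSPACES[mode]
    except KeyError:
        raise ValueError("unsupported color space: %s" % mode)
```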
|
||||
|
||||
Bugs
|
||||
----
|
||||
|
||||
If you find a JPEG or JPEG2000 file that, when embedded can not be read by the
|
||||
Adobe Acrobat Reader, please contact me.
|
||||
If you find a JPEG or JPEG2000 file that, when embedded, cannot be read
|
||||
by the Adobe Acrobat Reader, please contact me.
|
||||
|
||||
For lossless conversion of other formats than JPEG or JPEG2000 files, zip/flate
|
||||
encoding is used. This choice is based on a number of tests I did on images.
|
||||
I converted them into PDF using imagemagick and all compressions it has to
|
||||
offer and then compared the output size of the lossless variants. In all my
|
||||
tests, zip/flate encoding performed best. You can verify my findings using the
|
||||
test_comp.sh script with any input image given as a commandline argument. If
|
||||
you find an input file that is outperformed by another lossless compression,
|
||||
contact me.
|
||||
For lossless conversion of formats other than JPEG or JPEG2000, zip/flate
|
||||
encoding is used. This choice is based on tests I did with a number of images.
|
||||
I converted them into PDF using the lossless variants of the compression
|
||||
formats offered by imagemagick. In all my tests, zip/flate encoding performed
|
||||
best. You can verify my findings using the test_comp.sh script with any input
|
||||
image given as a commandline argument. If you find an input file that is
|
||||
outperformed by another lossless compression method, contact me.
|
||||
|
||||
I have not yet figured out how to read the colorspace from jpeg2000 files.
|
||||
Therefore jpeg2000 files use DeviceRGB per default. If your jpeg2000 files are
|
||||
of any other colorspace you must force it using the --colorspace option.
|
||||
Like -C L for DeviceGray.
|
||||
I have not yet figured out how to determine the colorspace of JPEG2000 files.
|
||||
Therefore JPEG2000 files use DeviceRGB by default. For JPEG2000 files with
|
||||
other colorspaces, you must force it using the `--colorspace` option.
|
||||
|
||||
It might be possible to store transparency using masks but it is not clear
|
||||
what the utility of such a functionality would be.
|
||||
|
||||
Most vector graphic formats can be losslessly turned into PDF (minus some of
|
||||
the features unsupported by PDF) but img2pdf will currently turn vector
|
||||
graphics into their lossy raster representations.
|
||||
|
||||
Acrobat is able to store a hint for the PDF reader of how to present the PDF
|
||||
when opening it. Things like automatic fullscreen or the zoom level can be
|
||||
configured.
|
||||
|
||||
It would be nice if a single input image could be read from standard input.
|
||||
|
||||
Installation
|
||||
------------
|
||||
|
||||
You can install the package using:
|
||||
On Debian- and Ubuntu-based systems, dependencies may be installed
|
||||
with the following command:
|
||||
|
||||
apt-get install python python-pil python-setuptools
|
||||
|
||||
Or for Python 3:
|
||||
|
||||
apt-get install python3 python3-pil python3-setuptools
|
||||
|
||||
You can then install the package using:
|
||||
|
||||
$ pip install img2pdf
|
||||
|
||||
If you want to install from source code simply use:
|
||||
If you prefer to install from source code, use:
|
||||
|
||||
$ cd img2pdf/
|
||||
$ pip install .
|
||||
|
||||
To test the console script without installing the package on your system,
|
||||
simply use virtualenv:
|
||||
use virtualenv:
|
||||
|
||||
$ cd img2pdf/
|
||||
$ virtualenv ve
|
||||
|
@ -121,7 +182,10 @@ You can then test the converter using:
|
|||
|
||||
$ ve/bin/img2pdf -o test.pdf src/tests/test.jpg
|
||||
|
||||
Note that the package can also be used as a library as follows:
|
||||
The package can also be used as a library:
|
||||
|
||||
import img2pdf
|
||||
pdf_bytes = img2pdf('test.jpg', dpi=150)
|
||||
pdf_bytes = img2pdf.convert(['test.jpg'])
|
||||
|
||||
file = open("name.pdf","wb")
|
||||
file.write(pdf_bytes)
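
A slightly more defensive variant of the library usage above closes the output file even if writing fails. This is a minimal sketch: `write_pdf` and `convert_to_file` are hypothetical helper names, and the only img2pdf call assumed is `img2pdf.convert` taking a list of image paths, as shown in the README.

```python
def write_pdf(pdf_bytes, path):
    """Write already-converted PDF bytes to path, closing the file on exit."""
    with open(path, "wb") as handle:
        handle.write(pdf_bytes)


def convert_to_file(image_paths, pdf_path):
    """Convert the given images with img2pdf and store the result at pdf_path."""
    import img2pdf  # imported lazily; requires the package to be installed

    write_pdf(img2pdf.convert(image_paths), pdf_path)
```

With the package installed, `convert_to_file(['test.jpg'], 'name.pdf')` would produce the same file as the snippet above.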
|
||||
|
|
2
setup.cfg
Normal file
|
@ -0,0 +1,2 @@
|
|||
[metadata]
|
||||
description-file = README.md
|
10
setup.py
|
@ -1,9 +1,12 @@
|
|||
from setuptools import setup
|
||||
|
||||
VERSION="0.1.6~git"
|
||||
|
||||
setup (
|
||||
name='img2pdf',
|
||||
version='0.1.0',
|
||||
version=VERSION,
|
||||
author = "Johannes 'josch' Schauer",
|
||||
author_email = 'j.schauer@email.de',
|
||||
description = "Convert images to PDF via direct JPEG inclusion.",
|
||||
long_description = open('README.md').read(),
|
||||
license = "LGPL",
|
||||
|
@ -15,12 +18,15 @@ setup (
|
|||
'Programming Language :: Python :: 2',
|
||||
'Programming Language :: Python :: 2.6',
|
||||
'Programming Language :: Python :: 2.7',
|
||||
'Programming Language :: Python :: 3',
|
||||
'Programming Language :: Python :: 3.4',
|
||||
'Programming Language :: Python :: Implementation :: CPython',
|
||||
'License :: OSI Approved :: GNU Lesser General Public License v3 (LGPLv3)',
|
||||
'Programming Language :: Python',
|
||||
'Natural Language :: English',
|
||||
'Operating System :: OS Independent'],
|
||||
url = 'http://pypi.python.org/pypi/img2pdf',
|
||||
url = 'https://github.com/josch/img2pdf',
|
||||
download_url = 'https://github.com/josch/img2pdf/archive/'+VERSION+'.tar.gz',
|
||||
package_dir={"": "src"},
|
||||
py_modules=['img2pdf', 'jp2'],
|
||||
include_package_data = True,
|
||||
|
|
540
src/img2pdf.py
|
@ -1,3 +1,5 @@
|
|||
#!/usr/bin/env python2
|
||||
|
||||
# Copyright (C) 2012-2014 Johannes 'josch' Schauer <j.schauer at email.de>
|
||||
#
|
||||
# This program is free software: you can redistribute it and/or
|
||||
|
@ -15,13 +17,20 @@
|
|||
# License along with this program. If not, see
|
||||
# <http://www.gnu.org/licenses/>.
|
||||
|
||||
__version__ = "0.1.6~git"
|
||||
default_dpi = 96.0
|
||||
|
||||
import re
|
||||
import sys
|
||||
import zlib
|
||||
import argparse
|
||||
import struct
|
||||
from PIL import Image
|
||||
from datetime import datetime
|
||||
from jp2 import parsejp2
|
||||
try:
|
||||
from cStringIO import StringIO as cStringIO
|
||||
except ImportError:
|
||||
from io import BytesIO as cStringIO
|
||||
|
||||
# XXX: Switch to use logging module.
|
||||
def debug_out(message, verbose=True):
|
||||
|
@ -34,19 +43,28 @@ def error_out(message):
|
|||
def warning_out(message):
|
||||
sys.stderr.write("W: "+message+"\n")
|
||||
|
||||
def datetime_to_pdfdate(dt):
|
||||
return dt.strftime("%Y%m%d%H%M%SZ")
|
||||
|
||||
def parse(cont, indent=1):
|
||||
if type(cont) is dict:
|
||||
return "<<\n"+"\n".join(
|
||||
[4 * indent * " " + "%s %s" % (k, parse(v, indent+1))
|
||||
for k, v in cont.items()])+"\n"+4*(indent-1)*" "+">>"
|
||||
elif type(cont) is int or type(cont) is float:
|
||||
return str(cont)
|
||||
return b"<<\n"+b"\n".join(
|
||||
[4 * indent * b" " + k + b" " + parse(v, indent+1)
|
||||
for k, v in sorted(cont.items())])+b"\n"+4*(indent-1)*b" "+b">>"
|
||||
elif type(cont) is int:
|
||||
return str(cont).encode()
|
||||
elif type(cont) is float:
|
||||
return ("%0.4f"%cont).encode()
|
||||
elif isinstance(cont, obj):
|
||||
return "%d 0 R"%cont.identifier
|
||||
elif type(cont) is str:
|
||||
return ("%d 0 R"%cont.identifier).encode()
|
||||
elif type(cont) is str or type(cont) is bytes:
|
||||
if type(cont) is str and type(cont) is not bytes:
|
||||
raise Exception("parse must be passed a bytes object in py3")
|
||||
return cont
|
||||
elif type(cont) is list:
|
||||
return "[ "+" ".join([parse(c, indent) for c in cont])+" ]"
|
||||
return b"[ "+b" ".join([parse(c, indent) for c in cont])+b" ]"
|
||||
else:
|
||||
raise Exception("cannot handle type %s"%type(cont))
|
||||
|
||||
class obj(object):
|
||||
def __init__(self, content, stream=None):
|
||||
|
@ -56,56 +74,56 @@ class obj(object):
|
|||
def tostring(self):
|
||||
if self.stream:
|
||||
return (
|
||||
"%d 0 obj " % self.identifier +
|
||||
("%d 0 obj " % self.identifier).encode() +
|
||||
parse(self.content) +
|
||||
"\nstream\n" + self.stream + "\nendstream\nendobj\n")
|
||||
b"\nstream\n" + self.stream + b"\nendstream\nendobj\n")
|
||||
else:
|
||||
return "%d 0 obj "%self.identifier+parse(self.content)+" endobj\n"
|
||||
return ("%d 0 obj "%self.identifier).encode()+parse(self.content)+b" endobj\n"
|
||||
|
||||
class pdfdoc(object):
|
||||
|
||||
def __init__(self, version=3, title=None, author=None, creator=None,
|
||||
producer=None, creationdate=None, moddate=None, subject=None,
|
||||
keywords=None):
|
||||
keywords=None, nodate=False):
|
||||
self.version = version # default pdf version 1.3
|
||||
now = datetime.now()
|
||||
self.objects = []
|
||||
|
||||
info = {}
|
||||
if title:
|
||||
info["/Title"] = "("+title+")"
|
||||
info[b"/Title"] = b"("+title+b")"
|
||||
if author:
|
||||
info["/Author"] = "("+author+")"
|
||||
info[b"/Author"] = b"("+author+b")"
|
||||
if creator:
|
||||
info["/Creator"] = "("+creator+")"
|
||||
info[b"/Creator"] = b"("+creator+b")"
|
||||
if producer:
|
||||
info["/Producer"] = "("+producer+")"
|
||||
info[b"/Producer"] = b"("+producer+b")"
|
||||
if creationdate:
|
||||
info["/CreationDate"] = "(D:"+creationdate.strftime("%Y%m%d%H%M%S")+")"
|
||||
else:
|
||||
info["/CreationDate"] = "(D:"+now.strftime("%Y%m%d%H%M%S")+")"
|
||||
info[b"/CreationDate"] = b"(D:"+datetime_to_pdfdate(creationdate).encode()+b")"
|
||||
elif not nodate:
|
||||
info[b"/CreationDate"] = b"(D:"+datetime_to_pdfdate(now).encode()+b")"
|
||||
if moddate:
|
||||
info["/ModDate"] = "(D:"+moddate.strftime("%Y%m%d%H%M%S")+")"
|
||||
else:
|
||||
info["/ModDate"] = "(D:"+now.strftime("%Y%m%d%H%M%S")+")"
|
||||
info[b"/ModDate"] = b"(D:"+datetime_to_pdfdate(moddate).encode()+b")"
|
||||
elif not nodate:
|
||||
info[b"/ModDate"] = b"(D:"+datetime_to_pdfdate(now).encode()+b")"
|
||||
if subject:
|
||||
info["/Subject"] = "("+subject+")"
|
||||
info[b"/Subject"] = b"("+subject+b")"
|
||||
if keywords:
|
||||
info["/Keywords"] = "("+",".join(keywords)+")"
|
||||
info[b"/Keywords"] = b"("+b",".join(keywords)+b")"
|
||||
|
||||
self.info = obj(info)
|
||||
|
||||
# create an incomplete pages object so that a /Parent entry can be
|
||||
# added to each page
|
||||
self.pages = obj({
|
||||
"/Type": "/Pages",
|
||||
"/Kids": [],
|
||||
"/Count": 0
|
||||
b"/Type": b"/Pages",
|
||||
b"/Kids": [],
|
||||
b"/Count": 0
|
||||
})
|
||||
|
||||
self.catalog = obj({
|
||||
"/Pages": self.pages,
|
||||
"/Type": "/Catalog"
|
||||
b"/Pages": self.pages,
|
||||
b"/Type": b"/Catalog"
|
||||
})
|
||||
self.addobj(self.catalog)
|
||||
self.addobj(self.pages)
|
||||
|
@ -115,71 +133,70 @@ class pdfdoc(object):
|
|||
obj.identifier = newid
|
||||
self.objects.append(obj)
|
||||
|
||||
def addimage(self, color, width, height, dpi, imgformat, imgdata):
|
||||
def addimage(self, color, width, height, imgformat, imgdata, pdf_x, pdf_y):
|
||||
if color == 'L':
|
||||
color = "/DeviceGray"
|
||||
colorspace = b"/DeviceGray"
|
||||
elif color == 'RGB':
|
||||
color = "/DeviceRGB"
|
||||
colorspace = b"/DeviceRGB"
|
||||
elif color == 'CMYK' or color == 'CMYK;I':
|
||||
colorspace = b"/DeviceCMYK"
|
||||
else:
|
||||
error_out("unsupported color space: %s"%color)
|
||||
exit(1)
|
||||
|
||||
# pdf units = 1/72 inch
|
||||
pdf_x, pdf_y = 72.0*width/dpi[0], 72.0*height/dpi[1]
|
||||
|
||||
print(pdf_x)
|
||||
print(pdf_y)
|
||||
|
||||
if pdf_x < 3.00 or pdf_y < 3.00:
|
||||
warning_out("pdf width or height is below 3.00 - decrease the dpi")
|
||||
elif pdf_x > 14400.0 or pdf_y > 14400.0:
|
||||
#error_out(("pdf width or height is above 200.00 - increase the dpi")
|
||||
warning_out("pdf width or height is below 3.00\" - decrease the dpi")
|
||||
elif pdf_x > 200.0 or pdf_y > 200.0:
|
||||
warning_out("pdf width or height would be above 200\" - squeezed inside")
|
||||
x_scale = 14400.0 / pdf_x
|
||||
y_scale = 14400.0 / pdf_y
|
||||
x_scale = 200.0 / pdf_x
|
||||
y_scale = 200.0 / pdf_y
|
||||
scale = min(x_scale, y_scale) * 0.999
|
||||
pdf_x *= scale
|
||||
pdf_y *= scale
|
||||
|
||||
# either embed the whole jpeg or deflate the bitmap representation
|
||||
if imgformat == "JPEG":
|
||||
ofilter = [ "/DCTDecode" ]
|
||||
elif imgformat is "JP2":
|
||||
ofilter = [ "/JPXDecode" ]
|
||||
ofilter = [ b"/DCTDecode" ]
|
||||
elif imgformat == "JPEG2000":
|
||||
ofilter = [ b"/JPXDecode" ]
|
||||
self.version = 5 # jpeg2000 needs pdf 1.5
|
||||
else:
|
||||
ofilter = [ "/FlateDecode" ]
|
||||
ofilter = [ b"/FlateDecode" ]
|
||||
image = obj({
|
||||
"/Type": "/XObject",
|
||||
"/Subtype": "/Image",
|
||||
"/Filter": ofilter,
|
||||
"/Width": width,
|
||||
"/Height": height,
|
||||
"/ColorSpace": color,
|
||||
# hardcoded as PIL doesnt provide bits for non-jpeg formats
|
||||
"/BitsPerComponent": 8,
|
||||
"/Length": len(imgdata)
|
||||
b"/Type": b"/XObject",
|
||||
b"/Subtype": b"/Image",
|
||||
b"/Filter": ofilter,
|
||||
b"/Width": width,
|
||||
b"/Height": height,
|
||||
b"/ColorSpace": colorspace,
|
||||
# hardcoded as PIL doesn't provide bits for non-jpeg formats
|
||||
b"/BitsPerComponent": 8,
|
||||
b"/Length": len(imgdata)
|
||||
}, imgdata)
|
||||
|
||||
text = "q\n%f 0 0 %f 0 0 cm\n/Im0 Do\nQ"%(pdf_x, pdf_y)
|
||||
if color == 'CMYK;I':
|
||||
# Inverts all four channels
|
||||
image.content[b'/Decode'] = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
|
||||
|
||||
text = ("q\n%0.4f 0 0 %0.4f 0 0 cm\n/Im0 Do\nQ"%(pdf_x, pdf_y)).encode()
|
||||
|
||||
content = obj({
|
||||
"/Length": len(text)
|
||||
b"/Length": len(text)
|
||||
}, text)
|
||||
|
||||
page = obj({
|
||||
"/Type": "/Page",
|
||||
"/Parent": self.pages,
|
||||
"/Resources": {
|
||||
"/XObject": {
|
||||
"/Im0": image
|
||||
b"/Type": b"/Page",
|
||||
b"/Parent": self.pages,
|
||||
b"/Resources": {
|
||||
b"/XObject": {
|
||||
b"/Im0": image
|
||||
}
|
||||
},
|
||||
"/MediaBox": [0, 0, pdf_x, pdf_y],
|
||||
"/Contents": content
|
||||
b"/MediaBox": [0, 0, pdf_x, pdf_y],
|
||||
b"/Contents": content
|
||||
})
|
||||
self.pages.content["/Kids"].append(page)
|
||||
self.pages.content["/Count"] += 1
|
||||
self.pages.content[b"/Kids"].append(page)
|
||||
self.pages.content[b"/Count"] += 1
|
||||
self.addobj(page)
|
||||
self.addobj(content)
|
||||
self.addobj(image)
|
||||
|
@ -190,35 +207,43 @@ class pdfdoc(object):
|
|||
|
||||
xreftable = list()
|
||||
|
||||
result = "%%PDF-1.%d\n"%self.version
|
||||
result = ("%%PDF-1.%d\n"%self.version).encode()
|
||||
|
||||
xreftable.append("0000000000 65535 f \n")
|
||||
xreftable.append(b"0000000000 65535 f \n")
|
||||
for o in self.objects:
|
||||
xreftable.append("%010d 00000 n \n"%len(result))
|
||||
xreftable.append(("%010d 00000 n \n"%len(result)).encode())
|
||||
result += o.tostring()
|
||||
|
||||
xrefoffset = len(result)
|
||||
result += "xref\n"
|
||||
result += "0 %d\n"%len(xreftable)
|
||||
result += b"xref\n"
|
||||
result += ("0 %d\n"%len(xreftable)).encode()
|
||||
for x in xreftable:
|
||||
result += x
|
||||
result += "trailer\n"
|
||||
result += parse({"/Size": len(xreftable), "/Info": self.info, "/Root": self.catalog})+"\n"
|
||||
result += "startxref\n"
|
||||
result += "%d\n"%xrefoffset
|
||||
result += "%%EOF\n"
|
||||
result += b"trailer\n"
|
||||
result += parse({b"/Size": len(xreftable), b"/Info": self.info, b"/Root": self.catalog})+b"\n"
|
||||
result += b"startxref\n"
|
||||
result += ("%d\n"%xrefoffset).encode()
|
||||
result += b"%%EOF\n"
|
||||
return result
|
||||
|
||||
def convert(images, dpi, title=None, author=None, creator=None, producer=None,
|
||||
creationdate=None, moddate=None, subject=None, keywords=None,
|
||||
colorspace=None, verbose=False):
|
||||
def convert(images, dpi=None, pagesize=(None, None, None), title=None,
|
||||
author=None, creator=None, producer=None, creationdate=None,
|
||||
moddate=None, subject=None, keywords=None, colorspace=None,
|
||||
nodate=False, verbose=False):
|
||||
|
||||
pagesize_options = pagesize[2]
|
||||
|
||||
pdf = pdfdoc(3, title, author, creator, producer, creationdate,
|
||||
moddate, subject, keywords)
|
||||
moddate, subject, keywords, nodate)
|
||||
|
||||
for im in images:
|
||||
for imfilename in images:
|
||||
debug_out("Reading %s"%imfilename, verbose)
|
||||
try:
|
||||
rawdata = imfilename.read()
|
||||
except AttributeError:
|
||||
with open(imfilename, "rb") as im:
|
||||
rawdata = im.read()
|
||||
im.seek(0)
|
||||
im = cStringIO(rawdata)
|
||||
try:
|
||||
imgdata = Image.open(im)
|
||||
except IOError as e:
|
||||
|
@ -229,14 +254,11 @@ def convert(images, dpi, title=None, author=None, creator=None, producer=None,
|
|||
exit(1)
|
||||
# image is jpeg2000
|
||||
width, height, ics = parsejp2(rawdata)
|
||||
imgformat = "JP2"
|
||||
imgformat = "JPEG2000"
|
||||
|
||||
if dpi:
|
||||
ndpi = dpi, dpi
|
||||
debug_out("input dpi (forced) = %d x %d"%ndpi, verbose)
|
||||
else:
|
||||
ndpi = (96, 96) # TODO: read real dpi
|
||||
debug_out("input dpi = %d x %d"%ndpi, verbose)
|
||||
# TODO: read real dpi from input jpeg2000 image
|
||||
ndpi = (default_dpi, default_dpi)
|
||||
debug_out("input dpi = %d x %d" % ndpi, verbose)
|
||||
|
||||
if colorspace:
|
||||
color = colorspace
|
||||
|
@ -248,26 +270,45 @@ def convert(images, dpi, title=None, author=None, creator=None, producer=None,
|
|||
width, height = imgdata.size
|
||||
imgformat = imgdata.format
|
||||
|
||||
if dpi:
|
||||
ndpi = dpi, dpi
|
||||
debug_out("input dpi (forced) = %d x %d"%ndpi, verbose)
|
||||
else:
|
||||
ndpi = imgdata.info.get("dpi", (96, 96))
|
||||
debug_out("input dpi = %d x %d"%ndpi, verbose)
|
||||
ndpi = imgdata.info.get("dpi", (default_dpi, default_dpi))
|
||||
# in python3, the returned dpi value for some tiff images will
|
||||
# not be an integer but a float. To make the behaviour of
|
||||
# img2pdf the same between python2 and python3, we convert that
|
||||
# float into an integer by rounding
|
||||
# search online for the 72.009 dpi problem for more info
|
||||
ndpi = (int(round(ndpi[0])),int(round(ndpi[1])))
|
||||
debug_out("input dpi = %d x %d" % ndpi, verbose)
|
||||
|
||||
if colorspace:
|
||||
color = colorspace
|
||||
debug_out("input colorspace (forced) = %s"%(color), verbose)
|
||||
else:
|
||||
color = imgdata.mode
|
||||
if color == "CMYK" and imgformat == "JPEG":
|
||||
# Adobe inverts CMYK JPEGs for some reason, and others
|
||||
# have followed suit as well. Some software assumes the
|
||||
# JPEG is inverted if the Adobe tag (APP14) is present, while other
|
||||
# software assumes all CMYK JPEGs are inverted. I don't
|
||||
# have enough experience with these to know which is
|
||||
# better for images currently in the wild, so I'm going
|
||||
# with the first approach for now.
|
||||
if "adobe" in imgdata.info:
|
||||
color = "CMYK;I"
|
||||
debug_out("input colorspace = %s"%(color), verbose)
|
||||
|
||||
debug_out("width x height = %d x %d"%(width,height), verbose)
|
||||
debug_out("imgformat = %s"%imgformat, verbose)
|
||||
|
||||
if dpi:
|
||||
ndpi = dpi, dpi
|
||||
debug_out("input dpi (forced) = %d x %d" % ndpi, verbose)
|
||||
elif pagesize_options:
|
||||
ndpi = get_ndpi(width, height, pagesize)
|
||||
debug_out("calculated dpi (based on pagesize) = %d x %d" % ndpi, verbose)
|
||||
|
||||
# depending on the input format, determine whether to pass the raw
|
||||
# image or the zlib compressed color information
|
||||
if imgformat is "JPEG" or imgformat is "JP2":
|
||||
if imgformat == "JPEG" or imgformat == "JPEG2000":
|
||||
if color == '1':
|
||||
error_out("jpeg can't be monochrome")
|
||||
exit(1)
|
||||
|
@ -275,16 +316,61 @@ def convert(images, dpi, title=None, author=None, creator=None, producer=None,
|
|||
else:
|
||||
# because we do not support /CCITTFaxDecode
|
||||
if color == '1':
|
||||
debug_out("Converting colorspace 1 to L", verbose)
|
||||
imgdata = imgdata.convert('L')
|
||||
color = 'L'
|
||||
imgdata = zlib.compress(imgdata.tostring())
|
||||
|
||||
pdf.addimage(color, width, height, ndpi, imgformat, imgdata)
|
||||
|
||||
elif color in ("RGB", "L", "CMYK", "CMYK;I"):
|
||||
debug_out("Colorspace is OK: %s"%color, verbose)
|
||||
else:
|
||||
debug_out("Converting colorspace %s to RGB"%color, verbose)
|
||||
imgdata = imgdata.convert('RGB')
|
||||
color = imgdata.mode
|
||||
img = imgdata.tobytes()
|
||||
# the python-pil version 2.3.0-1ubuntu3 in Ubuntu does not have the close() method
|
||||
try:
|
||||
imgdata.close()
|
||||
except AttributeError:
|
||||
pass
|
||||
imgdata = zlib.compress(img)
|
||||
im.close()
|
||||
|
||||
if pagesize_options and pagesize_options['exact'][1]:
|
||||
# output size exactly to specified dimensions
|
||||
# pagesize[0], pagesize[1] already checked in valid_size()
|
||||
pdf_x, pdf_y = pagesize[0], pagesize[1]
|
||||
else:
|
||||
# output size based on dpi; point = 1/72 inch
|
||||
pdf_x, pdf_y = 72.0*width/float(ndpi[0]), 72.0*height/float(ndpi[1])
|
||||
|
||||
pdf.addimage(color, width, height, imgformat, imgdata, pdf_x, pdf_y)
|
||||
|
||||
return pdf.tostring()
|
||||
|
||||
def get_ndpi(width, height, pagesize):
|
||||
pagesize_options = pagesize[2]
|
||||
|
||||
if pagesize_options and pagesize_options['fill'][1]:
|
||||
if float(width)/height < float(pagesize[0])/pagesize[1]:
|
||||
tmp_dpi = 72.0*width/pagesize[0]
|
||||
else:
|
||||
tmp_dpi = 72.0*height/pagesize[1]
|
||||
elif pagesize[0] and pagesize[1]:
|
||||
# if both height and width given with no specific pagesize_option,
|
||||
# resize to fit "into" page
|
||||
if float(width)/height < float(pagesize[0])/pagesize[1]:
|
||||
tmp_dpi = 72.0*height/pagesize[1]
|
||||
else:
|
||||
tmp_dpi = 72.0*width/pagesize[0]
|
||||
elif pagesize[0]:
|
||||
# if width given, calculate dpi based on width
|
||||
tmp_dpi = 72.0*width/pagesize[0]
|
||||
elif pagesize[1]:
|
||||
# if height given, calculate dpi based on height
|
||||
tmp_dpi = 72.0*height/pagesize[1]
|
||||
else:
|
||||
tmp_dpi = default_dpi
|
||||
|
||||
return tmp_dpi, tmp_dpi
|
||||
|
||||
def positive_float(string):
|
||||
value = float(string)
|
||||
|
@ -294,58 +380,276 @@ def positive_float(string):
|
|||
return value
|
||||
|
||||
def valid_date(string):
|
||||
# first try parsing in ISO8601 format
|
||||
try:
|
||||
return datetime.strptime(string, "%Y-%m-%d")
|
||||
except ValueError:
|
||||
pass
|
||||
try:
|
||||
return datetime.strptime(string, "%Y-%m-%dT%H:%M")
|
||||
except ValueError:
|
||||
pass
|
||||
try:
|
||||
return datetime.strptime(string, "%Y-%m-%dT%H:%M:%S")
|
||||
except ValueError:
|
||||
pass
|
||||
# then try dateutil
|
||||
try:
|
||||
from dateutil import parser
|
||||
except ImportError:
|
||||
pass
|
||||
else:
|
||||
try:
|
||||
return parser.parse(string)
|
||||
except (TypeError, ValueError):
|
||||
pass
|
||||
# as a last resort, try the local date utility
|
||||
try:
|
||||
import subprocess
|
||||
except ImportError:
|
||||
pass
|
||||
else:
|
||||
try:
|
||||
utime = subprocess.check_output(["date", "--date", string, "+%s"])
|
||||
except subprocess.CalledProcessError:
|
||||
pass
|
||||
else:
|
||||
return datetime.utcfromtimestamp(int(utime))
|
||||
raise argparse.ArgumentTypeError("cannot parse date: %s"%string)
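
Review note: the first stage of valid_date tries a fixed list of ISO 8601 layouts in order. A minimal sketch of that stage alone, using only the standard library, could look as follows; `ISO_FORMATS` and `parse_iso_date` are hypothetical names, and the dateutil and `date --date` fallbacks are deliberately left out.

```python
from datetime import datetime

# Layouts tried in order, mirroring the three strptime calls in valid_date.
ISO_FORMATS = ("%Y-%m-%d", "%Y-%m-%dT%H:%M", "%Y-%m-%dT%H:%M:%S")


def parse_iso_date(text):
    """Return a naive datetime for the first layout that matches text."""
    for fmt in ISO_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            # this layout does not match, try the next one
            continue
    raise ValueError("cannot parse date: %s" % text)
```

Keeping the layouts in a tuple makes it easy to add another one without repeating the try/except block.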
|
||||
|
||||
def get_standard_papersize(string):
|
||||
papersizes = {
|
||||
"11x17" : "792x792^", # "792x1224",
|
||||
"ledger" : "792x792^", # "1224x792",
|
||||
"legal" : "612x612^", # "612x1008",
|
||||
"letter" : "612x612^", # "612x792",
|
||||
"arche" : "2592x2592^", # "2592x3456",
|
||||
"archd" : "1728x1728^", # "1728x2592",
|
||||
"archc" : "1296x1296^", # "1296x1728",
|
||||
"archb" : "864x864^", # "864x1296",
|
||||
"archa" : "648x648^", # "648x864",
|
||||
"a0" : "2380x2380^", # "2380x3368",
|
||||
"a1" : "1684x1684^", # "1684x2380",
|
||||
"a2" : "1190x1190^", # "1190x1684",
|
||||
"a3" : "842x842^", # "842x1190",
|
||||
"a4" : "595x595^", # "595x842",
|
||||
"a5" : "421x421^", # "421x595",
|
||||
"a6" : "297x297^", # "297x421",
|
||||
"a7" : "210x210^", # "210x297",
|
||||
"a8" : "148x148^", # "148x210",
|
||||
"a9" : "105x105^", # "105x148",
|
||||
"a10" : "74x74^", # "74x105",
|
||||
"b0" : "2836x2836^", # "2836x4008",
|
||||
"b1" : "2004x2004^", # "2004x2836",
|
||||
"b2" : "1418x1418^", # "1418x2004",
|
||||
"b3" : "1002x1002^", # "1002x1418",
|
||||
"b4" : "709x709^", # "709x1002",
|
||||
"b5" : "501x501^", # "501x709",
|
||||
"c0" : "2600x2600^", # "2600x3677",
|
||||
"c1" : "1837x1837^", # "1837x2600",
|
||||
"c2" : "1298x1298^", # "1298x1837",
|
||||
"c3" : "918x918^", # "918x1298",
|
||||
"c4" : "649x649^", # "649x918",
|
||||
"c5" : "459x459^", # "459x649",
|
||||
"c6" : "323x323^", # "323x459",
|
||||
"flsa" : "612x612^", # "612x936",
|
||||
"flse" : "612x612^", # "612x936",
|
||||
"halfletter" : "396x396^", # "396x612",
|
||||
"tabloid" : "792x792^", # "792x1224",
|
||||
"statement" : "396x396^", # "396x612",
|
||||
"executive" : "540x540^", # "540x720",
|
||||
"folio" : "612x612^", # "612x936",
|
||||
"quarto" : "610x610^", # "610x780"
|
||||
}
|
||||
|
||||
string = string.lower()
|
||||
return papersizes.get(string, string)
|
||||
|
||||
def valid_size(string):
    # conversion factors from units to points
    units = {
        'in' : 72.0,
        'cm' : 72.0/2.54,
        'mm' : 72.0/25.4,
        'pt' : 1.0
    }

    # regular expression for each option and whether it was requested;
    # raw strings avoid invalid escape sequences in the string literals
    pagesize_options = {
        'exact' : [r'\!', False],
        'shrink' : [r'\>', False],
        'enlarge' : [r'\<', False],
        'fill' : [r'\^', False],
        'percent' : [r'\%', False],
        'count' : [r'\@', False],
    }

    string = get_standard_papersize(string)

    pattern = re.compile(r"""
        ([0-9]*\.?[0-9]*) # tokens.group(1) == width; may be empty
        ([a-z]*) # tokens.group(2) == units; may be empty
        x
        ([0-9]*\.?[0-9]*) # tokens.group(3) == height; may be empty
        ([a-zA-Z]*) # tokens.group(4) == units; may be empty
        ([^0-9a-zA-Z]*) # tokens.group(5) == extra options
        """, re.VERBOSE)

    tokens = pattern.match(string)

    # tokens.group(0) should match entire input string; match() returns None
    # when the input contains no 'x' separator at all
    if tokens is None or tokens.group(0) != string:
        msg = ('Input size needs to be of the format AuxBv#, '
               'where A is width, B is height, u and v are units, '
               '# are options. '
               'You may omit either width or height, but not both. '
               'Units may be specified as (in, cm, mm, pt). '
               'You may omit units, which will default to pt. '
               'Available options include (! = exact ; ^ = fill ; default = into).')
        raise argparse.ArgumentTypeError(msg)

    # temporary list to loop through to process width and height
    pagesize_size = {
        'x' : [0, tokens.group(1), tokens.group(2)],
        'y' : [0, tokens.group(3), tokens.group(4)]
    }

    for key, value in pagesize_size.items():
        try:
            value[0] = float(value[1])
            value[0] *= units[value[2]] # convert to points
        except ValueError:
            # assign None if width or height not provided
            value[0] = None
        except KeyError:
            # if units unrecognized, raise error
            # otherwise default to pt because units not provided
            if value[2]:
                msg = "unrecognized unit '%s'." % value[2]
                raise argparse.ArgumentTypeError(msg)

    x = pagesize_size['x'][0]
    y = pagesize_size['y'][0]

    # parse options for resize methods
    if tokens.group(5):
        for key, value in pagesize_options.items():
            if re.search(value[0], tokens.group(5)):
                value[1] = True

    if pagesize_options['fill'][1]:
        # if either width or height is not given, try to fill in missing value
        if not x:
            x = y
        elif not y:
            y = x

    if pagesize_options['exact'][1]:
        if not x or not y:
            msg = ('exact size requires both width and height.')
            raise argparse.ArgumentTypeError(msg)

    if not x and not y:
        msg = ('width and height cannot both be omitted.')
        raise argparse.ArgumentTypeError(msg)

    return (x, y, pagesize_options)
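
Review note: the AuxBv# format handled by valid_size can be tried out in isolation. The following is a simplified sketch rather than the function from this commit: `parse_size`, `UNITS` and `SIZE_RE` are hypothetical names, only the `!` and `^` options are accepted, named paper sizes are not expanded, and errors are raised as plain ValueError instead of argparse.ArgumentTypeError.

```python
import re

# Conversion factors to PDF points; a missing unit is treated as points.
UNITS = {"in": 72.0, "cm": 72.0 / 2.54, "mm": 72.0 / 25.4, "pt": 1.0, "": 1.0}

# width, unit, literal "x", height, unit, trailing option characters
SIZE_RE = re.compile(
    r"([0-9]*\.?[0-9]*)([a-z]*)x([0-9]*\.?[0-9]*)([a-z]*)([!^]*)$")


def parse_size(text):
    """Parse e.g. '210mmx297mm' into (width_pt, height_pt, options)."""
    match = SIZE_RE.match(text.lower())
    if match is None:
        raise ValueError("size must look like AuxBv#: %r" % text)

    def to_points(number, unit):
        if number in ("", "."):
            return None  # this dimension was omitted
        if unit not in UNITS:
            raise ValueError("unrecognized unit %r" % unit)
        return float(number) * UNITS[unit]

    width = to_points(match.group(1), match.group(2))
    height = to_points(match.group(3), match.group(4))
    if width is None and height is None:
        raise ValueError("width and height cannot both be omitted")
    return width, height, match.group(5)
```

With this sketch, `parse_size("8.5inx11in^")` yields the letter page size in points together with the fill option, and `parse_size("595x")` leaves the height as None.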
|
||||
|
||||
# in python3, the received argument will be a unicode str() object which can
# be encoded into a bytes() object directly
# in python2, the received argument will be a binary str() object which has
# to be decoded from UTF-8 first
# we check whether we use python2 or python3 by checking whether the argument
# is of type str but not of type bytes (str and bytes are the same type only
# in python2)
def pdf_embedded_string(string):
    if type(string) is str and type(string) is not bytes:
        # py3
        pass
    else:
        # py2
        string = string.decode("utf8")
    # PDF text strings in UTF-16BE start with the byte order mark FE FF
    string = b"\xfe\xff"+string.encode("utf-16-be")
    # escape the backslash first so that the escapes added for the
    # parentheses are not escaped a second time
    string = string.replace(b'\\', b'\\\\')
    string = string.replace(b'(', b'\\(')
    string = string.replace(b')', b'\\)')
    return string
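
Review note: the effect of pdf_embedded_string is easiest to see on short inputs. The sketch below covers the Python 3 case only and is an illustration, not the committed function; `pdf_text_string` is a hypothetical name.

```python
def pdf_text_string(text):
    """Encode text as UTF-16BE with a byte order mark and escape the bytes
    that are special inside a PDF literal string (Python 3 only)."""
    raw = b"\xfe\xff" + text.encode("utf-16-be")
    # the backslash comes first so that the escapes added for the
    # parentheses are not escaped again
    for special in (b"\\", b"(", b")"):
        raw = raw.replace(special, b"\\" + special)
    return raw
```

Because the escaping runs on the encoded bytes, every 0x28, 0x29 and 0x5C byte is escaped, including those that are one half of a two-byte UTF-16 code unit.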
|
||||
|
||||
parser = argparse.ArgumentParser(
|
||||
description='Lossless conversion/embedding of images (in)to pdf')
|
||||
parser.add_argument(
|
||||
'images', metavar='infile', type=argparse.FileType('rb'),
|
||||
'images', metavar='infile', type=str,
|
||||
nargs='+', help='input file(s)')
|
||||
parser.add_argument(
|
||||
'-o', '--output', metavar='out', type=argparse.FileType('wb'),
|
||||
default=sys.stdout, help='output file (default: stdout)')
|
||||
parser.add_argument(
|
||||
default=getattr(sys.stdout, "buffer", sys.stdout),
|
||||
help='output file (default: stdout)')
|
||||
|
||||
sizeopts = parser.add_mutually_exclusive_group()
|
||||
sizeopts.add_argument(
|
||||
'-d', '--dpi', metavar='dpi', type=positive_float,
|
||||
help='dpi for pdf output (default: 96.0)')
|
||||
help=('dpi for pdf output. '
|
||||
'If input image does not specify dpi the default is %.2f. '
|
||||
'Must not be used with -s/--pagesize.') % default_dpi
|
||||
)
|
||||
|
||||
sizeopts.add_argument(
|
||||
'-s', '--pagesize', metavar='size', type=valid_size,
|
||||
default=(None, None, None),
|
||||
help=('size of the pdf pages in format AuxBv#, '
|
||||
'where A is width, B is height, u and v are units, # are options. '
|
||||
'You may omit either width or height, but not both. '
|
||||
'Some common page sizes, such as letter and a4, are also recognized. '
|
||||
'Units may be specified as (in, cm, mm, pt). '
|
||||
'Units default to pt when absent. '
|
||||
'Available options include (! = exact ; ^ = fill ; default = into). '
|
||||
'Must not be used with -d/--dpi.')
|
||||
)
|
||||
|
||||
parser.add_argument(
|
||||
'-t', '--title', metavar='title', type=str,
|
||||
'-t', '--title', metavar='title', type=pdf_embedded_string,
|
||||
help='title for metadata')
|
||||
parser.add_argument(
|
||||
'-a', '--author', metavar='author', type=str,
|
||||
'-a', '--author', metavar='author', type=pdf_embedded_string,
|
||||
help='author for metadata')
|
||||
parser.add_argument(
|
||||
'-c', '--creator', metavar='creator', type=str,
|
||||
'-c', '--creator', metavar='creator', type=pdf_embedded_string,
|
||||
help='creator for metadata')
|
||||
parser.add_argument(
|
||||
'-p', '--producer', metavar='producer', type=str,
|
||||
'-p', '--producer', metavar='producer', type=pdf_embedded_string,
|
||||
help='producer for metadata')
|
||||
parser.add_argument(
|
||||
'-r', '--creationdate', metavar='creationdate', type=valid_date,
|
||||
help='creation date for metadata in YYYY-MM-DDTHH:MM:SS format')
|
||||
help='UTC creation date for metadata in YYYY-MM-DD or YYYY-MM-DDTHH:MM or YYYY-MM-DDTHH:MM:SS format or any format understood by python dateutil module or any format understood by `date --date`')
|
||||
parser.add_argument(
|
||||
'-m', '--moddate', metavar='moddate', type=valid_date,
|
||||
help='modification date for metadata in YYYY-MM-DDTHH:MM:SS format')
|
||||
help='UTC modification date for metadata in YYYY-MM-DD or YYYY-MM-DDTHH:MM or YYYY-MM-DDTHH:MM:SS format or any format understood by python dateutil module or any format understood by `date --date`')
|
||||
parser.add_argument(
|
||||
'-s', '--subject', metavar='subject', type=str,
|
||||
'-S', '--subject', metavar='subject', type=pdf_embedded_string,
|
||||
help='subject for metadata')
|
||||
parser.add_argument(
|
||||
'-k', '--keywords', metavar='kw', type=str, nargs='+',
|
||||
'-k', '--keywords', metavar='kw', type=pdf_embedded_string, nargs='+',
|
||||
help='keywords for metadata')
|
||||
parser.add_argument(
|
||||
'-C', '--colorspace', metavar='colorspace', type=str,
|
||||
help='force PIL colorspace (one of: RGB, L, 1)')
|
||||
'-C', '--colorspace', metavar='colorspace', type=pdf_embedded_string,
|
||||
help='force PIL colorspace (one of: RGB, L, 1, CMYK, CMYK;I)')
|
||||
parser.add_argument(
|
||||
'-D', '--nodate', help='do not add timestamps', action="store_true")
|
||||
parser.add_argument(
|
||||
'-v', '--verbose', help='verbose mode', action="store_true")
|
||||
parser.add_argument(
|
||||
'-V', '--version', action='version', version='%(prog)s '+__version__,
|
||||
help="Print version information and exit")
|
||||
|
||||
def main(args=None):
|
||||
if args is None:
|
||||
args = sys.argv[1:]
|
||||
args = parser.parse_args(args)
|
||||
|
||||
args.output.write(
|
||||
convert(
|
||||
args.images, args.dpi, args.title, args.author,
|
||||
args.images, args.dpi, args.pagesize, args.title, args.author,
|
||||
args.creator, args.producer, args.creationdate, args.moddate,
|
||||
args.subject, args.keywords, args.colorspace, args.verbose))
|
||||
args.subject, args.keywords, args.colorspace, args.nodate,
|
||||
args.verbose))
|
||||
|
||||
if __name__ == '__main__':
|
||||
main()
|
||||
|
|
|
@@ -85,6 +85,6 @@ def parsejp2(data):
|
|||
if __name__ == "__main__":
|
||||
import sys
|
||||
width, height, colorspace = parsejp2(open(sys.argv[1], "rb").read())
|
||||
print "width = %d"%width
|
||||
print "height = %d"%height
|
||||
print "colorspace = %s"%colorspace
|
||||
sys.stdout.write("width = %d\n"%width)
sys.stdout.write("height = %d\n"%height)
sys.stdout.write("colorspace = %s\n"%colorspace)
|
||||
|
|
|
@@ -1,7 +1,109 @@
|
|||
import unittest
|
||||
import test_img2pdf
|
||||
|
||||
import os
|
||||
import img2pdf
|
||||
import zlib
|
||||
from PIL import Image
|
||||
|
||||
HERE = os.path.dirname(__file__)
|
||||
|
||||
#convert +set date:create +set date:modify -define png:exclude-chunk=time
|
||||
|
||||
def test_suite():
|
||||
class TestImg2Pdf(unittest.TestCase):
|
||||
pass
|
||||
|
||||
for test_name in os.listdir(os.path.join(HERE, "input")):
|
||||
inputf = os.path.join(HERE, "input", test_name)
|
||||
if not os.path.isfile(inputf):
|
||||
continue
|
||||
outputf = os.path.join(HERE, "output", test_name+".pdf")
|
||||
assert os.path.isfile(outputf)
|
||||
def handle(self, f=inputf, out=outputf):
|
||||
with open(f, "rb") as inf:
|
||||
orig_imgdata = inf.read()
|
||||
pdf = img2pdf.convert([f], nodate=True)
|
||||
imgdata = b""
|
||||
instream = False
|
||||
imgobj = False
|
||||
colorspace = None
|
||||
imgfilter = None
|
||||
width = None
|
||||
height = None
|
||||
length = None
|
||||
# ugly workaround to parse the created pdf
|
||||
for line in pdf.split(b'\n'):
|
||||
if instream:
|
||||
if line == b"endstream":
|
||||
break
|
||||
else:
|
||||
imgdata += line + b'\n'
|
||||
else:
|
||||
if imgobj and line == b"stream":
|
||||
instream = True
|
||||
elif b"/Subtype /Image" in line:
|
||||
imgobj = True
|
||||
elif b"/Width" in line:
|
||||
width = int(line.split()[-1])
|
||||
elif b"/Height" in line:
|
||||
height = int(line.split()[-1])
|
||||
elif b"/Length" in line:
|
||||
length = int(line.split()[-1])
|
||||
elif b"/Filter" in line:
|
||||
imgfilter = line.split()[-2]
|
||||
elif b"/ColorSpace" in line:
|
||||
colorspace = line.split()[-1]
|
||||
# remove trailing \n
|
||||
imgdata = imgdata[:-1]
|
||||
# test if the length field is correct
|
||||
self.assertEqual(len(imgdata), length)
|
||||
# test if the filter is valid:
|
||||
self.assertIn(imgfilter, [b"/DCTDecode", b"/JPXDecode", b"/FlateDecode"])
|
||||
# test if the colorspace is valid
|
||||
self.assertIn(colorspace, [b"/DeviceGray", b"/DeviceRGB", b"/DeviceCMYK"])
|
||||
# test if the image has correct size
|
||||
orig_img = Image.open(f)
|
||||
self.assertEqual(width, orig_img.size[0])
|
||||
self.assertEqual(height, orig_img.size[1])
|
||||
# if the input file is a jpeg then it should've been copied
|
||||
# verbatim into the PDF
|
||||
if imgfilter in [b"/DCTDecode", b"/JPXDecode"]:
|
||||
self.assertEqual(imgdata, orig_imgdata)
|
||||
elif imgfilter == b"/FlateDecode":
|
||||
# otherwise, the data is flate encoded and has to be equal to
|
||||
# the pixel data of the input image
|
||||
imgdata = zlib.decompress(imgdata)
|
||||
if colorspace == b"/DeviceGray":
|
||||
colorspace = 'L'
|
||||
elif colorspace == b"/DeviceRGB":
|
||||
colorspace = 'RGB'
|
||||
elif colorspace == b"/DeviceCMYK":
|
||||
colorspace = 'CMYK'
|
||||
else:
|
||||
raise Exception("invalid colorspace")
|
||||
im = Image.frombytes(colorspace, (width, height), imgdata)
|
||||
if orig_img.mode == '1':
|
||||
orig_img = orig_img.convert("L")
|
||||
elif orig_img.mode not in ("RGB", "L", "CMYK", "CMYK;I"):
|
||||
orig_img = orig_img.convert("RGB")
|
||||
self.assertEqual(im.tobytes(), orig_img.tobytes())
|
||||
# the python-pil version 2.3.0-1ubuntu3 in Ubuntu does not have the close() method
|
||||
try:
|
||||
im.close()
|
||||
except AttributeError:
|
||||
pass
|
||||
# lastly, make sure that the generated pdf matches bit by bit the
|
||||
# expected pdf
|
||||
with open(out, "rb") as outf:
|
||||
out = outf.read()
|
||||
self.assertEqual(pdf, out)
|
||||
# the python-pil version 2.3.0-1ubuntu3 in Ubuntu does not have the close() method
|
||||
try:
|
||||
orig_img.close()
|
||||
except AttributeError:
|
||||
pass
|
||||
setattr(TestImg2Pdf, "test_%s"%test_name, handle)
|
||||
|
||||
return unittest.TestSuite((
|
||||
unittest.makeSuite(test_img2pdf.TestImg2Pdf),
|
||||
unittest.makeSuite(TestImg2Pdf),
|
||||
))
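
Review note: test_suite above attaches one generated method per input file and passes the file names as default arguments (`f=inputf, out=outputf`). The sketch below shows that binding pattern on its own; `build_case` and the trivial assertion are hypothetical and stand in for the real PDF checks.

```python
import unittest


def build_case(samples):
    """Create a TestCase class with one test method per (name, value) pair."""
    class GeneratedTests(unittest.TestCase):
        pass

    for name, value in samples.items():
        # Bind the current value as a default argument; a plain closure
        # would only see the last value of the loop variable.
        def check(self, value=value):
            self.assertEqual(value * 2, value + value)
        setattr(GeneratedTests, "test_%s" % name, check)

    return GeneratedTests
```

Each generated method shows up under its own name in the test report, which makes a failure for a single input file easy to spot.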
|
||||
|
|
BIN src/tests/input/CMYK.jpg (new file; binary, not shown; 4.7 KiB)
BIN src/tests/input/CMYK.tif (new file; binary, not shown)
Two existing binary images changed (file names not shown): 2.3 KiB before and after, 1.1 KiB before and after
BIN src/tests/output/CMYK.jpg.pdf (new file; binary, not shown)
BIN src/tests/output/CMYK.tif.pdf (new file; binary, not shown)
One further binary file changed (file name not shown)
BIN src/tests/output/normal.png.pdf (new file; binary, not shown)
|
@@ -1,20 +0,0 @@
|
|||
import datetime
|
||||
import os
|
||||
import unittest
|
||||
import img2pdf
|
||||
|
||||
HERE = os.path.dirname(__file__)
|
||||
moddate = datetime.datetime(2014, 1, 1)
|
||||
|
||||
class TestImg2Pdf(unittest.TestCase):
|
||||
def test_jpg2pdf(self):
|
||||
with open(os.path.join(HERE, 'test.jpg'), 'r') as img_fp:
|
||||
with open(os.path.join(HERE, 'test.pdf'), 'r') as pdf_fp:
|
||||
self.assertEqual(
|
||||
img2pdf.convert([img_fp], 150,
|
||||
creationdate=moddate, moddate=moddate),
|
||||
pdf_fp.read())
|
||||
|
||||
def test_png2pdf(self):
|
||||
with open(os.path.join(HERE, 'test.png'), 'r') as img_fp:
|
||||
self.assertRaises(SystemExit, img2pdf.convert, [img_fp], 150)
|