forked from josch/img2pdf
Merge branch 'master' of http://gitlab.mister-muffin.de/josch/img2pdf
# Conflicts: # src/img2pdf.py
commit
1261741136
16 changed files with 711 additions and 217 deletions
38
CHANGES.rst
|
@@ -2,7 +2,43 @@
|
||||||
CHANGES
|
CHANGES
|
||||||
=======
|
=======
|
||||||
|
|
||||||
1.0.0 (unreleased)
|
0.1.6
|
||||||
|
-----
|
||||||
|
|
||||||
|
- replace -x and -y option by combined option -s (or --pagesize) and use -S
|
||||||
|
for --subject
|
||||||
|
- correctly encode and escape non-ascii metadata
|
||||||
|
- explicitly store date in UTC and allow parsing all date formats understood
|
||||||
|
by dateutil and `date --date`
|
||||||
|
|
||||||
|
0.1.5
|
||||||
|
-----
|
||||||
|
|
||||||
|
- Enable support for CMYK images
|
||||||
|
- Rework test suite
|
||||||
|
- support file objects as input
|
||||||
|
|
||||||
|
0.1.4
|
||||||
|
-----
|
||||||
|
|
||||||
|
- add Python 3 support
|
||||||
|
- make output reproducible by sorting and --nodate option
|
||||||
|
|
||||||
|
0.1.3
|
||||||
|
-----
|
||||||
|
|
||||||
|
- Avoid leaking file descriptors
|
||||||
|
- Convert unrecognized colorspaces to RGB
|
||||||
|
|
||||||
|
0.1.1
|
||||||
|
-----
|
||||||
|
|
||||||
|
- allow running src/img2pdf.py standalone
|
||||||
|
- license change from GPL to LGPL
|
||||||
|
- Add pillow 2.4.0 support
|
||||||
|
- add options to specify pdf dimensions in points
|
||||||
|
|
||||||
|
0.1.0 (unreleased)
|
||||||
------------------
|
------------------
|
||||||
|
|
||||||
- Initial PyPI release.
|
- Initial PyPI release.
|
||||||
|
|
202
README.md
|
@@ -1,15 +1,16 @@
|
||||||
img2pdf
|
img2pdf
|
||||||
=======
|
=======
|
||||||
|
|
||||||
Lossless conversion of images to PDF without unnecessarily re-encoding JPEG and
|
Losslessly convert images to PDF without unnecessarily re-encoding JPEG and
|
||||||
JPEG2000 files. Thus, no loss of quality and no unnecessary large output file.
|
JPEG2000 files. Image quality is retained without unnecessarily increasing
|
||||||
|
file size.
|
||||||
|
|
||||||
Background
|
Background
|
||||||
----------
|
----------
|
||||||
|
|
||||||
PDF is able to embed JPEG and JPEG2000 images as they are without re-encoding
|
Quality loss can be avoided when converting JPEG and JPEG2000 images to
|
||||||
them (and hence loosing quality) but I was missing a tool to do this
|
PDF by embedding them without re-encoding. I wrote this piece of python code
|
||||||
automatically, thus I wrote this piece of python code.
|
because I was missing a tool to do this automatically.
|
||||||
|
|
||||||
If you know how to embed JPEG and JPEG2000 images into a PDF container without
|
If you know how to embed JPEG and JPEG2000 images into a PDF container without
|
||||||
recompression, using existing tools, please contact me so that I can put this
|
recompression, using existing tools, please contact me so that I can put this
|
||||||
|
@@ -18,110 +19,173 @@ code into the garbage bin :D
|
||||||
Functionality
|
Functionality
|
||||||
-------------
|
-------------
|
||||||
|
|
||||||
The program will take image filenames from commandline arguments and output a
|
This program will take a list of images and produce a PDF file with the images
|
||||||
PDF file with them embedded into it. If the input image is a JPEG or JPEG2000
|
embedded in it. JPEG and JPEG2000 images will be included without
|
||||||
file, it will be included as-is without any processing. If it is in any other
|
recompression. Images in other formats will be included with zip/flate
|
||||||
format, the image will be included as zip-encoded RGB. As a result, this tool
|
encoding which usually leads to an increase in the resulting size because
|
||||||
will be able to lossless wrap any image into a PDF container while performing
|
formats like png compress better than PDF which just zip/flate compresses the
|
||||||
better (in terms of quality/filesize ratio) than existing tools in case the
|
RGB data. As a result, this tool is able to losslessly wrap images into a PDF
|
||||||
input image is a JPEG or JPEG2000 file.
|
container with a quality-filesize ratio that is typically better (in case of
|
||||||
|
JPEG and JPEG2000 images) or equal (in case of other formats) than that of
|
||||||
|
existing tools.
|
||||||
|
|
||||||
For example, imagemagick will re-encode the input JPEG image and thus change
|
For example, imagemagick will re-encode the input JPEG image (thus changing
|
||||||
its content:
|
its content):
|
||||||
|
|
||||||
$ convert img.jpg img.pdf
|
$ convert img.jpg img.pdf
|
||||||
$ pdfimages img.pdf img.extr # not using -j to be extra sure there is no recompression
|
$ pdfimages img.pdf img.extr # not using -j to be extra sure there is no recompression
|
||||||
$ compare -metric AE img.jpg img.extr-000.ppm null:
|
$ compare -metric AE img.jpg img.extr-000.ppm null:
|
||||||
1.6301e+06
|
1.6301e+06
|
||||||
|
|
||||||
If one wants to do a lossless conversion from any format to PDF with
|
If one wants to losslessly convert from any format to PDF with
|
||||||
imagemagick then one has to use zip-encoding:
|
imagemagick, one has to use zip compression:
|
||||||
|
|
||||||
$ convert input.jpg -compress Zip output.pdf
|
$ convert input.jpg -compress Zip output.pdf
|
||||||
$ pdfimages img.pdf img.extr # not using -j to be extra sure there is no recompression
|
$ pdfimages img.pdf img.extr # not using -j to be extra sure there is no recompression
|
||||||
$ compare -metric AE img.jpg img.extr-000.ppm null:
|
$ compare -metric AE img.jpg img.extr-000.ppm null:
|
||||||
0
|
0
|
||||||
|
|
||||||
The downside is, that using imagemagick like this will make the resulting PDF
|
However, this approach will result in PDF files that are a few times larger
|
||||||
files a few times bigger than the input JPEG or JPEG2000 file and can also not
|
than the input JPEG or JPEG2000 file.
|
||||||
output a multipage PDF.
|
|
||||||
|
|
||||||
img2pdf is able to output a PDF with multiple pages if more than one input
|
img2pdf is able to losslessly embed JPEG and JPEG2000 files into a PDF
|
||||||
image is given, losslessly embed JPEG and JPEG2000 files into a PDF container
|
container without additional overhead (aside from the PDF structure itself),
|
||||||
without adding more overhead than the PDF structure itself and will save all
|
save other graphics formats using lossless zip compression,
|
||||||
other graphics formats using lossless zip-compression.
|
and produce multi-page PDF files when more than one input image is given.
|
||||||
|
|
||||||
Another nifty advantage: Since no re-encoding is done in case of JPEG images,
|
Also, since JPEG and JPEG2000 images are not reencoded, conversion with
|
||||||
the conversion is many (ten to hundred) times faster with img2pdf compared to
|
img2pdf is several times faster than with other tools.
|
||||||
imagemagick. While a run of above convert command with a 2.8MB JPEG takes 27
|
|
||||||
seconds (on average) on my machine, conversion using img2pdf takes just a
|
|
||||||
fraction of a second.
|
|
||||||
|
|
||||||
Commandline Arguments
|
|
||||||
---------------------
|
|
||||||
|
|
||||||
At least one input file argument must be given as img2pdf needs to seek in the
|
Usage
|
||||||
file descriptor which would not be possible for stdin.
|
-----
|
||||||
|
|
||||||
Specify the dpi with the -d or --dpi options instead of reading it from the
|
#### General Notes
|
||||||
image or falling back to 96.0.
|
|
||||||
|
|
||||||
Specify the output file with -o or --output. By default output will be done to
|
The images must be provided as files because img2pdf needs to seek
|
||||||
stdout.
|
in the file descriptor. Input cannot be piped through stdin.
|
||||||
|
|
||||||
Specify metadata using the --title, --author, --creator, --producer,
|
If no output file is specified with the `-o`/`--output` option,
|
||||||
--creationdate, --moddate, --subject and --keywords options (or their short
|
output will be to stdout.
|
||||||
forms).
|
|
||||||
|
|
||||||
Specify -C or --colorspace to force a colorspace using PIL short handles like
|
Descriptions of the options should be self-explanatory.
|
||||||
'RGB', 'L' or '1'.
|
They are available by running:
|
||||||
|
|
||||||
More help is available with the -h or --help option.
|
img2pdf --help
|
||||||
|
|
||||||
|
|
||||||
|
#### Controlling Page Size
|
||||||
|
|
||||||
|
The PDF page size can be manipulated. By default, the image will be sized "into" the given dimensions with the aspect ratio retained. For instance, to size an image into a page that is at most 500pt x 500pt, use:
|
||||||
|
|
||||||
|
img2pdf -s 500x500 -o output.pdf input.jpg
|
||||||
|
|
||||||
|
To "fill" out a page that is at least 500pt x 500pt, follow the dimensions with a `^`:
|
||||||
|
|
||||||
|
img2pdf -s 500x500^ -o output.pdf input.jpg
|
||||||
|
|
||||||
|
To output pages that are exactly 500pt x 500pt, follow the dimensions with an `!`:
|
||||||
|
|
||||||
|
img2pdf -s 500x500\! -o output.pdf input.jpg
|
||||||
|
|
||||||
|
Notice that the default unit is points. Units may also be specified and mixed:
|
||||||
|
|
||||||
|
img2pdf -s 8.5inx27.94cm -o output.pdf input.jpg
|
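The mixed-unit form above can be read as two independent lengths, each converted to PDF points (1/72 inch). A minimal sketch of that conversion follows. `parse_pagesize` and the unit table are hypothetical names introduced for illustration; img2pdf's own parser is not part of this hunk and may differ. The sketch ignores the `^` and `!` suffixes and omitted dimensions.

```python
import re

# Points per unit; PDF user space uses 1/72 inch per point.
POINTS_PER_UNIT = {"pt": 1.0, "in": 72.0, "cm": 72.0 / 2.54, "mm": 72.0 / 25.4}

_LENGTH = r"(\d+(?:\.\d+)?)(pt|in|cm|mm)?"
_PAGESIZE = re.compile(r"^" + _LENGTH + r"x" + _LENGTH + r"$")


def parse_pagesize(spec):
    """Hypothetical helper: turn 'WIDTHxHEIGHT' into (width_pt, height_pt)."""
    match = _PAGESIZE.match(spec)
    if match is None:
        raise ValueError("unsupported page size: %r" % spec)
    width, width_unit, height, height_unit = match.groups()
    # A missing unit means points, matching the README's stated default.
    return (float(width) * POINTS_PER_UNIT[width_unit or "pt"],
            float(height) * POINTS_PER_UNIT[height_unit or "pt"])
```

Under these assumptions, `8.5inx27.94cm` comes out at roughly 612 x 792 points, which is US letter.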
||||||
|
|
||||||
|
If either width or height is omitted, the other will be calculated
|
||||||
|
to preserve aspect ratio.
|
||||||
|
|
||||||
|
img2pdf -s x280mm -o output1.pdf input.jpg
|
||||||
|
img2pdf -s 280mmx -o output2.pdf input.jpg
|
||||||
|
|
||||||
|
Some standard page sizes are recognized:
|
||||||
|
|
||||||
|
img2pdf -s letter -o output1.pdf input.jpg
|
||||||
|
img2pdf -s a4 -o output2.pdf input.jpg
|
||||||
|
|
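The three sizing modes described above ("into", "fill" with `^`, exact with `!`) differ only in how the scale factor is chosen. The sketch below is one plausible reading of those semantics, not img2pdf's implementation. `fit_page` and its `mode` strings are hypothetical, and only the resulting page size is computed.

```python
def fit_page(img_w, img_h, box_w, box_h, mode="into"):
    """Hypothetical helper: page size for an image under a sizing mode."""
    if mode == "exact":            # the `!` suffix: page is exactly the box
        return (float(box_w), float(box_h))
    x_scale = float(box_w) / img_w
    y_scale = float(box_h) / img_h
    if mode == "into":             # default: page is at most the box
        scale = min(x_scale, y_scale)
    elif mode == "fill":           # the `^` suffix: page is at least the box
        scale = max(x_scale, y_scale)
    else:
        raise ValueError("unknown mode: %r" % mode)
    # One scale factor for both axes keeps the aspect ratio intact.
    return (img_w * scale, img_h * scale)
```

Using `min` guarantees that neither side exceeds the box, while `max` guarantees that neither side falls short of it.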
||||||
|
#### Colorspace
|
||||||
|
|
||||||
|
Currently, the colorspace must be forced for JPEG 2000 images that are
|
||||||
|
not in the RGB colorspace. Available colorspace options are based on
|
||||||
|
Python Imaging Library (PIL) short handles.
|
||||||
|
|
||||||
|
* `RGB` = RGB color
|
||||||
|
* `L` = Grayscale
|
||||||
|
* `1` = Black and white (internally converted to grayscale)
|
||||||
|
* `CMYK` = CMYK color
|
||||||
|
* `CMYK;I` = CMYK color with inversion
|
||||||
|
|
||||||
|
For example, to encode a grayscale JPEG2000 image, use:
|
||||||
|
|
||||||
|
img2pdf -C L -o output.pdf input.jp2
|
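The short handles listed above map onto PDF device colorspaces in the `addimage` branch of `src/img2pdf.py` in this same commit. The sketch below restates that mapping as a lookup table. The table form, the `pdf_colorspace` name and the `ValueError` are illustrative choices; the commit itself calls `error_out` and exits.

```python
PDF_COLORSPACES = {
    "L": b"/DeviceGray",
    "RGB": b"/DeviceRGB",
    "CMYK": b"/DeviceCMYK",
    "CMYK;I": b"/DeviceCMYK",  # inversion is expressed separately, not in the name
}


def pdf_colorspace(mode):
    """Hypothetical helper: PIL short handle -> PDF colorspace name."""
    # "1" (black and white) is treated as grayscale, as the list above notes.
    if mode == "1":
        mode = "L"
    try:
        return PDF_COLORSPACES[mode]
    except KeyError:
        raise ValueError("unsupported color space: %s" % mode)
```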
||||||
|
|
||||||
Bugs
|
Bugs
|
||||||
----
|
----
|
||||||
|
|
||||||
If you find a JPEG or JPEG2000 file that, when embedded can not be read by the
|
If you find a JPEG or JPEG2000 file that, when embedded, cannot be read
|
||||||
Adobe Acrobat Reader, please contact me.
|
by the Adobe Acrobat Reader, please contact me.
|
||||||
|
|
||||||
For lossless conversion of other formats than JPEG or JPEG2000 files, zip/flate
|
For lossless conversion of formats other than JPEG or JPEG2000, zip/flate
|
||||||
encoding is used. This choice is based on a number of tests I did on images.
|
encoding is used. This choice is based on tests I did with a number of images.
|
||||||
I converted them into PDF using imagemagick and all compressions it has to
|
I converted them into PDF using the lossless variants of the compression
|
||||||
offer and then compared the output size of the lossless variants. In all my
|
formats offered by imagemagick. In all my tests, zip/flate encoding performed
|
||||||
tests, zip/flate encoding performed best. You can verify my findings using the
|
best. You can verify my findings using the test_comp.sh script with any input
|
||||||
test_comp.sh script with any input image given as a commandline argument. If
|
image given as a commandline argument. If you find an input file that is
|
||||||
you find an input file that is outperformed by another lossless compression,
|
outperformed by another lossless compression method, contact me.
|
||||||
contact me.
|
|
||||||
|
|
||||||
I have not yet figured out how to read the colorspace from jpeg2000 files.
|
I have not yet figured out how to determine the colorspace of JPEG2000 files.
|
||||||
Therefor jpeg2000 files use DeviceRGB per default. If your jpeg2000 files are
|
Therefore JPEG2000 files use DeviceRGB by default. For JPEG2000 files with
|
||||||
of any other colorspace you must force it using the --colorspace option.
|
other colorspaces, you must force it using the `--colorspace` option.
|
||||||
Like -C L for DeviceGray.
|
|
||||||
|
It might be possible to store transparency using masks but it is not clear
|
||||||
|
what the utility of such a functionality would be.
|
||||||
|
|
||||||
|
Most vector graphic formats can be losslessly turned into PDF (minus some of
|
||||||
|
the features unsupported by PDF) but img2pdf will currently turn vector
|
||||||
|
graphics into their lossy raster representations.
|
||||||
|
|
||||||
|
Acrobat is able to store a hint for the PDF reader of how to present the PDF
|
||||||
|
when opening it. Things like automatic fullscreen or the zoom level can be
|
||||||
|
configured.
|
||||||
|
|
||||||
|
It would be nice if a single input image could be read from standard input.
|
||||||
|
|
||||||
Installation
|
Installation
|
||||||
------------
|
------------
|
||||||
|
|
||||||
You can install the package using:
|
On Debian- and Ubuntu-based systems, dependencies may be installed
|
||||||
|
with the following command:
|
||||||
|
|
||||||
$ pip install img2pdf
|
apt-get install python python-pil python-setuptools
|
||||||
|
|
||||||
If you want to install from source code simply use:
|
Or for Python 3:
|
||||||
|
|
||||||
$ cd img2pdf/
|
apt-get install python3 python3-pil python3-setuptools
|
||||||
$ pip install .
|
|
||||||
|
You can then install the package using:
|
||||||
|
|
||||||
|
$ pip install img2pdf
|
||||||
|
|
||||||
|
If you prefer to install from source code use:
|
||||||
|
|
||||||
|
$ cd img2pdf/
|
||||||
|
$ pip install .
|
||||||
|
|
||||||
To test the console script without installing the package on your system,
|
To test the console script without installing the package on your system,
|
||||||
simply use virtualenv:
|
use virtualenv:
|
||||||
|
|
||||||
$ cd img2pdf/
|
$ cd img2pdf/
|
||||||
$ virtualenv ve
|
$ virtualenv ve
|
||||||
$ ve/bin/pip install .
|
$ ve/bin/pip install .
|
||||||
|
|
||||||
You can then test the converter using:
|
You can then test the converter using:
|
||||||
|
|
||||||
$ ve/bin/img2pdf -o test.pdf src/tests/test.jpg
|
$ ve/bin/img2pdf -o test.pdf src/tests/test.jpg
|
||||||
|
|
||||||
Note that the package can also be used as a library as follows:
|
The package can also be used as a library:
|
||||||
|
|
||||||
import img2pdf
|
import img2pdf
|
||||||
pdf_bytes = img2pdf('test.jpg', dpi=150)
|
pdf_bytes = img2pdf.convert(['test.jpg'])
|
||||||
|
|
||||||
|
file = open("name.pdf","wb")
|
||||||
|
file.write(pdf_bytes)
|
||||||
|
|
2
setup.cfg
Normal file
|
@@ -0,0 +1,2 @@
|
||||||
|
[metadata]
|
||||||
|
description-file = README.md
|
10
setup.py
|
@@ -1,9 +1,12 @@
|
||||||
from setuptools import setup
|
from setuptools import setup
|
||||||
|
|
||||||
|
VERSION="0.1.6~git"
|
||||||
|
|
||||||
setup (
|
setup (
|
||||||
name='img2pdf',
|
name='img2pdf',
|
||||||
version='0.1.0',
|
version=VERSION,
|
||||||
author = "Johannes 'josch' Schauer",
|
author = "Johannes 'josch' Schauer",
|
||||||
|
author_email = 'j.schauer@email.de',
|
||||||
description = "Convert images to PDF via direct JPEG inclusion.",
|
description = "Convert images to PDF via direct JPEG inclusion.",
|
||||||
long_description = open('README.md').read(),
|
long_description = open('README.md').read(),
|
||||||
license = "LGPL",
|
license = "LGPL",
|
||||||
|
@@ -15,12 +18,15 @@ setup (
|
||||||
'Programming Language :: Python :: 2',
|
'Programming Language :: Python :: 2',
|
||||||
'Programming Language :: Python :: 2.6',
|
'Programming Language :: Python :: 2.6',
|
||||||
'Programming Language :: Python :: 2.7',
|
'Programming Language :: Python :: 2.7',
|
||||||
|
'Programming Language :: Python :: 3',
|
||||||
|
'Programming Language :: Python :: 3.4',
|
||||||
'Programming Language :: Python :: Implementation :: CPython',
|
'Programming Language :: Python :: Implementation :: CPython',
|
||||||
'License :: OSI Approved :: GNU Lesser General Public License v3 (LGPLv3)',
|
'License :: OSI Approved :: GNU Lesser General Public License v3 (LGPLv3)',
|
||||||
'Programming Language :: Python',
|
'Programming Language :: Python',
|
||||||
'Natural Language :: English',
|
'Natural Language :: English',
|
||||||
'Operating System :: OS Independent'],
|
'Operating System :: OS Independent'],
|
||||||
url = 'http://pypi.python.org/pypi/img2pdf',
|
url = 'https://github.com/josch/img2pdf',
|
||||||
|
download_url = 'https://github.com/josch/img2pdf/archive/'+VERSION+'.tar.gz',
|
||||||
package_dir={"": "src"},
|
package_dir={"": "src"},
|
||||||
py_modules=['img2pdf', 'jp2'],
|
py_modules=['img2pdf', 'jp2'],
|
||||||
include_package_data = True,
|
include_package_data = True,
|
||||||
|
|
544
src/img2pdf.py
|
@@ -1,3 +1,5 @@
|
||||||
|
#!/usr/bin/env python2
|
||||||
|
|
||||||
# Copyright (C) 2012-2014 Johannes 'josch' Schauer <j.schauer at email.de>
|
# Copyright (C) 2012-2014 Johannes 'josch' Schauer <j.schauer at email.de>
|
||||||
#
|
#
|
||||||
# This program is free software: you can redistribute it and/or
|
# This program is free software: you can redistribute it and/or
|
||||||
|
@@ -15,13 +17,20 @@
|
||||||
# License along with this program. If not, see
|
# License along with this program. If not, see
|
||||||
# <http://www.gnu.org/licenses/>.
|
# <http://www.gnu.org/licenses/>.
|
||||||
|
|
||||||
|
__version__ = "0.1.6~git"
|
||||||
|
default_dpi = 96.0
|
||||||
|
|
||||||
|
import re
|
||||||
import sys
|
import sys
|
||||||
import zlib
|
import zlib
|
||||||
import argparse
|
import argparse
|
||||||
import struct
|
|
||||||
from PIL import Image
|
from PIL import Image
|
||||||
from datetime import datetime
|
from datetime import datetime
|
||||||
from jp2 import parsejp2
|
from jp2 import parsejp2
|
||||||
|
try:
|
||||||
|
from cStringIO import cStringIO
|
||||||
|
except ImportError:
|
||||||
|
from io import BytesIO as cStringIO
|
||||||
|
|
||||||
# XXX: Switch to use logging module.
|
# XXX: Switch to use logging module.
|
||||||
def debug_out(message, verbose=True):
|
def debug_out(message, verbose=True):
|
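The `XXX` note above asks for a switch to the `logging` module. The sketch below shows what that could look like while keeping the one-letter `W: ` prefix that `warning_out` writes. `make_logger` and the logger name are hypothetical; nothing like this exists in the commit.

```python
import logging
import sys


def make_logger(stream=None, verbose=False):
    """Hypothetical replacement for debug_out/warning_out using logging."""
    logger = logging.getLogger("img2pdf.sketch")
    logger.handlers = []          # avoid duplicate handlers on repeated calls
    logger.propagate = False
    handler = logging.StreamHandler(stream or sys.stderr)
    # "%(levelname).1s" truncates the level name to one letter ("W: ...").
    handler.setFormatter(logging.Formatter("%(levelname).1s: %(message)s"))
    logger.addHandler(handler)
    # The verbose flag becomes a log level instead of a per-call argument.
    logger.setLevel(logging.DEBUG if verbose else logging.WARNING)
    return logger
```

With this shape, the `verbose` argument no longer has to be threaded through every call; it is decided once when the logger is configured.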
||||||
|
@@ -34,19 +43,28 @@ def error_out(message):
|
||||||
def warning_out(message):
|
def warning_out(message):
|
||||||
sys.stderr.write("W: "+message+"\n")
|
sys.stderr.write("W: "+message+"\n")
|
||||||
|
|
||||||
|
def datetime_to_pdfdate(dt):
|
||||||
|
return dt.strftime("%Y%m%d%H%M%SZ")
|
||||||
|
|
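`datetime_to_pdfdate` appends a `Z` suffix, which marks the value as UTC, so the datetime handed to it has to be in UTC already. The Python 3 sketch below normalizes the value first. `to_pdfdate_utc` is a hypothetical name, and treating naive datetimes as UTC is an assumption of this sketch, not something the commit states.

```python
from datetime import timezone


def to_pdfdate_utc(dt):
    """Sketch: format a datetime as a PDF date string in UTC."""
    if dt.tzinfo is None:
        # Assumption: naive datetimes are already expressed in UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    # Convert aware datetimes to UTC before formatting with the Z suffix.
    return dt.astimezone(timezone.utc).strftime("%Y%m%d%H%M%SZ")
```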
||||||
def parse(cont, indent=1):
|
def parse(cont, indent=1):
|
||||||
if type(cont) is dict:
|
if type(cont) is dict:
|
||||||
return "<<\n"+"\n".join(
|
return b"<<\n"+b"\n".join(
|
||||||
[4 * indent * " " + "%s %s" % (k, parse(v, indent+1))
|
[4 * indent * b" " + k + b" " + parse(v, indent+1)
|
||||||
for k, v in cont.items()])+"\n"+4*(indent-1)*" "+">>"
|
for k, v in sorted(cont.items())])+b"\n"+4*(indent-1)*b" "+b">>"
|
||||||
elif type(cont) is int or type(cont) is float:
|
elif type(cont) is int:
|
||||||
return str(cont)
|
return str(cont).encode()
|
||||||
|
elif type(cont) is float:
|
||||||
|
return ("%0.4f"%cont).encode()
|
||||||
elif isinstance(cont, obj):
|
elif isinstance(cont, obj):
|
||||||
return "%d 0 R"%cont.identifier
|
return ("%d 0 R"%cont.identifier).encode()
|
||||||
elif type(cont) is str:
|
elif type(cont) is str or type(cont) is bytes:
|
||||||
|
if type(cont) is str and type(cont) is not bytes:
|
||||||
|
raise Exception("parse must be passed a bytes object in py3")
|
||||||
return cont
|
return cont
|
||||||
elif type(cont) is list:
|
elif type(cont) is list:
|
||||||
return "[ "+" ".join([parse(c, indent) for c in cont])+" ]"
|
return b"[ "+b" ".join([parse(c, indent) for c in cont])+b" ]"
|
||||||
|
else:
|
||||||
|
raise Exception("cannot handle type %s"%type(cont))
|
||||||
|
|
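The right-hand side of `parse` makes two changes at once: it emits bytes rather than text, and it sorts dictionary keys so that the output no longer depends on insertion order. The sketch below is a Python 3 only restatement of that serializer. It uses a hypothetical `serialize` function, raises `TypeError` instead of `Exception`, and omits indirect object references (`obj`).

```python
def serialize(value, indent=1):
    """Python 3 sketch of a parse()-style serializer for PDF values."""
    if isinstance(value, dict):
        pad = 4 * indent * b" "
        # Sorting the keys makes the byte output independent of insertion order.
        lines = [pad + key + b" " + serialize(val, indent + 1)
                 for key, val in sorted(value.items())]
        return (b"<<\n" + b"\n".join(lines) + b"\n"
                + 4 * (indent - 1) * b" " + b">>")
    if isinstance(value, bool):
        # bool is a subclass of int; reject it rather than emit "True".
        raise TypeError("booleans are not handled in this sketch")
    if isinstance(value, int):
        return str(value).encode()
    if isinstance(value, float):
        return ("%0.4f" % value).encode()
    if isinstance(value, bytes):
        return value
    if isinstance(value, list):
        return b"[ " + b" ".join(serialize(v, indent) for v in value) + b" ]"
    raise TypeError("cannot handle type %s" % type(value))
```

Two dictionaries with the same entries in a different order serialize to identical bytes, which is what reproducible output requires.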
||||||
class obj(object):
|
class obj(object):
|
||||||
def __init__(self, content, stream=None):
|
def __init__(self, content, stream=None):
|
||||||
|
@@ -56,56 +74,56 @@ class obj(object):
|
||||||
def tostring(self):
|
def tostring(self):
|
||||||
if self.stream:
|
if self.stream:
|
||||||
return (
|
return (
|
||||||
"%d 0 obj " % self.identifier +
|
("%d 0 obj " % self.identifier).encode() +
|
||||||
parse(self.content) +
|
parse(self.content) +
|
||||||
"\nstream\n" + self.stream + "\nendstream\nendobj\n")
|
b"\nstream\n" + self.stream + b"\nendstream\nendobj\n")
|
||||||
else:
|
else:
|
||||||
return "%d 0 obj "%self.identifier+parse(self.content)+" endobj\n"
|
return ("%d 0 obj "%self.identifier).encode()+parse(self.content)+b" endobj\n"
|
||||||
|
|
||||||
class pdfdoc(object):
|
class pdfdoc(object):
|
||||||
|
|
||||||
def __init__(self, version=3, title=None, author=None, creator=None,
|
def __init__(self, version=3, title=None, author=None, creator=None,
|
||||||
producer=None, creationdate=None, moddate=None, subject=None,
|
producer=None, creationdate=None, moddate=None, subject=None,
|
||||||
keywords=None):
|
keywords=None, nodate=False):
|
||||||
self.version = version # default pdf version 1.3
|
self.version = version # default pdf version 1.3
|
||||||
now = datetime.now()
|
now = datetime.now()
|
||||||
self.objects = []
|
self.objects = []
|
||||||
|
|
||||||
info = {}
|
info = {}
|
||||||
if title:
|
if title:
|
||||||
info["/Title"] = "("+title+")"
|
info[b"/Title"] = b"("+title+b")"
|
||||||
if author:
|
if author:
|
||||||
info["/Author"] = "("+author+")"
|
info[b"/Author"] = b"("+author+b")"
|
||||||
if creator:
|
if creator:
|
||||||
info["/Creator"] = "("+creator+")"
|
info[b"/Creator"] = b"("+creator+b")"
|
||||||
if producer:
|
if producer:
|
||||||
info["/Producer"] = "("+producer+")"
|
info[b"/Producer"] = b"("+producer+b")"
|
||||||
if creationdate:
|
if creationdate:
|
||||||
info["/CreationDate"] = "(D:"+creationdate.strftime("%Y%m%d%H%M%S")+")"
|
info[b"/CreationDate"] = b"(D:"+datetime_to_pdfdate(creationdate).encode()+b")"
|
||||||
else:
|
elif not nodate:
|
||||||
info["/CreationDate"] = "(D:"+now.strftime("%Y%m%d%H%M%S")+")"
|
info[b"/CreationDate"] = b"(D:"+datetime_to_pdfdate(now).encode()+b")"
|
||||||
if moddate:
|
if moddate:
|
||||||
info["/ModDate"] = "(D:"+moddate.strftime("%Y%m%d%H%M%S")+")"
|
info[b"/ModDate"] = b"(D:"+datetime_to_pdfdate(moddate).encode()+b")"
|
||||||
else:
|
elif not nodate:
|
||||||
info["/ModDate"] = "(D:"+now.strftime("%Y%m%d%H%M%S")+")"
|
info[b"/ModDate"] = b"(D:"+datetime_to_pdfdate(now).encode()+b")"
|
||||||
if subject:
|
if subject:
|
||||||
info["/Subject"] = "("+subject+")"
|
info[b"/Subject"] = b"("+subject+b")"
|
||||||
if keywords:
|
if keywords:
|
||||||
info["/Keywords"] = "("+",".join(keywords)+")"
|
info[b"/Keywords"] = b"("+b",".join(keywords)+b")"
|
||||||
|
|
||||||
self.info = obj(info)
|
self.info = obj(info)
|
||||||
|
|
||||||
# create an incomplete pages object so that a /Parent entry can be
|
# create an incomplete pages object so that a /Parent entry can be
|
||||||
# added to each page
|
# added to each page
|
||||||
self.pages = obj({
|
self.pages = obj({
|
||||||
"/Type": "/Pages",
|
b"/Type": b"/Pages",
|
||||||
"/Kids": [],
|
b"/Kids": [],
|
||||||
"/Count": 0
|
b"/Count": 0
|
||||||
})
|
})
|
||||||
|
|
||||||
self.catalog = obj({
|
self.catalog = obj({
|
||||||
"/Pages": self.pages,
|
b"/Pages": self.pages,
|
||||||
"/Type": "/Catalog"
|
b"/Type": b"/Catalog"
|
||||||
})
|
})
|
||||||
self.addobj(self.catalog)
|
self.addobj(self.catalog)
|
||||||
self.addobj(self.pages)
|
self.addobj(self.pages)
|
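The metadata values above are wrapped as `b"(" + value + b")"`, so a value containing a parenthesis or backslash would have to be escaped before it reaches `pdfdoc`. The Python 3 sketch below shows one way to build such a string. `pdf_text_string` is a hypothetical helper, and the escaping img2pdf actually performs is not visible in this excerpt. The encoding choice (ASCII where possible, otherwise UTF-16BE with a byte order mark) follows the PDF text string rules.

```python
def pdf_text_string(text):
    """Sketch: encode a Python 3 str as a parenthesized PDF string."""
    try:
        raw = text.encode("ascii")
    except UnicodeEncodeError:
        # Non-ASCII text: UTF-16BE with a leading byte order mark.
        raw = b"\xfe\xff" + text.encode("utf-16-be")
    # Escape the backslash first so the escapes added below are not doubled.
    raw = raw.replace(b"\\", b"\\\\")
    raw = raw.replace(b"(", b"\\(").replace(b")", b"\\)")
    # A raw carriage return byte would be read as a line ending, so escape it.
    raw = raw.replace(b"\r", b"\\r")
    return b"(" + raw + b")"
```

Escaping happens on the encoded bytes rather than on the text, because UTF-16 code units can themselves contain the byte values of `(`, `)`, `\` or a carriage return.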
||||||
|
@@ -115,71 +133,70 @@ class pdfdoc(object):
|
||||||
obj.identifier = newid
|
obj.identifier = newid
|
||||||
self.objects.append(obj)
|
self.objects.append(obj)
|
||||||
|
|
||||||
def addimage(self, color, width, height, dpi, imgformat, imgdata):
|
def addimage(self, color, width, height, imgformat, imgdata, pdf_x, pdf_y):
|
||||||
if color == 'L':
|
if color == 'L':
|
||||||
color = "/DeviceGray"
|
colorspace = b"/DeviceGray"
|
||||||
elif color == 'RGB':
|
elif color == 'RGB':
|
||||||
color = "/DeviceRGB"
|
colorspace = b"/DeviceRGB"
|
||||||
|
elif color == 'CMYK' or color == 'CMYK;I':
|
||||||
|
colorspace = b"/DeviceCMYK"
|
||||||
else:
|
else:
|
||||||
error_out("unsupported color space: %s"%color)
|
error_out("unsupported color space: %s"%color)
|
||||||
exit(1)
|
exit(1)
|
||||||
|
|
||||||
# pdf units = 1/72 inch
|
|
||||||
pdf_x, pdf_y = 72.0*width/dpi[0], 72.0*height/dpi[1]
|
|
||||||
|
|
||||||
print(pdf_x)
|
|
||||||
print(pdf_y)
|
|
||||||
|
|
||||||
if pdf_x < 3.00 or pdf_y < 3.00:
|
if pdf_x < 3.00 or pdf_y < 3.00:
|
||||||
warning_out("pdf width or height is below 3.00 - decrease the dpi")
|
warning_out("pdf width or height is below 3.00\" - decrease the dpi")
|
||||||
elif pdf_x > 14400.0 or pdf_y > 14400.0:
|
elif pdf_x > 200.0 or pdf_y > 200.0:
|
||||||
#error_out(("pdf width or height is above 200.00 - increase the dpi")
|
|
||||||
warning_out("pdf width or height would be above 200\" - squeezed inside")
|
warning_out("pdf width or height would be above 200\" - squeezed inside")
|
||||||
x_scale = 14400.0 / pdf_x
|
x_scale = 200.0 / pdf_x
|
||||||
y_scale = 14400.0 / pdf_y
|
y_scale = 200.0 / pdf_y
|
||||||
scale = min(x_scale, y_scale) * 0.999
|
scale = min(x_scale, y_scale) * 0.999
|
||||||
pdf_x *= scale
|
pdf_x *= scale
|
||||||
pdf_y *= scale
|
pdf_y *= scale
|
||||||
|
|
||||||
# either embed the whole jpeg or deflate the bitmap representation
|
# either embed the whole jpeg or deflate the bitmap representation
|
||||||
if imgformat is "JPEG":
|
if imgformat is "JPEG":
|
||||||
ofilter = [ "/DCTDecode" ]
|
ofilter = [ b"/DCTDecode" ]
|
||||||
elif imgformat is "JP2":
|
elif imgformat is "JPEG2000":
|
||||||
ofilter = [ "/JPXDecode" ]
|
ofilter = [ b"/JPXDecode" ]
|
||||||
self.version = 5 # jpeg2000 needs pdf 1.5
|
self.version = 5 # jpeg2000 needs pdf 1.5
|
||||||
else:
|
else:
|
||||||
ofilter = [ "/FlateDecode" ]
|
ofilter = [ b"/FlateDecode" ]
|
||||||
image = obj({
|
image = obj({
|
||||||
"/Type": "/XObject",
|
b"/Type": b"/XObject",
|
||||||
"/Subtype": "/Image",
|
b"/Subtype": b"/Image",
|
||||||
"/Filter": ofilter,
|
b"/Filter": ofilter,
|
||||||
"/Width": width,
|
b"/Width": width,
|
||||||
"/Height": height,
|
b"/Height": height,
|
||||||
"/ColorSpace": color,
|
b"/ColorSpace": colorspace,
|
||||||
# hardcoded as PIL doesnt provide bits for non-jpeg formats
|
# hardcoded as PIL doesn't provide bits for non-jpeg formats
|
||||||
"/BitsPerComponent": 8,
|
b"/BitsPerComponent": 8,
|
||||||
"/Length": len(imgdata)
|
b"/Length": len(imgdata)
|
||||||
}, imgdata)
|
}, imgdata)
|
||||||
|
|
||||||
text = "q\n%f 0 0 %f 0 0 cm\n/Im0 Do\nQ"%(pdf_x, pdf_y)
|
if color == 'CMYK;I':
|
||||||
|
# Inverts all four channels
|
||||||
|
image.content[b'/Decode'] = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
|
||||||
|
|
||||||
|
text = ("q\n%0.4f 0 0 %0.4f 0 0 cm\n/Im0 Do\nQ"%(pdf_x, pdf_y)).encode()
|
||||||
|
|
||||||
content = obj({
|
content = obj({
|
||||||
"/Length": len(text)
|
b"/Length": len(text)
|
||||||
}, text)
|
}, text)
|
||||||
|
|
||||||
page = obj({
|
page = obj({
|
||||||
"/Type": "/Page",
|
b"/Type": b"/Page",
|
||||||
"/Parent": self.pages,
|
b"/Parent": self.pages,
|
||||||
"/Resources": {
|
b"/Resources": {
|
||||||
"/XObject": {
|
b"/XObject": {
|
||||||
"/Im0": image
|
b"/Im0": image
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
"/MediaBox": [0, 0, pdf_x, pdf_y],
|
b"/MediaBox": [0, 0, pdf_x, pdf_y],
|
||||||
"/Contents": content
|
b"/Contents": content
|
||||||
})
|
})
|
||||||
self.pages.content["/Kids"].append(page)
|
self.pages.content[b"/Kids"].append(page)
|
||||||
self.pages.content["/Count"] += 1
|
self.pages.content[b"/Count"] += 1
|
||||||
self.addobj(page)
|
self.addobj(page)
|
||||||
self.addobj(content)
|
self.addobj(content)
|
||||||
self.addobj(image)
|
self.addobj(image)
|
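The "squeezed inside" branch above scales both dimensions by the same factor so that the larger one lands just under the limit. The sketch below isolates that step. It assumes the dimensions are in points and uses 14400 (200 inches at 72 points per inch) as the limit, which is the constant on the left-hand side of the hunk. `squeeze` is a hypothetical name.

```python
def squeeze(pdf_x, pdf_y, limit=14400.0):
    """Scale (pdf_x, pdf_y) down proportionally so neither exceeds limit."""
    if pdf_x <= limit and pdf_y <= limit:
        return (pdf_x, pdf_y)
    # Same idea as the hunk: pick the tighter axis, then shave off 0.1%.
    scale = min(limit / pdf_x, limit / pdf_y) * 0.999
    return (pdf_x * scale, pdf_y * scale)
```

The 0.999 factor leaves a small margin so that rounding in the serialized `%0.4f` values cannot push a dimension back over the limit.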
||||||
|
@@ -190,35 +207,43 @@ class pdfdoc(object):
|
||||||
|
|
||||||
xreftable = list()
|
xreftable = list()
|
||||||
|
|
||||||
result = "%%PDF-1.%d\n"%self.version
|
result = ("%%PDF-1.%d\n"%self.version).encode()
|
||||||
|
|
||||||
xreftable.append("0000000000 65535 f \n")
|
xreftable.append(b"0000000000 65535 f \n")
|
||||||
for o in self.objects:
|
for o in self.objects:
|
||||||
xreftable.append("%010d 00000 n \n"%len(result))
|
xreftable.append(("%010d 00000 n \n"%len(result)).encode())
|
||||||
result += o.tostring()
|
result += o.tostring()
|
||||||
|
|
||||||
xrefoffset = len(result)
|
xrefoffset = len(result)
|
||||||
result += "xref\n"
|
result += b"xref\n"
|
||||||
result += "0 %d\n"%len(xreftable)
|
result += ("0 %d\n"%len(xreftable)).encode()
|
||||||
for x in xreftable:
|
for x in xreftable:
|
||||||
result += x
|
result += x
|
||||||
result += "trailer\n"
|
result += b"trailer\n"
|
||||||
result += parse({"/Size": len(xreftable), "/Info": self.info, "/Root": self.catalog})+"\n"
|
result += parse({b"/Size": len(xreftable), b"/Info": self.info, b"/Root": self.catalog})+b"\n"
|
||||||
result += "startxref\n"
|
result += b"startxref\n"
|
||||||
result += "%d\n"%xrefoffset
|
result += ("%d\n"%xrefoffset).encode()
|
||||||
result += "%%EOF\n"
|
result += b"%%EOF\n"
|
||||||
return result
|
return result
|
||||||
|
|
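The method above builds the cross-reference table while concatenating objects: each entry records the byte offset at which the corresponding object starts, and `startxref` records where the table itself begins. The standalone sketch below follows the same steps. It uses a hypothetical `assemble` function that takes objects and the trailer dictionary as already-serialized bytes.

```python
def assemble(objects, trailer, version=3):
    """Sketch: concatenate serialized objects and build the xref table."""
    result = ("%%PDF-1.%d\n" % version).encode()
    xref = [b"0000000000 65535 f \n"]          # entry for the free object 0
    for body in objects:
        # Each entry records the byte offset at which the object starts.
        xref.append(("%010d 00000 n \n" % len(result)).encode())
        result += body
    xref_offset = len(result)
    result += b"xref\n" + ("0 %d\n" % len(xref)).encode() + b"".join(xref)
    result += b"trailer\n" + trailer + b"\n"
    # b"%%EOF\n" is a plain literal here, so both percent signs are kept.
    result += b"startxref\n" + ("%d\n" % xref_offset).encode() + b"%%EOF\n"
    return result
```

Offsets must be byte offsets, which is why the commit's move from text to bytes matters: `len()` on a bytes object counts bytes even when image data or non-ASCII metadata is present.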
||||||
def convert(images, dpi, title=None, author=None, creator=None, producer=None,
|
def convert(images, dpi=None, pagesize=(None, None, None), title=None,
|
||||||
creationdate=None, moddate=None, subject=None, keywords=None,
|
author=None, creator=None, producer=None, creationdate=None,
|
||||||
colorspace=None, verbose=False):
|
moddate=None, subject=None, keywords=None, colorspace=None,
|
||||||
|
nodate=False, verbose=False):
|
||||||
|
|
||||||
|
pagesize_options = pagesize[2]
|
||||||
|
|
||||||
pdf = pdfdoc(3, title, author, creator, producer, creationdate,
|
pdf = pdfdoc(3, title, author, creator, producer, creationdate,
|
||||||
moddate, subject, keywords)
|
moddate, subject, keywords, nodate)
|
||||||
|
|
||||||
for im in images:
|
for imfilename in images:
|
||||||
rawdata = im.read()
|
debug_out("Reading %s"%imfilename, verbose)
|
||||||
im.seek(0)
|
try:
|
||||||
|
rawdata = imfilename.read()
|
||||||
|
except AttributeError:
|
||||||
|
with open(imfilename, "rb") as im:
|
||||||
|
rawdata = im.read()
|
||||||
|
im = cStringIO(rawdata)
|
||||||
try:
|
try:
|
||||||
imgdata = Image.open(im)
|
imgdata = Image.open(im)
|
||||||
except IOError as e:
|
except IOError as e:
|
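The loop above accepts either an open file object or a filename: it tries `.read()` first and falls back to opening the path. The Python 3 sketch below isolates that pattern. `read_image_bytes` is a hypothetical helper, and returning both the raw bytes and a seekable stream is a choice made for this sketch.

```python
from io import BytesIO


def read_image_bytes(source):
    """Sketch: accept a filename or an open binary file object."""
    try:
        rawdata = source.read()              # file-like object
    except AttributeError:
        with open(source, "rb") as handle:   # otherwise treat it as a path
            rawdata = handle.read()
    # Wrap the bytes so callers get a seekable stream regardless of the input.
    return rawdata, BytesIO(rawdata)
```

Reading everything into memory and wrapping it in `BytesIO` means the decoder can seek freely even when the caller's file object cannot.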
||||||
|
@@ -229,14 +254,11 @@ def convert(images, dpi, title=None, author=None, creator=None, producer=None,
|
||||||
exit(1)
|
exit(1)
|
||||||
# image is jpeg2000
|
# image is jpeg2000
|
||||||
width, height, ics = parsejp2(rawdata)
|
width, height, ics = parsejp2(rawdata)
|
||||||
imgformat = "JP2"
|
imgformat = "JPEG2000"
|
||||||
|
|
||||||
if dpi:
|
# TODO: read real dpi from input jpeg2000 image
|
||||||
ndpi = dpi, dpi
|
ndpi = (default_dpi, default_dpi)
|
||||||
debug_out("input dpi (forced) = %d x %d"%ndpi, verbose)
|
debug_out("input dpi = %d x %d" % ndpi, verbose)
|
||||||
else:
|
|
||||||
ndpi = (96, 96) # TODO: read real dpi
|
|
||||||
debug_out("input dpi = %d x %d"%ndpi, verbose)
|
|
||||||
|
|
||||||
if colorspace:
|
if colorspace:
|
||||||
color = colorspace
|
color = colorspace
|
||||||
|
@@ -248,26 +270,45 @@ def convert(images, dpi, title=None, author=None, creator=None, producer=None,
|
||||||
width, height = imgdata.size
|
width, height = imgdata.size
|
||||||
imgformat = imgdata.format
|
imgformat = imgdata.format
|
||||||
|
|
||||||
if dpi:
|
ndpi = imgdata.info.get("dpi", (default_dpi, default_dpi))
|
||||||
ndpi = dpi, dpi
|
# in python3, the returned dpi value for some tiff images will
|
||||||
debug_out("input dpi (forced) = %d x %d"%ndpi, verbose)
|
# not be an integer but a float. To make the behaviour of
|
||||||
else:
|
# img2pdf the same between python2 and python3, we convert that
|
||||||
ndpi = imgdata.info.get("dpi", (96, 96))
|
# float into an integer by rounding
|
||||||
debug_out("input dpi = %d x %d"%ndpi, verbose)
|
# search online for the 72.009 dpi problem for more info
|
||||||
|
ndpi = (int(round(ndpi[0])),int(round(ndpi[1])))
|
||||||
|
debug_out("input dpi = %d x %d" % ndpi, verbose)
|
||||||
|
|
||||||
if colorspace:
|
if colorspace:
|
||||||
color = colorspace
|
color = colorspace
|
||||||
debug_out("input colorspace (forced) = %s"%(color), verbose)
|
debug_out("input colorspace (forced) = %s"%(color), verbose)
|
||||||
else:
|
else:
|
||||||
color = imgdata.mode
|
color = imgdata.mode
|
||||||
|
if color == "CMYK" and imgformat == "JPEG":
|
||||||
|
# Adobe inverts CMYK JPEGs for some reason, and others
|
||||||
|
# have followed suit as well. Some software assumes the
|
||||||
|
# JPEG is inverted if the Adobe tag (APP14), while other
|
||||||
|
# software assumes all CMYK JPEGs are inverted. I don't
|
||||||
|
# have enough experience with these to know which is
|
||||||
|
# better for images currently in the wild, so I'm going
|
||||||
|
# with the first approach for now.
|
||||||
|
if "adobe" in imgdata.info:
|
||||||
|
color = "CMYK;I"
|
||||||
debug_out("input colorspace = %s"%(color), verbose)
|
debug_out("input colorspace = %s"%(color), verbose)
|
||||||
|
|
||||||
debug_out("width x height = %d x %d"%(width,height), verbose)
|
debug_out("width x height = %d x %d"%(width,height), verbose)
|
||||||
debug_out("imgformat = %s"%imgformat, verbose)
|
debug_out("imgformat = %s"%imgformat, verbose)
|
||||||
|
|
||||||
|
if dpi:
|
||||||
|
ndpi = dpi, dpi
|
||||||
|
debug_out("input dpi (forced) = %d x %d" % ndpi, verbose)
|
||||||
|
elif pagesize_options:
|
||||||
|
ndpi = get_ndpi(width, height, pagesize)
|
||||||
|
debug_out("calculated dpi (based on pagesize) = %d x %d" % ndpi, verbose)
|
||||||
|
|
||||||
# depending on the input format, determine whether to pass the raw
|
# depending on the input format, determine whether to pass the raw
|
||||||
# image or the zlib compressed color information
|
# image or the zlib compressed color information
|
||||||
if imgformat is "JPEG" or imgformat is "JP2":
|
if imgformat is "JPEG" or imgformat is "JPEG2000":
|
||||||
if color == '1':
|
if color == '1':
|
||||||
error_out("jpeg can't be monochrome")
|
error_out("jpeg can't be monochrome")
|
||||||
exit(1)
|
exit(1)
|
||||||
|
@ -275,16 +316,61 @@ def convert(images, dpi, title=None, author=None, creator=None, producer=None,
|
||||||
else:
|
else:
|
||||||
# because we do not support /CCITTFaxDecode
|
# because we do not support /CCITTFaxDecode
|
||||||
if color == '1':
|
if color == '1':
|
||||||
|
debug_out("Converting colorspace 1 to L", verbose)
|
||||||
imgdata = imgdata.convert('L')
|
imgdata = imgdata.convert('L')
|
||||||
color = 'L'
|
color = 'L'
|
||||||
imgdata = zlib.compress(imgdata.tostring())
|
elif color in ("RGB", "L", "CMYK", "CMYK;I"):
|
||||||
|
debug_out("Colorspace is OK: %s"%color, verbose)
|
||||||
pdf.addimage(color, width, height, ndpi, imgformat, imgdata)
|
else:
|
||||||
|
debug_out("Converting colorspace %s to RGB"%color, verbose)
|
||||||
|
imgdata = imgdata.convert('RGB')
|
||||||
|
color = imgdata.mode
|
||||||
|
img = imgdata.tobytes()
|
||||||
|
# the python-pil version 2.3.0-1ubuntu3 in Ubuntu does not have the close() method
|
||||||
|
try:
|
||||||
|
imgdata.close()
|
||||||
|
except AttributeError:
|
||||||
|
pass
|
||||||
|
imgdata = zlib.compress(img)
|
||||||
im.close()
|
im.close()
|
||||||
|
|
||||||
|
if pagesize_options and pagesize_options['exact'][1]:
|
||||||
|
# output size exactly to specified dimensions
|
||||||
|
# pagesize[0], pagesize[1] already checked in valid_size()
|
||||||
|
pdf_x, pdf_y = pagesize[0], pagesize[1]
|
||||||
|
else:
|
||||||
|
# output size based on dpi; point = 1/72 inch
|
||||||
|
pdf_x, pdf_y = 72.0*width/float(ndpi[0]), 72.0*height/float(ndpi[1])
|
||||||
|
|
||||||
|
pdf.addimage(color, width, height, imgformat, imgdata, pdf_x, pdf_y)
|
||||||
|
|
||||||
return pdf.tostring()
|
return pdf.tostring()
|
||||||
|
|
||||||
|
def get_ndpi(width, height, pagesize):
|
||||||
|
pagesize_options = pagesize[2]
|
||||||
|
|
||||||
|
if pagesize_options and pagesize_options['fill'][1]:
|
||||||
|
if width/height < pagesize[0]/pagesize[1]:
|
||||||
|
tmp_dpi = 72.0*width/pagesize[0]
|
||||||
|
else:
|
||||||
|
tmp_dpi = 72.0*height/pagesize[1]
|
||||||
|
elif pagesize[0] and pagesize[1]:
|
||||||
|
# if both height and width given with no specific pagesize_option,
|
||||||
|
# resize to fit "into" page
|
||||||
|
if width/height < pagesize[0]/pagesize[1]:
|
||||||
|
tmp_dpi = 72.0*height/pagesize[1]
|
||||||
|
else:
|
||||||
|
tmp_dpi = 72.0*width/pagesize[0]
|
||||||
|
elif pagesize[0]:
|
||||||
|
# if width given, calculate dpi based on width
|
||||||
|
tmp_dpi = 72.0*width/pagesize[0]
|
||||||
|
elif pagesize[1]:
|
||||||
|
# if height given, calculate dpi based on height
|
||||||
|
tmp_dpi = 72.0*height/pagesize[1]
|
||||||
|
else:
|
||||||
|
tmp_dpi = default_dpi
|
||||||
|
|
||||||
|
return tmp_dpi, tmp_dpi
|
||||||
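The "into" and "fill" branches of `get_ndpi` above both pick between two candidate dpi values, one derived from the page width and one from the page height. A minimal sketch of that choice, assuming both page dimensions are given in points; `dpi_for_page` and its parameters are hypothetical names used only for illustration and are not part of img2pdf:

```python
def dpi_for_page(width_px, height_px, page_w_pt, page_h_pt, fill=False):
    """Return the dpi at which an image fits into (or fills) a page.

    One PDF point is 1/72 inch, so dpi = 72 * pixels / points.
    """
    dpi_from_width = 72.0 * width_px / page_w_pt
    dpi_from_height = 72.0 * height_px / page_h_pt
    # A higher dpi gives a smaller printed image. Fitting "into" the page
    # has to satisfy both constraints, so it takes the larger dpi. Filling
    # the page takes the smaller dpi and lets the other dimension overflow.
    if fill:
        return min(dpi_from_width, dpi_from_height)
    return max(dpi_from_width, dpi_from_height)
```

For a square 1000x1000 pixel image on a 595x842 pt page, the "into" case is governed by the width and the "fill" case by the height. Working in floats throughout also avoids the truncation that `width/height` would undergo under Python 2 when both values are integers.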
|
|
||||||
def positive_float(string):
|
def positive_float(string):
|
||||||
value = float(string)
|
value = float(string)
|
||||||
|
@ -294,58 +380,276 @@ def positive_float(string):
|
||||||
return value
|
return value
|
||||||
|
|
||||||
def valid_date(string):
|
def valid_date(string):
|
||||||
return datetime.strptime(string, "%Y-%m-%dT%H:%M:%S")
|
# first try parsing in ISO8601 format
|
||||||
|
try:
|
||||||
|
return datetime.strptime(string, "%Y-%m-%d")
|
||||||
|
except ValueError:
|
||||||
|
pass
|
||||||
|
try:
|
||||||
|
return datetime.strptime(string, "%Y-%m-%dT%H:%M")
|
||||||
|
except ValueError:
|
||||||
|
pass
|
||||||
|
try:
|
||||||
|
return datetime.strptime(string, "%Y-%m-%dT%H:%M:%S")
|
||||||
|
except ValueError:
|
||||||
|
pass
|
||||||
|
# then try dateutil
|
||||||
|
try:
|
||||||
|
from dateutil import parser
|
||||||
|
except ImportError:
|
||||||
|
pass
|
||||||
|
else:
|
||||||
|
try:
|
||||||
|
return parser.parse(string)
|
||||||
|
except TypeError:
|
||||||
|
pass
|
||||||
|
# as a last resort, try the local date utility
|
||||||
|
try:
|
||||||
|
import subprocess
|
||||||
|
except ImportError:
|
||||||
|
pass
|
||||||
|
else:
|
||||||
|
try:
|
||||||
|
utime = subprocess.check_output(["date", "--date", string, "+%s"])
|
||||||
|
except subprocess.CalledProcessError:
|
||||||
|
pass
|
||||||
|
else:
|
||||||
|
return datetime.utcfromtimestamp(int(utime))
|
||||||
|
raise argparse.ArgumentTypeError("cannot parse date: %s"%string)
|
||||||
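The first stage of the new `valid_date` tries three ISO 8601 layouts in turn. A minimal sketch of only that stage, written as a loop; `parse_iso_date` and `ISO_FORMATS` are hypothetical names, and the dateutil and `date --date` fallbacks from the diff are deliberately left out:

```python
from datetime import datetime

# Layouts tried in order, from least to most specific.
ISO_FORMATS = ("%Y-%m-%d", "%Y-%m-%dT%H:%M", "%Y-%m-%dT%H:%M:%S")


def parse_iso_date(string):
    """Return a datetime for the first layout that matches the input."""
    for fmt in ISO_FORMATS:
        try:
            return datetime.strptime(string, fmt)
        except ValueError:
            # strptime raises ValueError on a mismatch; try the next layout.
            continue
    raise ValueError("cannot parse date: %s" % string)
```

Keeping the layouts in a tuple means a further format can be supported by adding one entry rather than another try/except block.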
|
|
||||||
|
def get_standard_papersize(string):
|
||||||
|
papersizes = {
|
||||||
|
"11x17" : "792x792^", # "792x1224",
|
||||||
|
"ledger" : "792x792^", # "1224x792",
|
||||||
|
"legal" : "612x612^", # "612x1008",
|
||||||
|
"letter" : "612x612^", # "612x792",
|
||||||
|
"arche" : "2592x2592^", # "2592x3456",
|
||||||
|
"archd" : "1728x1728^", # "1728x2592",
|
||||||
|
"archc" : "1296x1296^", # "1296x1728",
|
||||||
|
"archb" : "864x864^", # "864x1296",
|
||||||
|
"archa" : "648x648^", # "648x864",
|
||||||
|
"a0" : "2380x2380^", # "2380x3368",
|
||||||
|
"a1" : "1684x1684^", # "1684x2380",
|
||||||
|
"a2" : "1190x1190^", # "1190x1684",
|
||||||
|
"a3" : "842x842^", # "842x1190",
|
||||||
|
"a4" : "595x595^", # "595x842",
|
||||||
|
"a5" : "421x421^", # "421x595",
|
||||||
|
"a6" : "297x297^", # "297x421",
|
||||||
|
"a7" : "210x210^", # "210x297",
|
||||||
|
"a8" : "148x148^", # "148x210",
|
||||||
|
"a9" : "105x105^", # "105x148",
|
||||||
|
"a10" : "74x74^", # "74x105",
|
||||||
|
"b0" : "2836x2836^", # "2836x4008",
|
||||||
|
"b1" : "2004x2004^", # "2004x2836",
|
||||||
|
"b2" : "1418x1418^", # "1418x2004",
|
||||||
|
"b3" : "1002x1002^", # "1002x1418",
|
||||||
|
"b4" : "709x709^", # "709x1002",
|
||||||
|
"b5" : "501x501^", # "501x709",
|
||||||
|
"c0" : "2600x2600^", # "2600x3677",
|
||||||
|
"c1" : "1837x1837^", # "1837x2600",
|
||||||
|
"c2" : "1298x1298^", # "1298x1837",
|
||||||
|
"c3" : "918x918^", # "918x1298",
|
||||||
|
"c4" : "649x649^", # "649x918",
|
||||||
|
"c5" : "459x459^", # "459x649",
|
||||||
|
"c6" : "323x323^", # "323x459",
|
||||||
|
"flsa" : "612x612^", # "612x936",
|
||||||
|
"flse" : "612x612^", # "612x936",
|
||||||
|
"halfletter" : "396x396^", # "396x612",
|
||||||
|
"tabloid" : "792x792^", # "792x1224",
|
||||||
|
"statement" : "396x396^", # "396x612",
|
||||||
|
"executive" : "540x540^", # "540x720",
|
||||||
|
"folio" : "612x612^", # "612x936",
|
||||||
|
"quarto" : "610x610^", # "610x780"
|
||||||
|
}
|
||||||
|
|
||||||
|
string = string.lower()
|
||||||
|
return papersizes.get(string, string)
|
||||||
|
|
||||||
|
def valid_size(string):
|
||||||
|
# conversion factors from units to points
|
||||||
|
units = {
|
||||||
|
'in' : 72.0,
|
||||||
|
'cm' : 72.0/2.54,
|
||||||
|
'mm' : 72.0/25.4,
|
||||||
|
'pt' : 1.0
|
||||||
|
}
|
||||||
|
|
||||||
|
pagesize_options = {
|
||||||
|
'exact' : ['\!', False],
|
||||||
|
'shrink' : ['\>', False],
|
||||||
|
'enlarge' : ['\<', False],
|
||||||
|
'fill' : ['\^', False],
|
||||||
|
'percent' : ['\%', False],
|
||||||
|
'count' : ['\@', False],
|
||||||
|
}
|
||||||
|
|
||||||
|
string = get_standard_papersize(string)
|
||||||
|
|
||||||
|
pattern = re.compile(r"""
|
||||||
|
([0-9]*\.?[0-9]*) # tokens.group(1) == width; may be empty
|
||||||
|
([a-z]*) # tokens.group(2) == units; may be empty
|
||||||
|
x
|
||||||
|
([0-9]*\.?[0-9]*) # tokens.group(3) == height; may be empty
|
||||||
|
([a-zA-Z]*) # tokens.group(4) == units; may be empty
|
||||||
|
([^0-9a-zA-Z]*) # tokens.group(5) == extra options
|
||||||
|
""", re.VERBOSE)
|
||||||
|
|
||||||
|
tokens = pattern.match(string)
|
||||||
|
|
||||||
|
# tokens.group(0) should match entire input string
|
||||||
|
if tokens.group(0) != string:
|
||||||
|
msg = ('Input size needs to be of the format AuxBv#, '
|
||||||
|
'where A is width, B is height, u and v are units, '
|
||||||
|
'# are options. '
|
||||||
|
'You may omit either width or height, but not both. '
|
||||||
|
'Units may be specified as (in, cm, mm, pt). '
|
||||||
|
'You may omit units, which will default to pt. '
|
||||||
|
'Available options include (! = exact ; ^ = fill ; default = into).')
|
||||||
|
raise argparse.ArgumentTypeError(msg)
|
||||||
|
|
||||||
|
# temporary list to loop through to process width and height
|
||||||
|
pagesize_size = {
|
||||||
|
'x' : [0, tokens.group(1), tokens.group(2)],
|
||||||
|
'y' : [0, tokens.group(3), tokens.group(4)]
|
||||||
|
}
|
||||||
|
|
||||||
|
for key, value in pagesize_size.items():
|
||||||
|
try:
|
||||||
|
value[0] = float(value[1])
|
||||||
|
value[0] *= units[value[2]] # convert to points
|
||||||
|
except ValueError:
|
||||||
|
# assign None if width or height not provided
|
||||||
|
value[0] = None
|
||||||
|
except KeyError:
|
||||||
|
# if units unrecognized, raise error
|
||||||
|
# otherwise default to pt because units not provided
|
||||||
|
if value[2]:
|
||||||
|
msg = "unrecognized unit '%s'." % value[2]
|
||||||
|
raise argparse.ArgumentTypeError(msg)
|
||||||
|
|
||||||
|
x = pagesize_size['x'][0]
|
||||||
|
y = pagesize_size['y'][0]
|
||||||
|
|
||||||
|
# parse options for resize methods
|
||||||
|
if tokens.group(5):
|
||||||
|
for key, value in pagesize_options.items():
|
||||||
|
if re.search(value[0], tokens.group(5)):
|
||||||
|
value[1] = True
|
||||||
|
|
||||||
|
if pagesize_options['fill'][1]:
|
||||||
|
# if either width or height is not given, try to fill in missing value
|
||||||
|
if not x:
|
||||||
|
x = y
|
||||||
|
elif not y:
|
||||||
|
y = x
|
||||||
|
|
||||||
|
if pagesize_options['exact'][1]:
|
||||||
|
if not x or not y:
|
||||||
|
msg = ('exact size requires both width and height.')
|
||||||
|
raise argparse.ArgumentTypeError(msg)
|
||||||
|
|
||||||
|
if not x and not y:
|
||||||
|
msg = ('width and height cannot both be omitted.')
|
||||||
|
raise argparse.ArgumentTypeError(msg)
|
||||||
|
|
||||||
|
return (x, y, pagesize_options)
|
||||||
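The `AuxBv#` size syntax handled by `valid_size` can be illustrated with a reduced parser. This is a sketch under stated assumptions, not the code from the commit: `parse_size`, `_to_points`, `SIZE_RE` and `UNITS_TO_POINTS` are hypothetical names, only the `!` and `^` options are recognised, named paper sizes are not looked up, and a missing dimension is not filled in for `^`:

```python
import re

# Conversion factors from each unit to PDF points (1 pt = 1/72 inch).
UNITS_TO_POINTS = {"in": 72.0, "cm": 72.0 / 2.54, "mm": 72.0 / 25.4, "pt": 1.0}

# width, width unit, literal "x", height, height unit, option characters
SIZE_RE = re.compile(r"([0-9]*\.?[0-9]*)([a-z]*)x([0-9]*\.?[0-9]*)([a-z]*)([!^]*)")


def _to_points(number, unit):
    """Convert one dimension to points; None if the number was omitted."""
    if not number:
        return None
    if unit and unit not in UNITS_TO_POINTS:
        raise ValueError("unrecognized unit %r" % unit)
    # An empty unit defaults to points.
    return float(number) * UNITS_TO_POINTS.get(unit, 1.0)


def parse_size(string):
    """Parse e.g. '8.5inx11in!' into (width_pt, height_pt, options)."""
    match = SIZE_RE.fullmatch(string.lower())
    if match is None:
        raise ValueError("size must have the form AuxBv#: %r" % string)
    w_num, w_unit, h_num, h_unit, flags = match.groups()
    width = _to_points(w_num, w_unit)
    height = _to_points(h_num, h_unit)
    if width is None and height is None:
        raise ValueError("width and height cannot both be omitted")
    options = {"exact": "!" in flags, "fill": "^" in flags}
    if options["exact"] and (width is None or height is None):
        raise ValueError("exact size requires both width and height")
    return width, height, options
```

Using `fullmatch` rejects trailing garbage directly, which plays the role of the `tokens.group(0) != string` comparison in the diff.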
|
|
||||||
|
# in python3, the received argument will be a unicode str() object which needs
|
||||||
|
# to be encoded into a bytes() object
|
||||||
|
# in python2, the received argument will be a binary str() object which needs
|
||||||
|
# no encoding
|
||||||
|
# we check whether we use python2 or python3 by checking whether the argument
|
||||||
|
# is both, type str and type bytes (only the case in python2)
|
||||||
|
def pdf_embedded_string(string):
|
||||||
|
if type(string) is str and type(string) is not bytes:
|
||||||
|
# py3
|
||||||
|
pass
|
||||||
|
else:
|
||||||
|
# py2
|
||||||
|
string = string.decode("utf8")
|
||||||
|
string = b"\xfe\xff"+string.encode("utf-16-be")
|
||||||
|
string = string.replace(b'\\', b'\\\\')
|
||||||
|
string = string.replace(b'(', b'\\(')
|
||||||
|
string = string.replace(b')', b'\\)')
|
||||||
|
return string
|
||||||
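The encoding step in `pdf_embedded_string` can be shown in isolation. A minimal Python 3 only sketch, assuming the input is already a `str`; `pdf_text_string` is a hypothetical name, and the Python 2 decoding branch from the diff is omitted:

```python
def pdf_text_string(text):
    """Encode text as the body of a PDF literal string in UTF-16BE."""
    # The byte order mark FE FF marks the string as UTF-16BE.
    raw = b"\xfe\xff" + text.encode("utf-16-be")
    # Escaping works on bytes, after encoding. The backslash has to be
    # escaped first so that the backslashes added for the parentheses
    # are not doubled afterwards.
    raw = raw.replace(b"\\", b"\\\\")
    raw = raw.replace(b"(", b"\\(")
    raw = raw.replace(b")", b"\\)")
    return raw
```

The result is meant to be placed between the parentheses of a PDF literal string, for example as the value of `/Title` in the document information dictionary.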
|
|
||||||
parser = argparse.ArgumentParser(
|
parser = argparse.ArgumentParser(
|
||||||
description='Lossless conversion/embedding of images (in)to pdf')
|
description='Lossless conversion/embedding of images (in)to pdf')
|
||||||
parser.add_argument(
|
parser.add_argument(
|
||||||
'images', metavar='infile', type=argparse.FileType('rb'),
|
'images', metavar='infile', type=str,
|
||||||
nargs='+', help='input file(s)')
|
nargs='+', help='input file(s)')
|
||||||
parser.add_argument(
|
parser.add_argument(
|
||||||
'-o', '--output', metavar='out', type=argparse.FileType('wb'),
|
'-o', '--output', metavar='out', type=argparse.FileType('wb'),
|
||||||
default=sys.stdout, help='output file (default: stdout)')
|
default=getattr(sys.stdout, "buffer", sys.stdout),
|
||||||
parser.add_argument(
|
help='output file (default: stdout)')
|
||||||
|
|
||||||
|
sizeopts = parser.add_mutually_exclusive_group()
|
||||||
|
sizeopts.add_argument(
|
||||||
'-d', '--dpi', metavar='dpi', type=positive_float,
|
'-d', '--dpi', metavar='dpi', type=positive_float,
|
||||||
help='dpi for pdf output (default: 96.0)')
|
help=('dpi for pdf output. '
|
||||||
|
'If input image does not specify dpi the default is %.2f. '
|
||||||
|
'Must not be used with -s/--pagesize.') % default_dpi
|
||||||
|
)
|
||||||
|
|
||||||
|
sizeopts.add_argument(
|
||||||
|
'-s', '--pagesize', metavar='size', type=valid_size,
|
||||||
|
default=(None, None, None),
|
||||||
|
help=('size of the pdf pages in format AuxBv#, '
|
||||||
|
'where A is width, B is height, u and v are units, # are options. '
|
||||||
|
'You may omit either width or height, but not both. '
|
||||||
|
'Some common page sizes, such as letter and a4, are also recognized. '
|
||||||
|
'Units may be specified as (in, cm, mm, pt). '
|
||||||
|
'Units default to pt when absent. '
|
||||||
|
'Available options include (! = exact ; ^ = fill ; default = into). '
|
||||||
|
'Must not be used with -d/--dpi.')
|
||||||
|
)
|
||||||
|
|
||||||
parser.add_argument(
|
parser.add_argument(
|
||||||
'-t', '--title', metavar='title', type=str,
|
'-t', '--title', metavar='title', type=pdf_embedded_string,
|
||||||
help='title for metadata')
|
help='title for metadata')
|
||||||
parser.add_argument(
|
parser.add_argument(
|
||||||
'-a', '--author', metavar='author', type=str,
|
'-a', '--author', metavar='author', type=pdf_embedded_string,
|
||||||
help='author for metadata')
|
help='author for metadata')
|
||||||
parser.add_argument(
|
parser.add_argument(
|
||||||
'-c', '--creator', metavar='creator', type=str,
|
'-c', '--creator', metavar='creator', type=pdf_embedded_string,
|
||||||
help='creator for metadata')
|
help='creator for metadata')
|
||||||
parser.add_argument(
|
parser.add_argument(
|
||||||
'-p', '--producer', metavar='producer', type=str,
|
'-p', '--producer', metavar='producer', type=pdf_embedded_string,
|
||||||
help='producer for metadata')
|
help='producer for metadata')
|
||||||
parser.add_argument(
|
parser.add_argument(
|
||||||
'-r', '--creationdate', metavar='creationdate', type=valid_date,
|
'-r', '--creationdate', metavar='creationdate', type=valid_date,
|
||||||
help='creation date for metadata in YYYY-MM-DDTHH:MM:SS format')
|
help='UTC creation date for metadata in YYYY-MM-DD or YYYY-MM-DDTHH:MM or YYYY-MM-DDTHH:MM:SS format or any format understood by python dateutil module or any format understood by `date --date`')
|
||||||
parser.add_argument(
|
parser.add_argument(
|
||||||
'-m', '--moddate', metavar='moddate', type=valid_date,
|
'-m', '--moddate', metavar='moddate', type=valid_date,
|
||||||
help='modification date for metadata in YYYY-MM-DDTHH:MM:SS format')
|
help='UTC modification date for metadata in YYYY-MM-DD or YYYY-MM-DDTHH:MM or YYYY-MM-DDTHH:MM:SS format or any format understood by python dateutil module or any format understood by `date --date`')
|
||||||
parser.add_argument(
|
parser.add_argument(
|
||||||
'-s', '--subject', metavar='subject', type=str,
|
'-S', '--subject', metavar='subject', type=pdf_embedded_string,
|
||||||
help='subject for metadata')
|
help='subject for metadata')
|
||||||
parser.add_argument(
|
parser.add_argument(
|
||||||
'-k', '--keywords', metavar='kw', type=str, nargs='+',
|
'-k', '--keywords', metavar='kw', type=pdf_embedded_string, nargs='+',
|
||||||
help='keywords for metadata')
|
help='keywords for metadata')
|
||||||
parser.add_argument(
|
parser.add_argument(
|
||||||
'-C', '--colorspace', metavar='colorspace', type=str,
|
'-C', '--colorspace', metavar='colorspace', type=pdf_embedded_string,
|
||||||
help='force PIL colorspace (one of: RGB, L, 1)')
|
help='force PIL colorspace (one of: RGB, L, 1, CMYK, CMYK;I)')
|
||||||
|
parser.add_argument(
|
||||||
|
'-D', '--nodate', help='do not add timestamps', action="store_true")
|
||||||
parser.add_argument(
|
parser.add_argument(
|
||||||
'-v', '--verbose', help='verbose mode', action="store_true")
|
'-v', '--verbose', help='verbose mode', action="store_true")
|
||||||
|
parser.add_argument(
|
||||||
|
'-V', '--version', action='version', version='%(prog)s '+__version__,
|
||||||
|
help="Print version information and exit")
|
||||||
|
|
||||||
def main(args=None):
|
def main(args=None):
|
||||||
if args is None:
|
if args is None:
|
||||||
args = sys.argv[1:]
|
args = sys.argv[1:]
|
||||||
args = parser.parse_args(args)
|
args = parser.parse_args(args)
|
||||||
|
|
||||||
args.output.write(
|
args.output.write(
|
||||||
convert(
|
convert(
|
||||||
args.images, args.dpi, args.title, args.author,
|
args.images, args.dpi, args.pagesize, args.title, args.author,
|
||||||
args.creator, args.producer, args.creationdate, args.moddate,
|
args.creator, args.producer, args.creationdate, args.moddate,
|
||||||
args.subject, args.keywords, args.colorspace, args.verbose))
|
args.subject, args.keywords, args.colorspace, args.nodate,
|
||||||
|
args.verbose))
|
||||||
|
|
||||||
if __name__ == '__main__':
|
if __name__ == '__main__':
|
||||||
main()
|
main()
|
||||||
|
|
|
@ -85,6 +85,6 @@ def parsejp2(data):
|
||||||
if __name__ == "__main__":
|
if __name__ == "__main__":
|
||||||
import sys
|
import sys
|
||||||
width, height, colorspace = parsejp2(open(sys.argv[1]).read())
|
width, height, colorspace = parsejp2(open(sys.argv[1]).read())
|
||||||
print "width = %d"%width
|
sys.stdout.write("width = %d"%width)
|
||||||
print "height = %d"%height
|
sys.stdout.write("height = %d"%height)
|
||||||
print "colorspace = %s"%colorspace
|
sys.stdout.write("colorspace = %s"%colorspace)
|
||||||
|
|
|
@ -1,7 +1,109 @@
|
||||||
import unittest
|
import unittest
|
||||||
import test_img2pdf
|
|
||||||
|
import os
|
||||||
|
import img2pdf
|
||||||
|
import zlib
|
||||||
|
from PIL import Image
|
||||||
|
|
||||||
|
HERE = os.path.dirname(__file__)
|
||||||
|
|
||||||
|
#convert +set date:create +set date:modify -define png:exclude-chunk=time
|
||||||
|
|
||||||
def test_suite():
|
def test_suite():
|
||||||
|
class TestImg2Pdf(unittest.TestCase):
|
||||||
|
pass
|
||||||
|
|
||||||
|
for test_name in os.listdir(os.path.join(HERE, "input")):
|
||||||
|
inputf = os.path.join(HERE, "input", test_name)
|
||||||
|
if not os.path.isfile(inputf):
|
||||||
|
continue
|
||||||
|
outputf = os.path.join(HERE, "output", test_name+".pdf")
|
||||||
|
assert os.path.isfile(outputf)
|
||||||
|
def handle(self, f=inputf, out=outputf):
|
||||||
|
with open(f, "rb") as inf:
|
||||||
|
orig_imgdata = inf.read()
|
||||||
|
pdf = img2pdf.convert([f], nodate=True)
|
||||||
|
imgdata = b""
|
||||||
|
instream = False
|
||||||
|
imgobj = False
|
||||||
|
colorspace = None
|
||||||
|
imgfilter = None
|
||||||
|
width = None
|
||||||
|
height = None
|
||||||
|
length = None
|
||||||
|
# ugly workaround to parse the created pdf
|
||||||
|
for line in pdf.split(b'\n'):
|
||||||
|
if instream:
|
||||||
|
if line == b"endstream":
|
||||||
|
break
|
||||||
|
else:
|
||||||
|
imgdata += line + b'\n'
|
||||||
|
else:
|
||||||
|
if imgobj and line == b"stream":
|
||||||
|
instream = True
|
||||||
|
elif b"/Subtype /Image" in line:
|
||||||
|
imgobj = True
|
||||||
|
elif b"/Width" in line:
|
||||||
|
width = int(line.split()[-1])
|
||||||
|
elif b"/Height" in line:
|
||||||
|
height = int(line.split()[-1])
|
||||||
|
elif b"/Length" in line:
|
||||||
|
length = int(line.split()[-1])
|
||||||
|
elif b"/Filter" in line:
|
||||||
|
imgfilter = line.split()[-2]
|
||||||
|
elif b"/ColorSpace" in line:
|
||||||
|
colorspace = line.split()[-1]
|
||||||
|
# remove trailing \n
|
||||||
|
imgdata = imgdata[:-1]
|
||||||
|
# test if the length field is correct
|
||||||
|
self.assertEqual(len(imgdata), length)
|
||||||
|
# test if the filter is valid:
|
||||||
|
self.assertIn(imgfilter, [b"/DCTDecode", b"/JPXDecode", b"/FlateDecode"])
|
||||||
|
# test if the colorspace is valid
|
||||||
|
self.assertIn(colorspace, [b"/DeviceGray", b"/DeviceRGB", b"/DeviceCMYK"])
|
||||||
|
# test if the image has correct size
|
||||||
|
orig_img = Image.open(f)
|
||||||
|
self.assertEqual(width, orig_img.size[0])
|
||||||
|
self.assertEqual(height, orig_img.size[1])
|
||||||
|
# if the input file is a jpeg then it should've been copied
|
||||||
|
# verbatim into the PDF
|
||||||
|
if imgfilter in [b"/DCTDecode", b"/JPXDecode"]:
|
||||||
|
self.assertEqual(imgdata, orig_imgdata)
|
||||||
|
elif imgfilter == b"/FlateDecode":
|
||||||
|
# otherwise, the data is flate encoded and has to be equal to
|
||||||
|
# the pixel data of the input image
|
||||||
|
imgdata = zlib.decompress(imgdata)
|
||||||
|
if colorspace == b"/DeviceGray":
|
||||||
|
colorspace = 'L'
|
||||||
|
elif colorspace == b"/DeviceRGB":
|
||||||
|
colorspace = 'RGB'
|
||||||
|
elif colorspace == b"/DeviceCMYK":
|
||||||
|
colorspace = 'CMYK'
|
||||||
|
else:
|
||||||
|
raise Exception("invalid colorspace")
|
||||||
|
im = Image.frombytes(colorspace, (width, height), imgdata)
|
||||||
|
if orig_img.mode == '1':
|
||||||
|
orig_img = orig_img.convert("L")
|
||||||
|
elif orig_img.mode not in ("RGB", "L", "CMYK", "CMYK;I"):
|
||||||
|
orig_img = orig_img.convert("RGB")
|
||||||
|
self.assertEqual(im.tobytes(), orig_img.tobytes())
|
||||||
|
# the python-pil version 2.3.0-1ubuntu3 in Ubuntu does not have the close() method
|
||||||
|
try:
|
||||||
|
im.close()
|
||||||
|
except AttributeError:
|
||||||
|
pass
|
||||||
|
# lastly, make sure that the generated pdf matches bit by bit the
|
||||||
|
# expected pdf
|
||||||
|
with open(out, "rb") as outf:
|
||||||
|
out = outf.read()
|
||||||
|
self.assertEqual(pdf, out)
|
||||||
|
# the python-pil version 2.3.0-1ubuntu3 in Ubuntu does not have the close() method
|
||||||
|
try:
|
||||||
|
orig_img.close()
|
||||||
|
except AttributeError:
|
||||||
|
pass
|
||||||
|
setattr(TestImg2Pdf, "test_%s"%test_name, handle)
|
||||||
|
|
||||||
return unittest.TestSuite((
|
return unittest.TestSuite((
|
||||||
unittest.makeSuite(test_img2pdf.TestImg2Pdf),
|
unittest.makeSuite(TestImg2Pdf),
|
||||||
))
|
))
|
||||||
|
|
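The reworked test suite attaches one generated method per input file and binds the loop variables through default arguments (`f=inputf, out=outputf`). A minimal sketch of that pattern with placeholder data instead of image files; `GeneratedTests`, `check` and `run_generated` are hypothetical names:

```python
import unittest


class GeneratedTests(unittest.TestCase):
    pass


for name, value in (("one", 1), ("two", 2), ("three", 3)):
    # Binding the loop variable as a default argument freezes its current
    # value. A plain closure would look the name up when the test runs and
    # every test would see the value from the last iteration.
    def check(self, value=value):
        self.assertEqual(value + value, 2 * value)

    setattr(GeneratedTests, "test_%s" % name, check)


def run_generated():
    """Run the generated test methods and return the TestResult."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(GeneratedTests)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Each generated method shows up under its own name in the test report, so a failure points at the specific input that caused it.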
BIN
src/tests/input/CMYK.jpg
Normal file
Binary file not shown.
After Width: | Height: | Size: 4.7 KiB |
BIN
src/tests/input/CMYK.tif
Normal file
Binary file not shown.
Before Width: | Height: | Size: 2.3 KiB After Width: | Height: | Size: 2.3 KiB |
Before Width: | Height: | Size: 1.1 KiB After Width: | Height: | Size: 1.1 KiB |
BIN
src/tests/output/CMYK.jpg.pdf
Normal file
Binary file not shown.
BIN
src/tests/output/CMYK.tif.pdf
Normal file
Binary file not shown.
Binary file not shown.
BIN
src/tests/output/normal.png.pdf
Normal file
Binary file not shown.
|
@ -1,20 +0,0 @@
|
||||||
import datetime
|
|
||||||
import os
|
|
||||||
import unittest
|
|
||||||
import img2pdf
|
|
||||||
|
|
||||||
HERE = os.path.dirname(__file__)
|
|
||||||
moddate = datetime.datetime(2014, 1, 1)
|
|
||||||
|
|
||||||
class TestImg2Pdf(unittest.TestCase):
|
|
||||||
def test_jpg2pdf(self):
|
|
||||||
with open(os.path.join(HERE, 'test.jpg'), 'r') as img_fp:
|
|
||||||
with open(os.path.join(HERE, 'test.pdf'), 'r') as pdf_fp:
|
|
||||||
self.assertEqual(
|
|
||||||
img2pdf.convert([img_fp], 150,
|
|
||||||
creationdate=moddate, moddate=moddate),
|
|
||||||
pdf_fp.read())
|
|
||||||
|
|
||||||
def test_png2pdf(self):
|
|
||||||
with open(os.path.join(HERE, 'test.png'), 'r') as img_fp:
|
|
||||||
self.assertRaises(SystemExit, img2pdf.convert, [img_fp], 150)
|
|