diff --git a/CHANGES.rst b/CHANGES.rst index ec2a745..e6aa986 100644 --- a/CHANGES.rst +++ b/CHANGES.rst @@ -2,11 +2,38 @@ CHANGES ======= -0.1.6 +0.2.0 ----- - - replace -x and -y option by combined option -s (or --pagesize) and use -S - for --subject + - now Python3 only + - pep8 compliant code + - update my email to josch@mister-muffin.de + - move from github to gitlab.mister-muffin.de/josch/img2pdf + - use logging module + - add extensive test suite + - ability to read from standard input + - pdf writer: + - make more compatible with the interface of pdfrw module + - print floats which equal to their integer conversion as integer + - do not print trailing zeroes for floating point numbers + - print more linebreaks + - add binary string at beginning of PDF to indicate that the PDF + contains binary data + - handle datetime and unicode strings by using utf-16-be encoding + - new options (see --help for more details): + - --without-pdfrw + - --imgsize + - --border + - --fit + - --auto-orient + - --viewer-panes + - --viewer-initial-page + - --viewer-magnification + - --viewer-page-layout + - --viewer-fit-window + - --viewer-center-window + - --viewer-fullscreen + - remove short options for metadata command line arguments - correctly encode and escape non-ascii metadata - explicitly store date in UTC and allow parsing all date formats understood by dateutil and `date --date` @@ -42,9 +69,6 @@ CHANGES ------------------ - Initial PyPI release. - - Modified code to create proper package. - - Added tests. - - Added console script entry point. diff --git a/README.md b/README.md index 476c2e9..27637d6 100644 --- a/README.md +++ b/README.md @@ -1,31 +1,35 @@ img2pdf ======= -Losslessly convert images to PDF without unnecessarily re-encoding JPEG and -JPEG2000 files. Image quality is retained without unnecessarily increasing -file size. +Losslessly convert raster images to PDF. The file size will not unnecessarily +increase. 
One major application would be a number of scans made in JPEG format +which should now become part of a single PDF document. Existing solutions +would either re-encode the input JPEG files (leading to quality loss) or store +them in the zip/flate format which results in the PDF becoming unnecessarily +large in terms of its file size. Background ---------- -Quality loss can be avoided when converting JPEG and JPEG2000 images to -PDF by embedding them without re-encoding. I wrote this piece of python code. -because I was missing a tool to do this automatically. +Quality loss can be avoided when converting JPEG and JPEG2000 images to PDF by +embedding them without re-encoding. I wrote this piece of Python code +because I was missing a tool to do this automatically. Img2pdf basically just +wraps JPEG images into the PDF container as they are. -If you know how to embed JPEG and JPEG2000 images into a PDF container without -recompression, using existing tools, please contact me so that I can put this -code into the garbage bin :D +If you know an existing tool which allows one to embed JPEG and JPEG2000 images +into a PDF container without recompression, please contact me so that I can put +this code into the garbage bin. Functionality ------------- This program will take a list of images and produce a PDF file with the images embedded in it. JPEG and JPEG2000 images will be included without -recompression. Images in other formats will be included with zip/flate +recompression. Raster images in other formats will be included with zip/flate encoding which usually leads to an increase in the resulting size because formats like png compress better than PDF which just zip/flate compresses the RGB data. 
As a result, this tool is able to losslessly wrap images into a PDF -container with a quality-filesize ratio that is typically better (in case of +container with a quality to filesize ratio that is typically better (in case of JPEG and JPEG2000 images) or equal (in case of other formats) than that of existing tools. @@ -50,75 +54,26 @@ than the input JPEG or JPEG2000 file. img2pdf is able to losslessly embed JPEG and JPEG2000 files into a PDF container without additional overhead (aside from the PDF structure itself), -save other graphics formats using lossless zip compression, -and produce multi-page PDF files when more than one input image is given. - -Also, since JPEG and JPEG2000 images are not reencoded, conversion with -img2pdf is several times faster than with other tools. +save other graphics formats using lossless zip compression, and produce +multi-page PDF files when more than one input image is given. +Also, since JPEG and JPEG2000 images are not reencoded, conversion with img2pdf +is several times faster than with other tools. Usage ----- -#### General Notes +The images must be provided as files because img2pdf needs to seek in the file +descriptor. -The images must be provided as files because img2pdf needs to seek -in the file descriptor. Input cannot be piped through stdin. +If no output file is specified with the `-o`/`--output` option, output will be +done to stdout. -If no output file is specified with the `-o`/`--output` option, -output will be to stdout. - -Descriptions of the options should be self explanatory. -They are available by running: +The detailed documentation can be accessed by running: img2pdf --help -#### Controlling Page Size - -The PDF page size can be manipulated. By default, the image will be sized "into" the given dimensions with the aspect ratio retained. 
For instance, to size an image into a page that is at most 500pt x 500pt, use: - - img2pdf -s 500x500 -o output.pdf input.jpg - -To "fill" out a page that is at least 500pt x 500pt, follow the dimensions with a `^`: - - img2pdf -s 500x500^ -o output.pdf input.jpg - -To output pages that are exactly 500pt x 500pt, follow the dimensions with an `!`: - - img2pdf -s 500x500\! -o output.pdf input.jpg - -Notice that the default unit is points. Units may be also be specified and mixed: - - img2pdf -s 8.5inx27.94cm -o output.pdf input.jpg - -If either width or height is omitted, the other will be calculated -to preserve aspect ratio. - - img2pdf -s x280mm -o output1.pdf input.jpg - img2pdf -s 280mmx -o output2.pdf input.jpg - -Some standard page sizes are recognized: - - img2pdf -s letter -o output1.pdf input.jpg - img2pdf -s a4 -o output2.pdf input.jpg - -#### Colorspace - -Currently, the colorspace must be forced for JPEG 2000 images that are -not in the RGB colorspace. Available colorspace options are based on -Python Imaging Library (PIL) short handles. - - * `RGB` = RGB color - * `L` = Grayscale - * `1` = Black and white (internally converted to grayscale) - * `CMYK` = CMYK color - * `CMYK;I` = CMYK color with inversion - -For example, to encode a grayscale JPEG2000 image, use: - - img2pdf -C L -o output.pdf input.jp2 - Bugs ---- @@ -135,20 +90,17 @@ outperformed by another lossless compression method, contact me. I have not yet figured out how to determine the colorspace of JPEG2000 files. Therefore JPEG2000 files use DeviceRGB by default. For JPEG2000 files with -other colorspaces, you must force it using the `--colorspace` option. +other colorspaces, you must explicitly specify it using the `--colorspace` +option. It might be possible to store transparency using masks but it is not clear what the utility of such a functionality would be. 
Most vector graphic formats can be losslessly turned into PDF (minus some of the features unsupported by PDF) but img2pdf will currently turn vector -graphics into their lossy raster representations. - -Acrobat is able to store a hint for the PDF reader of how to present the PDF -when opening it. Things like automatic fullscreen or the zoom level can be -configured. - -It would be nice if a single input image could be read from standard input. +graphics into their lossy raster representations. For converting vector +graphics to PDF, use another tool like inkscape and then join the resulting +pages with a tool like pdftk. A configuration file could be used for default options. @@ -158,10 +110,6 @@ Installation On a Debian- and Ubuntu-based systems, dependencies may be installed with the following command: - apt-get install python python-pil python-setuptools - -Or for Python 3: - apt-get install python3 python3-pil python3-setuptools You can then install the package using: @@ -187,7 +135,7 @@ You can then test the converter using: The package can also be used as a library: import img2pdf - pdf_bytes = img2pdf.convert(['test.jpg']) + pdf_bytes = img2pdf.convert('test.jpg') file = open("name.pdf","wb") file.write(pdf_bytes) diff --git a/setup.py b/setup.py index 1ad815c..be3dbba 100644 --- a/setup.py +++ b/setup.py @@ -1,37 +1,37 @@ from setuptools import setup -VERSION="0.1.6~git" +VERSION = "0.2" -setup ( +setup( name='img2pdf', version=VERSION, - author = "Johannes 'josch' Schauer", - author_email = 'j.schauer@email.de', - description = "Convert images to PDF via direct JPEG inclusion.", - long_description = open('README.md').read(), - license = "LGPL", - keywords = "jpeg pdf converter", + author="Johannes 'josch' Schauer", + author_email='josch@mister-muffin.de', + description="Convert images to PDF via direct JPEG inclusion.", + long_description=open('README.md').read(), + license="LGPL", + keywords="jpeg pdf 
converter", + classifiers=[ + 'Development Status :: 5 - Production/Stable', 'Intended Audience :: Developers', + 'Intended Audience :: Other Audience', + 'Environment :: Console', 'Programming Language :: Python', - 'Programming Language :: Python :: 2', - 'Programming Language :: Python :: 2.6', - 'Programming Language :: Python :: 2.7', 'Programming Language :: Python :: 3', 'Programming Language :: Python :: 3.4', 'Programming Language :: Python :: Implementation :: CPython', - 'License :: OSI Approved :: GNU Lesser General Public License v3 (LGPLv3)', - 'Programming Language :: Python', + 'License :: OSI Approved :: GNU Lesser General Public License v3 ' + '(LGPLv3)', 'Natural Language :: English', 'Operating System :: OS Independent'], - url = 'https://github.com/josch/img2pdf', - download_url = 'https://github.com/josch/img2pdf/archive/'+VERSION+'.tar.gz', + url='https://gitlab.mister-muffin.de/josch/img2pdf', + download_url='https://gitlab.mister-muffin.de/josch/img2pdf/repository/' + 'archive.tar.gz?ref=' + VERSION, package_dir={"": "src"}, py_modules=['img2pdf', 'jp2'], - include_package_data = True, - test_suite = 'tests.test_suite', - zip_safe = True, + include_package_data=True, + test_suite='tests.test_suite', + zip_safe=True, install_requires=( 'Pillow', ), diff --git a/src/img2pdf.py b/src/img2pdf.py index 0293e7b..b909840 100755 --- a/src/img2pdf.py +++ b/src/img2pdf.py @@ -1,4 +1,4 @@ -#!/usr/bin/env python2 +#!/usr/bin/env python3 # Copyright (C) 2012-2014 Johannes 'josch' Schauer # @@ -17,34 +17,70 @@ # License along with this program. If not, see # . -__version__ = "0.1.6~git" -default_dpi = 96.0 - -import re import sys import zlib import argparse from PIL import Image from datetime import datetime from jp2 import parsejp2 -try: - from cStringIO import cStringIO -except ImportError: - from io import BytesIO as cStringIO +from enum import Enum +from io import BytesIO +import logging -# XXX: Switch to use logging module. 
-def debug_out(message, verbose=True): - if verbose: - sys.stderr.write("D: "+message+"\n") +__version__ = "0.2" +default_dpi = 96.0 +papersizes = { + "letter": "8.5inx11in", + "a0": "841mmx1189mm", + "a1": "594mmx841mm", + "a2": "420mmx594mm", + "a3": "297mmx420mm", + "a4": "210mmx297mm", + "a5": "148mmx210mm", + "a6": "105mmx148mm", +} +papernames = { + "letter": "Letter", + "a0": "A0", + "a1": "A1", + "a2": "A2", + "a3": "A3", + "a4": "A4", + "a5": "A5", + "a6": "A6", +} -def error_out(message): - sys.stderr.write("E: "+message+"\n") -def warning_out(message): - sys.stderr.write("W: "+message+"\n") +FitMode = Enum('FitMode', 'into fill exact shrink enlarge') + +PageOrientation = Enum('PageOrientation', 'portrait landscape') + +Colorspace = Enum('Colorspace', 'RGB L 1 CMYK CMYK;I RGBA') + +ImageFormat = Enum('ImageFormat', 'JPEG JPEG2000 TIFF PNG GIF') + +PageMode = Enum('PageMode', 'none outlines thumbs') + +PageLayout = Enum('PageLayout', + 'single onecolumn twocolumnright twocolumnleft') + +Magnification = Enum('Magnification', 'fit fith fitbh') + +ImgSize = Enum('ImgSize', 'abs perc dpi') + +Unit = Enum('Unit', 'pt cm mm inch') + +ImgUnit = Enum('ImgUnit', 'pt cm mm inch perc dpi') + + +class NegativeDimensionException(Exception): + pass + + +# without pdfrw this function is a no-op +def my_convert_load(string): + return string -def datetime_to_pdfdate(dt): - return dt.strftime("%Y%m%d%H%M%SZ") def parse(cont, indent=1): if type(cont) is dict: @@ -54,323 +90,988 @@ def parse(cont, indent=1): elif type(cont) is int: return str(cont).encode() elif type(cont) is float: - return ("%0.4f"%cont).encode() - elif isinstance(cont, obj): - return ("%d 0 R"%cont.identifier).encode() - elif type(cont) is str or type(cont) is bytes: + if int(cont) == cont: + return parse(int(cont)) + else: + return ("%0.4f" % cont).rstrip("0").encode() + elif isinstance(cont, MyPdfDict): + # if cont got an identifier, then adobj() has been called with it + # and a link to it will be 
added, otherwise add it inline + if hasattr(cont, "identifier"): + return ("%d 0 R" % cont.identifier).encode() + else: + return parse(cont.content, indent) + elif type(cont) is str or isinstance(cont, bytes): if type(cont) is str and type(cont) is not bytes: - raise Exception("parse must be passed a bytes object in py3") + raise Exception( + "parse must be passed a bytes object in py3. Got: %s" % cont) return cont - elif type(cont) is list: + elif isinstance(cont, list): return b"[ "+b" ".join([parse(c, indent) for c in cont])+b" ]" else: - raise Exception("cannot handle type %s"%type(cont)) + raise Exception("cannot handle type %s with content %s" % (type(cont), + cont)) -class obj(object): - def __init__(self, content, stream=None): - self.content = content - self.stream = stream + +class MyPdfDict(object): + def __init__(self, *args, **kw): + self.content = dict() + if args: + if len(args) == 1: + args = args[0] + self.content.update(args) + self.stream = None + for key, value in kw.items(): + if key == "stream": + self.stream = value + self.content[MyPdfName.Length] = len(value) + elif key == "indirect": + pass + else: + self.content[getattr(MyPdfName, key)] = value def tostring(self): - if self.stream: + if self.stream is not None: return ( - ("%d 0 obj " % self.identifier).encode() + + ("%d 0 obj\n" % self.identifier).encode() + parse(self.content) + b"\nstream\n" + self.stream + b"\nendstream\nendobj\n") else: - return ("%d 0 obj "%self.identifier).encode()+parse(self.content)+b" endobj\n" + return ("%d 0 obj\n" % self.identifier).encode() + \ + parse(self.content) + b"\nendobj\n" -class pdfdoc(object): + def __setitem__(self, key, value): + self.content[key] = value - def __init__(self, version=3, title=None, author=None, creator=None, - producer=None, creationdate=None, moddate=None, subject=None, - keywords=None, nodate=False): - self.version = version # default pdf version 1.3 - now = datetime.now() + def __getitem__(self, key): + return 
self.content[key] + + +class MyPdfName(): + def __getattr__(self, name): + return b'/' + name.encode('ascii') +MyPdfName = MyPdfName() + + +class MyPdfObject(bytes): + def __new__(cls, string): + return bytes.__new__(cls, string.encode('ascii')) + + +class MyPdfArray(list): + pass + + +class MyPdfWriter(): + def __init__(self, version="1.3"): self.objects = [] - - info = {} - if title: - info[b"/Title"] = b"("+title+b")" - if author: - info[b"/Author"] = b"("+author+b")" - if creator: - info[b"/Creator"] = b"("+creator+b")" - if producer: - info[b"/Producer"] = b"("+producer+b")" - if creationdate: - info[b"/CreationDate"] = b"(D:"+datetime_to_pdfdate(creationdate).encode()+b")" - elif not nodate: - info[b"/CreationDate"] = b"(D:"+datetime_to_pdfdate(now).encode()+b")" - if moddate: - info[b"/ModDate"] = b"(D:"+datetime_to_pdfdate(moddate).encode()+b")" - elif not nodate: - info[b"/ModDate"] = b"(D:"+datetime_to_pdfdate(now).encode()+b")" - if subject: - info[b"/Subject"] = b"("+subject+b")" - if keywords: - info[b"/Keywords"] = b"("+b",".join(keywords)+b")" - - self.info = obj(info) - # create an incomplete pages object so that a /Parent entry can be # added to each page - self.pages = obj({ - b"/Type": b"/Pages", - b"/Kids": [], - b"/Count": 0 - }) - - self.catalog = obj({ - b"/Pages": self.pages, - b"/Type": b"/Catalog" - }) - self.addobj(self.catalog) - self.addobj(self.pages) + self.pages = MyPdfDict(Type=MyPdfName.Pages, Kids=[], Count=0) + self.catalog = MyPdfDict(Pages=self.pages, Type=MyPdfName.Catalog) + self.version = version # default pdf version 1.3 + self.pagearray = [] def addobj(self, obj): newid = len(self.objects)+1 obj.identifier = newid self.objects.append(obj) - def addimage(self, color, width, height, imgformat, imgdata, pdf_x, pdf_y): - if color == 'L': - colorspace = b"/DeviceGray" - elif color == 'RGB': - colorspace = b"/DeviceRGB" - elif color == 'CMYK' or color == 'CMYK;I': - colorspace = b"/DeviceCMYK" - else: - error_out("unsupported 
color space: %s"%color) - exit(1) - - if pdf_x < 3.00 or pdf_y < 3.00: - warning_out("pdf width or height is below 3.00 - decrease the dpi") - - # either embed the whole jpeg or deflate the bitmap representation - if imgformat is "JPEG": - ofilter = [ b"/DCTDecode" ] - elif imgformat is "JPEG2000": - ofilter = [ b"/JPXDecode" ] - self.version = 5 # jpeg2000 needs pdf 1.5 - else: - ofilter = [ b"/FlateDecode" ] - image = obj({ - b"/Type": b"/XObject", - b"/Subtype": b"/Image", - b"/Filter": ofilter, - b"/Width": width, - b"/Height": height, - b"/ColorSpace": colorspace, - # hardcoded as PIL doesn't provide bits for non-jpeg formats - b"/BitsPerComponent": 8, - b"/Length": len(imgdata) - }, imgdata) - - if color == 'CMYK;I': - # Inverts all four channels - image.content[b'/Decode'] = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0] - - text = ("q\n%0.4f 0 0 %0.4f 0 0 cm\n/Im0 Do\nQ"%(pdf_x, pdf_y)).encode() - - content = obj({ - b"/Length": len(text) - }, text) - - page = obj({ - b"/Type": b"/Page", - b"/Parent": self.pages, - b"/Resources": { - b"/XObject": { - b"/Im0": image - } - }, - b"/MediaBox": [0, 0, pdf_x, pdf_y], - b"/Contents": content - }) - self.pages.content[b"/Kids"].append(page) - self.pages.content[b"/Count"] += 1 - self.addobj(page) - self.addobj(content) - self.addobj(image) - - def tostring(self): - # add info as last object - self.addobj(self.info) - + def getstring(self, info): xreftable = list() - result = ("%%PDF-1.%d\n"%self.version).encode() + # justification of the random binary garbage in the header from + # adobe: + # + # > Note: If a PDF file contains binary data, as most do (see Section + # > 3.1, “Lexical Conventions”), it is recommended that the header + # > line be immediately followed by a comment line containing at + # > least four binary characters—that is, characters whose codes are + # > 128 or greater. 
This ensures proper behavior of file transfer + # > applications that inspect data near the beginning of a file to + # > determine whether to treat the file’s contents as text or as + # > binary. + # + # the choice of binary characters is arbitrary but those four seem to + # be used elsewhere. + result = ('%%PDF-%s\n' % self.version).encode('ascii') + result += b'%\xe2\xe3\xcf\xd3\n' + # From section 3.4.3 of the PDF Reference (version 1.7): + # + # > Each entry is exactly 20 bytes long, including the end-of-line + # > marker. + # > + # > [...] + # > + # > The format of an in-use entry is + # > nnnnnnnnnn ggggg n eol + # > where + # > nnnnnnnnnn is a 10-digit byte offset + # > ggggg is a 5-digit generation number + # > n is a literal keyword identifying this as an in-use entry + # > eol is a 2-character end-of-line sequence + # > + # > [...] + # > + # > If the file’s end-of-line marker is a single character (either a + # > carriage return or a line feed), it is preceded by a single space; + # + # Since we chose to use a single character eol marker, we precede it by + # a space xreftable.append(b"0000000000 65535 f \n") for o in self.objects: - xreftable.append(("%010d 00000 n \n"%len(result)).encode()) + xreftable.append(("%010d 00000 n \n" % len(result)).encode()) result += o.tostring() xrefoffset = len(result) result += b"xref\n" - result += ("0 %d\n"%len(xreftable)).encode() + result += ("0 %d\n" % len(xreftable)).encode() for x in xreftable: result += x result += b"trailer\n" - result += parse({b"/Size": len(xreftable), b"/Info": self.info, b"/Root": self.catalog})+b"\n" + result += parse({b"/Size": len(xreftable), b"/Info": info, + b"/Root": self.catalog})+b"\n" result += b"startxref\n" - result += ("%d\n"%xrefoffset).encode() + result += ("%d\n" % xrefoffset).encode() result += b"%%EOF\n" return result -def convert(images, dpi=None, pagesize=(None, None, None), title=None, + def addpage(self, page): + page[b"/Parent"] = self.pages + 
self.pagearray.append(page) + self.pages.content[b"/Kids"].append(page) + self.pages.content[b"/Count"] += 1 + self.addobj(page) + + +class MyPdfString(): + @classmethod + def encode(cls, string): + try: + string = string.encode('ascii') + except UnicodeEncodeError: + string = b"\xfe\xff"+string.encode("utf-16-be") + string = string.replace(b'\\', b'\\\\') + string = string.replace(b'(', b'\\(') + string = string.replace(b')', b'\\)') + return b'(' + string + b')' + + +class pdfdoc(object): + def __init__(self, version="1.3", title=None, author=None, creator=None, + producer=None, creationdate=None, moddate=None, subject=None, + keywords=None, nodate=False, panes=None, initial_page=None, + magnification=None, page_layout=None, fit_window=False, + center_window=False, fullscreen=False, with_pdfrw=True): + if with_pdfrw: + try: + from pdfrw import PdfWriter, PdfDict, PdfName, PdfString + self.with_pdfrw = True + except ImportError: + PdfWriter = MyPdfWriter + PdfDict = MyPdfDict + PdfName = MyPdfName + PdfString = MyPdfString + self.with_pdfrw = False + else: + PdfWriter = MyPdfWriter + PdfDict = MyPdfDict + PdfName = MyPdfName + PdfString = MyPdfString + self.with_pdfrw = False + + now = datetime.now() + self.info = PdfDict(indirect=True) + + def datetime_to_pdfdate(dt): + return dt.strftime("%Y%m%d%H%M%SZ") + + if title is not None: + self.info[PdfName.Title] = PdfString.encode(title) + if author is not None: + self.info[PdfName.Author] = PdfString.encode(author) + if creator is not None: + self.info[PdfName.Creator] = PdfString.encode(creator) + if producer is not None: + self.info[PdfName.Producer] = PdfString.encode(producer) + if creationdate is not None: + self.info[PdfName.CreationDate] = \ + PdfString.encode("D:"+datetime_to_pdfdate(creationdate)) + elif not nodate: + self.info[PdfName.CreationDate] = \ + PdfString.encode("D:"+datetime_to_pdfdate(now)) + if moddate is not None: + self.info[PdfName.ModDate] = \ + 
PdfString.encode("D:"+datetime_to_pdfdate(moddate)) + elif not nodate: + self.info[PdfName.ModDate] = PdfString.encode( + "D:"+datetime_to_pdfdate(now)) + if subject is not None: + self.info[PdfName.Subject] = PdfString.encode(subject) + if keywords is not None: + self.info[PdfName.Keywords] = PdfString.encode(",".join(keywords)) + + self.writer = PdfWriter() + self.writer.version = version + # this is done because pdfrw adds info, catalog and pages as the first + # three objects in this order + if not self.with_pdfrw: + self.writer.addobj(self.info) + self.writer.addobj(self.writer.catalog) + self.writer.addobj(self.writer.pages) + + self.panes = panes + self.initial_page = initial_page + self.magnification = magnification + self.page_layout = page_layout + self.fit_window = fit_window + self.center_window = center_window + self.fullscreen = fullscreen + + def add_imagepage(self, color, imgwidthpx, imgheightpx, imgformat, imgdata, + imgwidthpdf, imgheightpdf, imgxpdf, imgypdf, pagewidth, + pageheight): + if self.with_pdfrw: + from pdfrw import PdfDict, PdfName + from pdfrw.py23_diffs import convert_load + else: + PdfDict = MyPdfDict + PdfName = MyPdfName + convert_load = my_convert_load + + if color == Colorspace.L: + colorspace = PdfName.DeviceGray + elif color == Colorspace.RGB: + colorspace = PdfName.DeviceRGB + elif color == Colorspace.CMYK or color == Colorspace['CMYK;I']: + colorspace = PdfName.DeviceCMYK + else: + logging.error("unsupported color space: %s", color.name) + exit(1) + + # either embed the whole jpeg or deflate the bitmap representation + if imgformat is ImageFormat.JPEG: + ofilter = [PdfName.DCTDecode] + elif imgformat is ImageFormat.JPEG2000: + ofilter = [PdfName.JPXDecode] + self.writer.version = "1.5" # jpeg2000 needs pdf 1.5 + else: + ofilter = [PdfName.FlateDecode] + + image = PdfDict(stream=convert_load(imgdata)) + + image[PdfName.Type] = PdfName.XObject + image[PdfName.Subtype] = PdfName.Image + image[PdfName.Filter] = ofilter + 
image[PdfName.Width] = imgwidthpx + image[PdfName.Height] = imgheightpx + image[PdfName.ColorSpace] = colorspace + # hardcoded as PIL doesnt provide bits for non-jpeg formats + image[PdfName.BitsPerComponent] = 8 + + if color == Colorspace['CMYK;I']: + # Inverts all four channels + image[PdfName.Decode] = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0] + + text = ("q\n%0.4f 0 0 %0.4f %0.4f %0.4f cm\n/Im0 Do\nQ" % + (imgwidthpdf, imgheightpdf, imgxpdf, imgypdf)).encode("ascii") + + content = PdfDict(stream=convert_load(text)) + resources = PdfDict(XObject=PdfDict(Im0=image)) + + page = PdfDict(indirect=True) + page[PdfName.Type] = PdfName.Page + page[PdfName.MediaBox] = [0, 0, pagewidth, pageheight] + page[PdfName.Resources] = resources + page[PdfName.Contents] = content + + self.writer.addpage(page) + + if not self.with_pdfrw: + self.writer.addobj(content) + self.writer.addobj(image) + + def tostring(self): + if self.with_pdfrw: + from pdfrw import PdfDict, PdfName, PdfArray, PdfObject + else: + PdfDict = MyPdfDict + PdfName = MyPdfName + PdfObject = MyPdfObject + PdfArray = MyPdfArray + NullObject = PdfObject('null') + TrueObject = PdfObject('true') + + # We fill the catalog with more information like /ViewerPreferences, + # /PageMode, /PageLayout or /OpenAction because the latter refers to a + # page object which has to be present so that we can get its id. + # + # Furthermore, if using pdfrw, the trailer is cleared every time a page + # is added, so we can only start using it after all pages have been + # written. 
+ + if self.with_pdfrw: + catalog = self.writer.trailer.Root + else: + catalog = self.writer.catalog + + if self.fullscreen or self.fit_window or self.center_window or \ + self.panes is not None: + catalog[PdfName.ViewerPreferences] = PdfDict() + + if self.fullscreen: + # this setting might be overwritten later by the page mode + catalog[PdfName.ViewerPreferences][PdfName.NonFullScreenPageMode] \ + = PdfName.UseNone + + if self.panes == PageMode.thumbs: + catalog[PdfName.ViewerPreferences][PdfName.NonFullScreenPageMode] \ + = PdfName.UseThumbs + # this setting might be overwritten later if fullscreen + catalog[PdfName.PageMode] = PdfName.UseThumbs + elif self.panes == PageMode.outlines: + catalog[PdfName.ViewerPreferences][PdfName.NonFullScreenPageMode] \ + = PdfName.UseOutlines + # this setting might be overwritten later if fullscreen + catalog[PdfName.PageMode] = PdfName.UseOutlines + elif self.panes in [PageMode.none, None]: + pass + else: + raise Exception("unknown page mode: %s" % self.panes) + + if self.fit_window: + catalog[PdfName.ViewerPreferences][PdfName.FitWindow] = TrueObject + + if self.center_window: + catalog[PdfName.ViewerPreferences][PdfName.CenterWindow] = \ + TrueObject + + if self.fullscreen: + catalog[PdfName.PageMode] = PdfName.FullScreen + + # see table 8.2 in section 8.2.1 in + # http://partners.adobe.com/public/developer/en/pdf/PDFReference16.pdf + # Fit - Fits the page to the window. + # FitH - Fits the width of the page to the window. + # FitV - Fits the height of the page to the window. + # FitR - Fits the rectangle specified by the four coordinates to the + # window. + # FitB - Fits the page bounding box to the window. This basically + # reduces the amount of whitespace (margins) that is displayed + # and thus focussing more on the text content. + # FitBH - Fits the width of the page bounding box to the window. + # FitBV - Fits the height of the page bounding box to the window. 
+ # by default the initial page is the first one + initial_page = self.writer.pagearray[0] + # we set the open action here to make sure we open on the requested + # initial page but this value might be overwritten by a custom open + # action later while still taking the requested initial page into + # account + if self.initial_page is not None: + initial_page = self.writer.pagearray[self.initial_page - 1] + catalog[PdfName.OpenAction] = PdfArray([initial_page, PdfName.XYZ, + NullObject, NullObject, 0]) + + if self.magnification == Magnification.fit: + catalog[PdfName.OpenAction] = PdfArray([initial_page, PdfName.Fit]) + elif self.magnification == Magnification.fith: + pagewidth = initial_page[PdfName.MediaBox][2] + catalog[PdfName.OpenAction] = PdfArray( + [initial_page, PdfName.FitH, pagewidth]) + elif self.magnification == Magnification.fitbh: + # quick hack to determine the image width on the page + imgwidth = float(initial_page[PdfName.Contents].stream.split()[4]) + catalog[PdfName.OpenAction] = PdfArray( + [initial_page, PdfName.FitBH, imgwidth]) + elif isinstance(self.magnification, float): + catalog[PdfName.OpenAction] = PdfArray( + [initial_page, PdfName.XYZ, NullObject, NullObject, + self.magnification]) + elif self.magnification is None: + pass + else: + raise Exception("unknown magnification: %s" % self.magnification) + + if self.page_layout == PageLayout.single: + catalog[PdfName.PageLayout] = PdfName.SinglePage + elif self.page_layout == PageLayout.onecolumn: + catalog[PdfName.PageLayout] = PdfName.OneColumn + elif self.page_layout == PageLayout.twocolumnright: + catalog[PdfName.PageLayout] = PdfName.TwoColumnRight + elif self.page_layout == PageLayout.twocolumnleft: + catalog[PdfName.PageLayout] = PdfName.TwoColumnLeft + elif self.page_layout is None: + pass + else: + raise Exception("unknown page layout: %s" % self.page_layout) + + # now write out the PDF + if self.with_pdfrw: + outf = BytesIO() + self.writer.trailer.Info = self.info + 
self.writer.write(outf) + return outf.getvalue() + else: + return self.writer.getstring(self.info) + + +def read_image(rawdata, colorspace): + im = BytesIO(rawdata) + im.seek(0) + try: + imgdata = Image.open(im) + except IOError as e: + # test if it is a jpeg2000 image + if rawdata[:12] != b"\x00\x00\x00\x0C\x6A\x50\x20\x20\x0D\x0A\x87\x0A": + logging.error("cannot read input image (not jpeg2000)") + logging.error("PIL: error reading image: %s", e) + exit(1) + # image is jpeg2000 + imgwidthpx, imgheightpx, ics = parsejp2(rawdata) + imgformat = ImageFormat.JPEG2000 + + # TODO: read real dpi from input jpeg2000 image + ndpi = (default_dpi, default_dpi) + logging.debug("input dpi = %d x %d", *ndpi) + + if colorspace: + color = colorspace + logging.debug("input colorspace (forced) = %s", ics) + else: + color = ics + logging.debug("input colorspace = %s", ics) + else: + imgwidthpx, imgheightpx = imgdata.size + imgformat = None + for f in ImageFormat: + if f.name == imgdata.format: + imgformat = f + if imgformat is None: + raise Exception("unknown PIL image format: %s" % imgdata.format) + + ndpi = imgdata.info.get("dpi", (default_dpi, default_dpi)) + # In python3, the returned dpi value for some tiff images will + # not be an integer but a float. To make the behaviour of + # img2pdf the same between python2 and python3, we convert that + # float into an integer by rounding. + # Search online for the 72.009 dpi problem for more info. 
+ ndpi = (int(round(ndpi[0])), int(round(ndpi[1]))) + logging.debug("input dpi = %d x %d", *ndpi) + + if colorspace: + color = colorspace + logging.debug("input colorspace (forced) = %s", color) + else: + color = None + for c in Colorspace: + if c.name == imgdata.mode: + color = c + if color is None: + raise Exception("unknown PIL colorspace: %s" % imgdata.mode) + if color == Colorspace.CMYK and imgformat == ImageFormat.JPEG: + # Adobe inverts CMYK JPEGs for some reason, and others + # have followed suit as well. 
Some software assumes the + # JPEG is inverted if the Adobe tag (APP14), while other + # software assumes all CMYK JPEGs are inverted. I don't + # have enough experience with these to know which is + # better for images currently in the wild, so I'm going + # with the first approach for now. + if "adobe" in imgdata.info: + color = Colorspace['CMYK;I'] + logging.debug("input colorspace = %s", color.name) + + logging.debug("width x height = %dpx x %dpx", imgwidthpx, imgheightpx) + logging.debug("imgformat = %s", imgformat.name) + + # depending on the input format, determine whether to pass the raw + # image or the zlib compressed color information + if imgformat == ImageFormat.JPEG or imgformat == ImageFormat.JPEG2000: + if color == Colorspace['1']: + logging.error("jpeg can't be monochrome") + exit(1) + imgdata = rawdata + else: + # because we do not support /CCITTFaxDecode + if color == Colorspace['1']: + logging.debug("Converting colorspace 1 to L") + imgdata = imgdata.convert('L') + color = Colorspace.L + elif color in [Colorspace.RGB, Colorspace.L, Colorspace.CMYK, + Colorspace["CMYK;I"]]: + logging.debug("Colorspace is OK: %s", color) + elif color in [Colorspace.RGBA]: + logging.debug("Converting colorspace %s to RGB", color) + imgdata = imgdata.convert('RGB') + color = Colorspace.RGB + else: + raise Exception("unknown colorspace: %s" % color.name) + img = imgdata.tobytes() + # the python-pil version 2.3.0-1ubuntu3 in Ubuntu does not have the + # close() method + try: + imgdata.close() + except AttributeError: + pass + imgdata = zlib.compress(img) + im.close() + + return color, ndpi, imgformat, imgdata, imgwidthpx, imgheightpx + + +# converts a length in pixels to a length in PDF units (1/72 of an inch) +def px_to_pt(length, dpi): + return 72*length/dpi + + +def cm_to_pt(length): + return (72*length)/2.54 + + +def mm_to_pt(length): + return (72*length)/25.4 + + +def in_to_pt(length): + return 72*length + + +def get_layout_fun(pagesize, imgsize, border, fit, 
auto_orient): + def fitfun(fit, imgwidth, imgheight, fitwidth, fitheight): + if fitwidth is None and fitheight is None: + raise Exception("fitwidth and fitheight cannot both be None") + # if fit is fill or enlarge then it is okay if one of the dimensions + # are negative but one of them must still be positive + # if fit is not fill or enlarge then both dimensions must be positive + if fit in [FitMode.fill, FitMode.enlarge] and \ + (fitwidth < 0 and fitheight < 0): + raise Exception("cannot fit into a rectangle where both " + "dimensions are negative") + elif fit not in [FitMode.fill, FitMode.enlarge] and \ + (fitwidth < 0 or fitheight < 0): + raise Exception("cannot fit into a rectangle where either " + "dimensions are negative") + + def default(): + if fitwidth is not None and fitheight is not None: + newimgwidth = fitwidth + newimgheight = (newimgwidth * imgheight)/imgwidth + if newimgheight > fitheight: + newimgheight = fitheight + newimgwidth = (newimgheight * imgwidth)/imgheight + elif fitwidth is None and fitheight is not None: + newimgheight = fitheight + newimgwidth = (newimgheight * imgwidth)/imgheight + elif fitheight is None and fitwidth is not None: + newimgwidth = fitwidth + newimgheight = (newimgwidth * imgheight)/imgwidth + else: + raise Exception("fitwidth and fitheight cannot both be None") + return newimgwidth, newimgheight + if fit is None or fit == FitMode.into: + return default() + elif fit == FitMode.fill: + if fitwidth is not None and fitheight is not None: + newimgwidth = fitwidth + newimgheight = (newimgwidth * imgheight)/imgwidth + if newimgheight < fitheight: + newimgheight = fitheight + newimgwidth = (newimgheight * imgwidth)/imgheight + elif fitwidth is None and fitheight is not None: + newimgheight = fitheight + newimgwidth = (newimgheight * imgwidth)/imgheight + elif fitheight is None and fitwidth is not None: + newimgwidth = fitwidth + newimgheight = (newimgwidth * imgheight)/imgwidth + else: + raise Exception("fitwidth and fitheight 
cannot both be None") + return newimgwidth, newimgheight + elif fit == FitMode.exact: + if fitwidth is not None and fitheight is not None: + return fitwidth, fitheight + elif fitwidth is None and fitheight is not None: + newimgheight = fitheight + newimgwidth = (newimgheight * imgwidth)/imgheight + elif fitheight is None and fitwidth is not None: + newimgwidth = fitwidth + newimgheight = (newimgwidth * imgheight)/imgwidth + else: + raise Exception("fitwidth and fitheight cannot both be None") + return newimgwidth, newimgheight + elif fit == FitMode.shrink: + if fitwidth is not None and fitheight is not None: + if imgwidth <= fitwidth and imgheight <= fitheight: + return imgwidth, imgheight + elif fitwidth is None and fitheight is not None: + if imgheight <= fitheight: + return imgwidth, imgheight + elif fitheight is None and fitwidth is not None: + if imgwidth <= fitwidth: + return imgwidth, imgheight + else: + raise Exception("fitwidth and fitheight cannot both be None") + return default() + elif fit == FitMode.enlarge: + if fitwidth is not None and fitheight is not None: + if imgwidth > fitwidth or imgheight > fitheight: + return imgwidth, imgheight + elif fitwidth is None and fitheight is not None: + if imgheight > fitheight: + return imgwidth, imgheight + elif fitheight is None and fitwidth is not None: + if imgwidth > fitwidth: + return imgwidth, imgheight + else: + raise Exception("fitwidth and fitheight cannot both be None") + return default() + else: + raise NotImplementedError + # if no layout arguments are given, then the image size is equal to the + # page size and will be drawn with the default dpi + if pagesize is None and imgsize is None and border is None: + return default_layout_fun + if pagesize is None and imgsize is None and border is not None: + def layout_fun(imgwidthpx, imgheightpx, ndpi): + imgwidthpdf = px_to_pt(imgwidthpx, ndpi[0]) + imgheightpdf = px_to_pt(imgheightpx, ndpi[1]) + pagewidth = imgwidthpdf+2*border[1] + pageheight = imgheightpdf+2*border[0] + return pagewidth,
pageheight, imgwidthpdf, imgheightpdf + return layout_fun + if border is None: + border = (0, 0) + # if the pagesize is given but the imagesize is not, then the imagesize + # will be calculated from the pagesize, taking into account the border + # and the fitting + if pagesize is not None and imgsize is None: + def layout_fun(imgwidthpx, imgheightpx, ndpi): + if pagesize[0] is not None and pagesize[1] is not None and \ + auto_orient and \ + ((imgwidthpx > imgheightpx and + pagesize[0] < pagesize[1]) or + (imgwidthpx < imgheightpx and pagesize[0] > pagesize[1])): + pagewidth, pageheight = pagesize[1], pagesize[0] + newborder = border[1], border[0] + else: + pagewidth, pageheight = pagesize[0], pagesize[1] + newborder = border + if pagewidth is not None: + fitwidth = pagewidth-2*newborder[0] + else: + fitwidth = None + if pageheight is not None: + fitheight = pageheight-2*newborder[1] + else: + fitheight = None + if fit in [FitMode.fill, FitMode.enlarge] and \ + fitwidth is not None and fitwidth < 0 and \ + fitheight is not None and fitheight < 0: + raise NegativeDimensionException( + "at least one border dimension must be smaller than half " + "the respective page dimension") + elif fit not in [FitMode.fill, FitMode.enlarge] \ + and ((fitwidth is not None and fitwidth < 0) or + (fitheight is not None and fitheight < 0)): + raise NegativeDimensionException( + "one border dimension is larger than half of the " + "respective page dimension") + imgwidthpdf, imgheightpdf = \ + fitfun(fit, px_to_pt(imgwidthpx, ndpi[0]), + px_to_pt(imgheightpx, ndpi[1]), + fitwidth, fitheight) + if pagewidth is None: + pagewidth = imgwidthpdf+border[1]*2 + if pageheight is None: + pageheight = imgheightpdf+border[0]*2 + return pagewidth, pageheight, imgwidthpdf, imgheightpdf + return layout_fun + + def scale_imgsize(s, px, dpi): + if s is None: + return None + mode, value = s + if mode == ImgSize.abs: + return value + if mode == ImgSize.perc: + return (px_to_pt(px, dpi)*value)/100 + if
mode == ImgSize.dpi: + return px_to_pt(px, value) + raise NotImplementedError + if pagesize is None and imgsize is not None: + def layout_fun(imgwidthpx, imgheightpx, ndpi): + imgwidthpdf, imgheightpdf = \ + fitfun(fit, px_to_pt(imgwidthpx, ndpi[0]), + px_to_pt(imgheightpx, ndpi[1]), + scale_imgsize(imgsize[0], imgwidthpx, ndpi[0]), + scale_imgsize(imgsize[1], imgheightpx, ndpi[1])) + pagewidth = imgwidthpdf+2*border[1] + pageheight = imgheightpdf+2*border[0] + return pagewidth, pageheight, imgwidthpdf, imgheightpdf + return layout_fun + if pagesize is not None and imgsize is not None: + def layout_fun(imgwidthpx, imgheightpx, ndpi): + if pagesize[0] is not None and pagesize[1] is not None and \ + auto_orient and \ + ((imgwidthpx > imgheightpx and + pagesize[0] < pagesize[1]) or + (imgwidthpx < imgheightpx and pagesize[0] > pagesize[1])): + pagewidth, pageheight = pagesize[1], pagesize[0] + else: + pagewidth, pageheight = pagesize[0], pagesize[1] + imgwidthpdf, imgheightpdf = \ + fitfun(fit, px_to_pt(imgwidthpx, ndpi[0]), + px_to_pt(imgheightpx, ndpi[1]), + scale_imgsize(imgsize[0], imgwidthpx, ndpi[0]), + scale_imgsize(imgsize[1], imgheightpx, ndpi[1])) + return pagewidth, pageheight, imgwidthpdf, imgheightpdf + return layout_fun + raise NotImplementedError + + +def default_layout_fun(imgwidthpx, imgheightpx, ndpi): + imgwidthpdf = pagewidth = px_to_pt(imgwidthpx, ndpi[0]) + imgheightpdf = pageheight = px_to_pt(imgheightpx, ndpi[1]) + return pagewidth, pageheight, imgwidthpdf, imgheightpdf + + +def convert(*images, title=None, author=None, creator=None, producer=None, creationdate=None, moddate=None, subject=None, keywords=None, colorspace=None, - nodate=False, verbose=False): + nodate=False, layout_fun=default_layout_fun, viewer_panes=None, + viewer_initial_page=None, viewer_magnification=None, + viewer_page_layout=None, viewer_fit_window=False, + viewer_center_window=False, viewer_fullscreen=False, + with_pdfrw=True): - pagesize_options = pagesize[2] + pdf = 
pdfdoc("1.3", title, author, creator, producer, creationdate, + moddate, subject, keywords, nodate, viewer_panes, + viewer_initial_page, viewer_magnification, viewer_page_layout, + viewer_fit_window, viewer_center_window, viewer_fullscreen, + with_pdfrw) - pdf = pdfdoc(3, title, author, creator, producer, creationdate, - moddate, subject, keywords, nodate) - - for imfilename in images: - debug_out("Reading %s"%imfilename, verbose) - try: - rawdata = imfilename.read() - except AttributeError: - with open(imfilename, "rb") as im: - rawdata = im.read() - im = cStringIO(rawdata) - try: - imgdata = Image.open(im) - except IOError as e: - # test if it is a jpeg2000 image - if rawdata[:12] != "\x00\x00\x00\x0C\x6A\x50\x20\x20\x0D\x0A\x87\x0A": - error_out("cannot read input image (not jpeg2000)") - error_out("PIL: %s"%e) - exit(1) - # image is jpeg2000 - width, height, ics = parsejp2(rawdata) - imgformat = "JPEG2000" - - # TODO: read real dpi from input jpeg2000 image - ndpi = (default_dpi, default_dpi) - debug_out("input dpi = %d x %d" % ndpi, verbose) - - if colorspace: - color = colorspace - debug_out("input colorspace (forced) = %s"%(ics)) - else: - color = ics - debug_out("input colorspace = %s"%(ics), verbose) - else: - width, height = imgdata.size - imgformat = imgdata.format - - ndpi = imgdata.info.get("dpi", (default_dpi, default_dpi)) - # in python3, the returned dpi value for some tiff images will - # not be an integer but a float. 
To make the behaviour of - # img2pdf the same between python2 and python3, we convert that - # float into an integer by rounding - # search online for the 72.009 dpi problem for more info - ndpi = (int(round(ndpi[0])),int(round(ndpi[1]))) - debug_out("input dpi = %d x %d" % ndpi, verbose) - - if colorspace: - color = colorspace - debug_out("input colorspace (forced) = %s"%(color), verbose) - else: - color = imgdata.mode - if color == "CMYK" and imgformat == "JPEG": - # Adobe inverts CMYK JPEGs for some reason, and others - # have followed suit as well. Some software assumes the - # JPEG is inverted if the Adobe tag (APP14), while other - # software assumes all CMYK JPEGs are inverted. I don't - # have enough experience with these to know which is - # better for images currently in the wild, so I'm going - # with the first approach for now. - if "adobe" in imgdata.info: - color = "CMYK;I" - debug_out("input colorspace = %s"%(color), verbose) - - debug_out("width x height = %d x %d"%(width,height), verbose) - debug_out("imgformat = %s"%imgformat, verbose) - - if dpi: - ndpi = dpi, dpi - debug_out("input dpi (forced) = %d x %d" % ndpi, verbose) - elif pagesize_options: - ndpi = get_ndpi(width, height, pagesize) - debug_out("calculated dpi (based on pagesize) = %d x %d" % ndpi, verbose) - - # depending on the input format, determine whether to pass the raw - # image or the zlib compressed color information - if imgformat is "JPEG" or imgformat is "JPEG2000": - if color == '1': - error_out("jpeg can't be monochrome") - exit(1) - imgdata = rawdata - else: - # because we do not support /CCITTFaxDecode - if color == '1': - debug_out("Converting colorspace 1 to L", verbose) - imgdata = imgdata.convert('L') - color = 'L' - elif color in ("RGB", "L", "CMYK", "CMYK;I"): - debug_out("Colorspace is OK: %s"%color, verbose) - else: - debug_out("Converting colorspace %s to RGB"%color, verbose) - imgdata = imgdata.convert('RGB') - color = imgdata.mode - img = imgdata.tobytes() - # 
the python-pil version 2.3.0-1ubuntu3 in Ubuntu does not have the close() method - try: - imgdata.close() - except AttributeError: - pass - imgdata = zlib.compress(img) - im.close() - - if pagesize_options and pagesize_options['exact'][1]: - # output size exactly to specified dimensions - # pagesize[0], pagesize[1] already checked in valid_size() - pdf_x, pdf_y = pagesize[0], pagesize[1] - else: - # output size based on dpi; point = 1/72 inch - pdf_x, pdf_y = 72.0*width/float(ndpi[0]), 72.0*height/float(ndpi[1]) - - pdf.addimage(color, width, height, imgformat, imgdata, pdf_x, pdf_y) + for rawdata in images: + color, ndpi, imgformat, imgdata, imgwidthpx, imgheightpx = \ + read_image(rawdata, colorspace) + pagewidth, pageheight, imgwidthpdf, imgheightpdf = \ + layout_fun(imgwidthpx, imgheightpx, ndpi) + if pagewidth < 3.00 or pageheight < 3.00: + logging.warning("pdf width or height is below 3.00 - too small " + "for some viewers!") + # the image is always centered on the page + imgxpdf = (pagewidth - imgwidthpdf)/2.0 + imgypdf = (pageheight - imgheightpdf)/2.0 + pdf.add_imagepage(color, imgwidthpx, imgheightpx, imgformat, imgdata, + imgwidthpdf, imgheightpdf, imgxpdf, imgypdf, + pagewidth, pageheight) return pdf.tostring() -def get_ndpi(width, height, pagesize): - pagesize_options = pagesize[2] - if pagesize_options and pagesize_options['fill'][1]: - if width/height < pagesize[0]/pagesize[1]: - tmp_dpi = 72.0*width/pagesize[0] - else: - tmp_dpi = 72.0*height/pagesize[1] - elif pagesize[0] and pagesize[1]: - # if both height and width given with no specific pagesize_option, - # resize to fit "into" page - if width/height < pagesize[0]/pagesize[1]: - tmp_dpi = 72.0*height/pagesize[1] - else: - tmp_dpi = 72.0*width/pagesize[0] - elif pagesize[0]: - # if width given, calculate dpi based on width - tmp_dpi = 72.0*width/pagesize[0] - elif pagesize[1]: - # if height given, calculate dpi based on height - tmp_dpi = 72.0*height/pagesize[1] +def parse_num(num, name): + if 
num == '': + return None + unit = None + if num.endswith("pt"): + unit = Unit.pt + elif num.endswith("cm"): + unit = Unit.cm + elif num.endswith("mm"): + unit = Unit.mm + elif num.endswith("in"): + unit = Unit.inch else: - tmp_dpi = default_dpi + try: + num = float(num) + except ValueError: + msg = "%s is not a floating point number and doesn't have a " \ + "valid unit: %s" % (name, num) + raise argparse.ArgumentTypeError(msg) + if unit is None: + unit = Unit.pt + else: + num = num[:-2] + try: + num = float(num) + except ValueError: + msg = "%s is not a floating point number: %s" % (name, num) + raise argparse.ArgumentTypeError(msg) + if unit == Unit.cm: + num = cm_to_pt(num) + elif unit == Unit.mm: + num = mm_to_pt(num) + elif unit == Unit.inch: + num = in_to_pt(num) + return num - return tmp_dpi, tmp_dpi -def positive_float(string): - value = float(string) - if value <= 0: - msg = "%r is not positive"%string - raise argparse.ArgumentTypeError(msg) - return value +def parse_imgsize_num(num, name): + if num == '': + return None + unit = None + if num.endswith("pt"): + unit = ImgUnit.pt + elif num.endswith("cm"): + unit = ImgUnit.cm + elif num.endswith("mm"): + unit = ImgUnit.mm + elif num.endswith("in"): + unit = ImgUnit.inch + elif num.endswith("dpi"): + unit = ImgUnit.dpi + elif num.endswith("%"): + unit = ImgUnit.perc + else: + try: + num = float(num) + except ValueError: + msg = "%s is not a floating point number and doesn't have a " \ + "valid unit: %s" % (name, num) + raise argparse.ArgumentTypeError(msg) + if unit is None: + unit = ImgUnit.pt + else: + num = num[:-2] + try: + num = float(num) + except ValueError: + msg = "%s is not a floating point number: %s" % (name, num) + raise argparse.ArgumentTypeError(msg) + if unit == ImgUnit.cm: + num = (ImgSize.abs, cm_to_pt(num)) + elif unit == ImgUnit.mm: + num = (ImgSize.abs, mm_to_pt(num)) + elif unit == ImgUnit.inch: + num = (ImgSize.abs, in_to_pt(num)) + elif unit == ImgUnit.pt: + num = (ImgSize.abs, num) + 
elif unit == ImgUnit.dpi: + num = (ImgSize.dpi, num) + elif unit == ImgUnit.perc: + num = (ImgSize.perc, num) + return num + + +def parse_pagesize_rectarg(string): + transposed = string.endswith("^T") + if transposed: + string = string[:-2] + if papersizes.get(string.lower()): + string = papersizes[string.lower()] + if 'x' not in string: + raise argparse.ArgumentTypeError("size must contain 'x' character") + w, h = string.split('x', 1) + if transposed: + w, h = h, w + w, h = parse_num(w, "width"), parse_num(h, "height") + if w is None and h is None: + raise argparse.ArgumentTypeError("at least one dimension must be " + "specified") + return w, h + + +def parse_imgsize_rectarg(string): + transposed = string.endswith("^T") + if transposed: + string = string[:-2] + if papersizes.get(string.lower()): + string = papersizes[string.lower()] + if 'x' not in string: + raise argparse.ArgumentTypeError("size must contain 'x' character") + w, h = string.split('x', 1) + if transposed: + w, h = h, w + w, h = parse_imgsize_num(w, "width"), parse_imgsize_num(h, "height") + if w is None and h is None: + raise argparse.ArgumentTypeError("at least one dimension must be " + "specified") + return w, h + + +def parse_colorspacearg(string): + for c in Colorspace: + if c.name == string: + return c + allowed = ", ".join([c.name for c in Colorspace]) + raise argparse.ArgumentTypeError("Unsupported colorspace: %s. Must be one " + "of: %s." 
% (string, allowed)) + + +def parse_borderarg(string): + if ':' in string: + h, v = string.split(':', 1) + if h == '': + raise argparse.ArgumentTypeError("missing value before colon") + if v == '': + raise argparse.ArgumentTypeError("missing value after colon") + else: + if string == '': + raise argparse.ArgumentTypeError("border option cannot be empty") + h, v = string, string + h, v = parse_num(h, "left/right border"), parse_num(v, "top/bottom border") + if h is None and v is None: + raise argparse.ArgumentTypeError("missing value") + return h, v + + +def input_images(path): + if path == '-': + rawdata = sys.stdin.buffer.read() + else: + try: + with open(path, "rb") as im: + rawdata = im.read() + except IsADirectoryError: + raise argparse.ArgumentTypeError( + "\"%s\" is a directory" % path) + except PermissionError: + raise argparse.ArgumentTypeError( + "\"%s\" permission denied" % path) + except FileNotFoundError: + raise argparse.ArgumentTypeError( + "\"%s\" does not exist" % path) + if len(rawdata) == 0: + raise argparse.ArgumentTypeError("\"%s\" is empty" % path) + return rawdata + + +def parse_fitarg(string): + for m in FitMode: + if m.name == string.lower(): + return m + raise argparse.ArgumentTypeError("unknown fit mode: %s" % string) + + +def parse_panes(string): + for m in PageMode: + if m.name == string.lower(): + return m + allowed = ", ".join([m.name for m in PageMode]) + raise argparse.ArgumentTypeError("Unsupported page mode: %s. Must be one " + "of: %s." % (string, allowed)) + + +def parse_magnification(string): + for m in Magnification: + if m.name == string.lower(): + return m + try: + return float(string) + except ValueError: + pass + allowed = ", ".join([m.name for m in Magnification]) + raise argparse.ArgumentTypeError("Unsupported magnification: %s. Must be " + "a floating point number or one of: %s." 
% + (string, allowed)) + + +def parse_layout(string): + for l in PageLayout: + if l.name == string.lower(): + return l + allowed = ", ".join([l.name for l in PageLayout]) + raise argparse.ArgumentTypeError("Unsupported page layout: %s. Must be " + "one of: %s." % (string, allowed)) + def valid_date(string): # first try parsing in ISO8601 format @@ -408,241 +1109,400 @@ def valid_date(string): pass else: return datetime.utcfromtimestamp(int(utime)) - raise argparse.ArgumentTypeError("cannot parse date: %s"%string) + raise argparse.ArgumentTypeError("cannot parse date: %s" % string) -def get_standard_papersize(string): - papersizes = { - "11x17" : "792x792^", # "792x1224", - "ledger" : "792x792^", # "1224x792", - "legal" : "612x612^", # "612x1008", - "letter" : "612x612^", # "612x792", - "arche" : "2592x2592^", # "2592x3456", - "archd" : "1728x1728^", # "1728x2592", - "archc" : "1296x1296^", # "1296x1728", - "archb" : "864x864^", # "864x1296", - "archa" : "648x648^", # "648x864", - "a0" : "2380x2380^", # "2380x3368", - "a1" : "1684x1684^", # "1684x2380", - "a2" : "1190x1190^", # "1190x1684", - "a3" : "842x842^", # "842x1190", - "a4" : "595x595^", # "595x842", - "a5" : "421x421^", # "421x595", - "a6" : "297x297^", # "297x421", - "a7" : "210x210^", # "210x297", - "a8" : "148x148^", # "148x210", - "a9" : "105x105^", # "105x148", - "a10" : "74x74^", # "74x105", - "b0" : "2836x2836^", # "2836x4008", - "b1" : "2004x2004^", # "2004x2836", - "b2" : "1418x1418^", # "1418x2004", - "b3" : "1002x1002^", # "1002x1418", - "b4" : "709x709^", # "709x1002", - "b5" : "501x501^", # "501x709", - "c0" : "2600x2600^", # "2600x3677", - "c1" : "1837x1837^", # "1837x2600", - "c2" : "1298x1298^", # "1298x1837", - "c3" : "918x918^", # "918x1298", - "c4" : "649x649^", # "649x918", - "c5" : "459x459^", # "459x649", - "c6" : "323x323^", # "323x459", - "flsa" : "612x612^", # "612x936", - "flse" : "612x612^", # "612x936", - "halfletter" : "396x396^", # "396x612", - "tabloid" : "792x792^", # 
"792x1224", - "statement" : "396x396^", # "396x612", - "executive" : "540x540^", # "540x720", - "folio" : "612x612^", # "612x936", - "quarto" : "610x610^", # "610x780" - } - string = string.lower() - return papersizes.get(string, string) +def main(): + rendered_papersizes = "" + for k, v in sorted(papersizes.items()): + rendered_papersizes += " %-8s %s\n" % (papernames[k], v) -def valid_size(string): - # conversion factors from units to points - units = { - 'in' : 72.0, - 'cm' : 72.0/2.54, - 'mm' : 72.0/25.4, - 'pt' : 1.0 - } + parser = argparse.ArgumentParser( + formatter_class=argparse.RawDescriptionHelpFormatter, + description='''\ +Losslessly convert raster images to PDF without re-encoding JPEG and JPEG2000 +images. This leads to a lossless conversion of JPEG and JPEG2000 images with +the only added file size coming from the PDF container itself. - pagesize_options = { - 'exact' : ['\!', False], - 'shrink' : ['\>', False], - 'enlarge' : ['\<', False], - 'fill' : ['\^', False], - 'percent' : ['\%', False], - 'count' : ['\@', False], - } +Other raster graphics formats are losslessly stored in a zip/flate encoding of +their RGB representation. This might increase file size and does not store +transparency. There is nothing that can be done about that until the PDF format +allows embedding other image formats like PNG. Thus, img2pdf is primarily +useful to convert JPEG and JPEG2000 images to PDF. - string = get_standard_papersize(string) +The output is sent to standard output so that it can be redirected into a file +or to another program as part of a shell pipe. To directly write the output +into a file, use the -o or --output option. 
+''', + epilog='''\ +Colorspace - pattern = re.compile(r""" - ([0-9]*\.?[0-9]*) # tokens.group(1) == width; may be empty - ([a-z]*) # tokens.group(2) == units; may be empty - x - ([0-9]*\.?[0-9]*) # tokens.group(3) == height; may be empty - ([a-zA-Z]*) # tokens.group(4) == units; may be empty - ([^0-9a-zA-Z]*) # tokens.group(5) == extra options - """, re.VERBOSE) + Currently, the colorspace must be forced for JPEG 2000 images that are not in + the RGB colorspace. Available colorspace options are based on Python Imaging + Library (PIL) short handles. - tokens = pattern.match(string) + RGB RGB color + L Grayscale + 1 Black and white (internally converted to grayscale) + CMYK CMYK color + CMYK;I CMYK color with inversion (for CMYK JPEG files from Adobe) - # tokens.group(0) should match entire input string - if tokens.group(0) != string: - msg = ('Input size needs to be of the format AuxBv#, ' - 'where A is width, B is height, u and v are units, ' - '# are options. ' - 'You may omit either width or height, but not both. ' - 'Units may be specified as (in, cm, mm, pt). ' - 'You may omit units, which will default to pt. ' - 'Available options include (! = exact ; ^ = fill ; default = into).') - raise argparse.ArgumentTypeError(msg) +Paper sizes - # temporary list to loop through to process width and height - pagesize_size = { - 'x' : [0, tokens.group(1), tokens.group(2)], - 'y' : [0, tokens.group(3), tokens.group(4)] - } + You can specify the short hand paper size names shown in the first column in + the table below as arguments to the --pagesize and --imgsize options. The + width and height they are mapping to is shown in the second column. Giving + the value in the second column has the same effect as giving the short hand + in the first column. Appending ^T (a caret/circumflex followed by the letter + T) turns the paper size from portrait into landscape. The postfix thus + symbolizes the transpose. The values are case insensitive. 
- for key, value in pagesize_size.items(): +%s + +Fit options + + The img2pdf options for the --fit argument are shown in the first column in + the table below. The function of these options can be mapped to the geometry + operators of imagemagick. For users who are familiar with imagemagick, the + corresponding operator is shown in the second column. The third column shows + whether or not the aspect ratio is preserved for that option (same as in + imagemagick). Just like imagemagick, img2pdf tries hard to preserve the + aspect ratio, so if the --fit argument is not given, then the default is + "into" which corresponds to the absence of any operator in imagemagick. + The value of the --fit option is case insensitive. + + into | | Y | The default. Width and height values specify maximum + | | | values. + ---------+---+---+---------------------------------------------------------- + fill | ^ | Y | Width and height values specify the minimum values. + ---------+---+---+---------------------------------------------------------- + exact | ! | N | Width and height emphatically given. + ---------+---+---+---------------------------------------------------------- + shrink | > | Y | Shrinks an image with dimensions larger than the given + | | | ones (and otherwise behaves like "into"). + ---------+---+---+---------------------------------------------------------- + enlarge | < | Y | Enlarges an image with dimensions smaller than the given + | | | ones (and otherwise behaves like "into"). + +Examples + + Lines starting with a dollar sign denote commands you can enter into your + terminal. The dollar sign signifies your command prompt. It is not part of + the command you type. + + Convert two scans in JPEG format to a PDF document. + + $ img2pdf --output out.pdf page1.jpg page2.jpg + + Convert a directory of JPEG images into a PDF with printable A4 pages in + landscape mode. 
On each page, the photo takes the maximum amount of space + while preserving its aspect ratio and a print border of 2 cm on the top and + bottom and 2.5 cm on the left and right hand side. + + $ img2pdf --output out.pdf --pagesize A4^T --border 2cm:2.5cm *.jpg + + On each A4 page, fit images into a 10 cm times 15 cm rectangle but keep the + original image size if the image is smaller than that. + + $ img2pdf --output out.pdf -S A4 --imgsize 10cmx15cm --fit shrink *.jpg + + Prepare a directory of photos to be printed borderless on photo paper with a + 3:2 aspect ratio and rotate each page so that its orientation is the same as + the input image. + + $ img2pdf --output out.pdf --pagesize 15cmx10cm --auto-orient *.jpg + + Encode a grayscale JPEG2000 image. The colorspace has to be forced as img2pdf + cannot read it from the JPEG2000 file automatically. + + $ img2pdf --output out.pdf --colorspace L input.jp2 + +Argument parsing + + Argument long options can be abbreviated to a prefix if the abbreviation is + unambiguous. That is, the prefix must match a unique option. + + Beware of your shell interpreting argument values as special characters (like + the semicolon in the CMYK;I colorspace option). If in doubt, put the argument + values in single quotes. + + If you want an argument value to start with one or more minus characters, you + must use the long option name and join them with an equal sign like so: + + $ img2pdf --author=--test-- + + If your input file name starts with one or more minus characters, either + separate the input files from the other arguments by two minus signs: + + $ img2pdf -- --my-file-starts-with-two-minuses.jpg + + Or be more explicit about its relative path by prepending a ./: + + $ img2pdf ./--my-file-starts-with-two-minuses.jpg + + The order of non-positional arguments (all arguments other than the input + images) does not matter.
+''' % rendered_papersizes) + + parser.add_argument( + 'images', metavar='infile', type=input_images, nargs='*', + help='Specifies the input file(s) in any format that can be read by ' + 'the Python Imaging Library (PIL). If no input images are given, then ' + 'a single image is read from standard input. The special filename "-" ' + 'can be used once to read an image from standard input. To read a ' + 'file in the current directory with the filename "-", pass it to ' + 'img2pdf by explicitly stating its relative path like "./-".') + parser.add_argument( + '-v', '--verbose', action="store_true", + help='Makes the program operate in verbose mode, printing messages on ' + 'standard error.') + parser.add_argument( + '-V', '--version', action='version', version='%(prog)s '+__version__, + help="Prints version information and exits.") + + outargs = parser.add_argument_group( + title='General output arguments', + description='') + + outargs.add_argument( + '-o', '--output', metavar='out', type=argparse.FileType('wb'), + default=sys.stdout.buffer, + help='Makes the program output to a file instead of standard output.') + outargs.add_argument( + '-C', '--colorspace', metavar='colorspace', type=parse_colorspacearg, + help=''' +Forces the PIL colorspace. See the epilogue for a list of possible values. +Usually the PDF colorspace would be derived from the color space of the input +image. This option overwrites the automatically detected colorspace from the +input image and thus forces a certain colorspace in the output PDF /ColorSpace +property. This is useful for JPEG 2000 images with a different colorspace than +RGB.''') + + outargs.add_argument( + '-D', '--nodate', action="store_true", + help='Suppresses timestamps in the output and thus makes the output ' + 'deterministic between individual runs. 
You can also manually ' + 'set a date using the --moddate and --creationdate options.') + + outargs.add_argument( + "--without-pdfrw", action="store_true", + help="By default, img2pdf uses the pdfrw library to create the output " + "PDF if pdfrw is available. If you want to use the internal PDF " + "generator of img2pdf even if pdfrw is present, then pass this " + "option. This can be useful if you want to have unicode metadata " + "values which pdfrw does not yet support (see " + "https://github.com/pmaupin/pdfrw/issues/39) or if you want the " + "PDF code to be more human readable.") + + sizeargs = parser.add_argument_group( + title='Image and page size and layout arguments', + description='''\ + +Every input image will be placed on its own page. The image size is controlled +by the dpi value of the input image or, if unset or missing, the default dpi of +%.2f. By default, each page will have the same size as the image it shows. +Thus, there will be no visible border between the image and the page border by +default. If image size and page size are made different from each other by the +options in this section, the image will always be centered in both dimensions. + +The image size and page size can be explicitly set using the --imgsize and +--pagesize options, respectively. If either dimension of the image size is +specified but the same dimension of the page size is not, then the latter will +be derived from the former using an optional minimal distance between the image +and the page border (given by the --border option) and/or a certain fitting +strategy (given by the --fit option). The converse happens if a dimension of +the page size is set but the same dimension of the image size is not. + +Any length value in the options below is represented by the meta variable L +which is a floating point value with an optional unit appended (without a space +between them).
The default unit is pt (1/72 inch, the PDF unit) and other +allowed units are cm (centimeter), mm (millimeter), and in (inch). + +Any size argument of the format LxL in the options below specifies the width +and height of a rectangle where the first L represents the width and the second +L represents the height with an optional unit following each value as described +above. Either width or height may be omitted but in that case the separating x +must still be present. Instead of giving the width and height explicitly, you +may also specify some (case-insensitive) common page sizes such as letter and +A4. See the epilogue at the bottom for a complete list of the valid sizes. + +The --fit option scales to fit the image into a rectangle that is either +derived from the --imgsize option or otherwise from the --pagesize option. +If the --border option is given in addition to the --imgsize option while the +--pagesize option is not given, then the page size will be calculated from the +image size, respecting the border setting. If the --border option is given in +addition to the --pagesize option while the --imgsize option is not given, then +the image size will be calculated from the page size, respecting the border +setting. If the --border option is given while both the --pagesize and +--imgsize options are passed, then the --border option will be ignored. + +''' % default_dpi) + + sizeargs.add_argument( + '-S', '--pagesize', metavar='LxL', type=parse_pagesize_rectarg, + help=''' +Sets the size of the PDF pages. The short-option is the upper case S because +it is a mnemonic for being bigger than the image size.''') + + sizeargs.add_argument( + '-s', '--imgsize', metavar='LxL', type=parse_imgsize_rectarg, + help=''' +Sets the size of the images on the PDF pages. In addition, the unit dpi is +allowed which will set the image size as a value of dots per inch.
Instead of +a unit, width and height values may also have a percentage sign appended, +indicating a resize of the image by that percentage. The short-option is the +lower case s because it is a mnemonic for being smaller than the page size. +''') + sizeargs.add_argument( + '-b', '--border', metavar='L[:L]', type=parse_borderarg, + help=''' +Specifies the minimal distance between the image border and the PDF page +border. This value is overridden by explicit values set by --pagesize or +--imgsize. The value will be used when calculating page dimensions from the +image dimensions or the other way round. One or two length values can be given +as an argument, separated by a colon. One value specifies the minimal border on +all four sides. Two values specify the minimal border on the top/bottom and +left/right, respectively. It is not possible to specify asymmetric borders +because images will always be centered on the page. +''') + sizeargs.add_argument( + '-f', '--fit', metavar='FIT', type=parse_fitarg, + default=FitMode.into, help=''' + +If --imgsize is given, fits the image using these dimensions. Otherwise, fits +the image into the dimensions given by --pagesize. FIT is one of into, fill, +exact, shrink and enlarge. The default value is "into". See the epilogue at the +bottom for a description of the FIT options. + +''') + sizeargs.add_argument( + '-a', '--auto-orient', action="store_true", + help=''' +If both dimensions of the page are given via --pagesize, conditionally swaps +these dimensions such that the page orientation is the same as the orientation +of the input image. If the orientation of a page gets flipped, then so do the +values set via the --border option.
+''') + + metaargs = parser.add_argument_group(title='Arguments setting metadata', + description='') + metaargs.add_argument( + '--title', metavar='title', type=str, + help='Sets the title metadata value') + metaargs.add_argument( + '--author', metavar='author', type=str, + help='Sets the author metadata value') + metaargs.add_argument( + '--creator', metavar='creator', type=str, + help='Sets the creator metadata value') + metaargs.add_argument( + '--producer', metavar='producer', type=str, + help='Sets the producer metadata value') + metaargs.add_argument( + '--creationdate', metavar='creationdate', type=valid_date, + help='Sets the UTC creation date metadata value in YYYY-MM-DD or ' + 'YYYY-MM-DDTHH:MM or YYYY-MM-DDTHH:MM:SS format or any format ' + 'understood by python dateutil module or any format understood ' + 'by `date --date`') + metaargs.add_argument( + '--moddate', metavar='moddate', type=valid_date, + help='Sets the UTC modification date metadata value in YYYY-MM-DD ' + 'or YYYY-MM-DDTHH:MM or YYYY-MM-DDTHH:MM:SS format or any format ' + 'understood by python dateutil module or any format understood ' + 'by `date --date`') + metaargs.add_argument( + '--subject', metavar='subject', type=str, + help='Sets the subject metadata value') + metaargs.add_argument( + '--keywords', metavar='kw', type=str, nargs='+', + help='Sets the keywords metadata value (can be given multiple times)') + + viewerargs = parser.add_argument_group( + title='PDF viewer arguments', + description='PDF files can specify how they are meant to be ' + 'presented to the user by a PDF viewer') + + viewerargs.add_argument( + '--viewer-panes', metavar="PANES", type=parse_panes, + help='Instruct the PDF viewer which side panes to show. Valid values ' + 'are "outlines" and "thumbs". 
It is not possible to specify both ' + 'at the same time.') + viewerargs.add_argument( + '--viewer-initial-page', metavar="NUM", type=int, + help='Instead of showing the first page, instruct the PDF viewer to ' + 'show the given page. Page numbers start with 1.') + viewerargs.add_argument( + '--viewer-magnification', metavar="MAG", type=parse_magnification, + help='Instruct the PDF viewer to open the PDF with a certain zoom ' + 'level. Valid values are a floating point number giving ' + 'the exact zoom level, "fit" (zoom to fit whole page), "fith" ' + '(zoom to fit page width) or "fitbh" (zoom to fit visible page ' + 'width).') + viewerargs.add_argument( + '--viewer-page-layout', metavar="LAYOUT", type=parse_layout, + help='Instruct the PDF viewer how to arrange the pages on the screen. ' + 'Valid values are "single" (display single pages), "onecolumn" ' + '(one continuous column), "twocolumnright" (two continuous ' + 'columns with odd numbered pages on the right) and "twocolumnleft" ' + '(two continuous columns with odd numbered pages on the left)') + viewerargs.add_argument( + '--viewer-fit-window', action="store_true", + help='Instruct the PDF viewer to resize the window to fit the page ' + 'size') + viewerargs.add_argument( + '--viewer-center-window', action="store_true", + help='Instruct the PDF viewer to center the PDF viewer window') + viewerargs.add_argument( + '--viewer-fullscreen', action="store_true", + help='Instruct the PDF viewer to open the PDF in fullscreen mode') + + args = parser.parse_args() + + if args.verbose: + logging.basicConfig(level=logging.DEBUG) + + layout_fun = get_layout_fun(args.pagesize, args.imgsize, args.border, + args.fit, args.auto_orient) + + # if no positional arguments were supplied, read a single image from + # standard input + if len(args.images) == 0: + logging.info("reading image from standard input") try: - value[0] = float(value[1]) - value[0] *= units[value[2]] # convert to points - except ValueError: - #
assign None if width or height not provided - value[0] = None - except KeyError: - # if units unrecognized, raise error - # otherwise default to pt because units not provided - if value[2]: - msg = "unrecognized unit '%s'." % value[2] - raise argparse.ArgumentTypeError(msg) + args.images = [sys.stdin.buffer.read()] + except KeyboardInterrupt: + exit(0) - x = pagesize_size['x'][0] - y = pagesize_size['y'][0] - - # parse options for resize methods - if tokens.group(5): - for key, value in pagesize_options.items(): - if re.search(value[0], tokens.group(5)): - value[1] = True - - if pagesize_options['fill'][1]: - # if either width or height is not given, try to fill in missing value - if not x: - x = y - elif not y: - y = x - - if pagesize_options['exact'][1]: - if not x or not y: - msg = ('exact size requires both width and height.') - raise argparse.ArgumentTypeError(msg) - - if not x and not y: - msg = ('width and height cannot both be omitted.') - raise argparse.ArgumentTypeError(msg) - - return (x, y, pagesize_options) - -# in python3, the received argument will be a unicode str() object which needs -# to be encoded into a bytes() object -# in python2, the received argument will be a binary str() object which needs -# no encoding -# we check whether we use python2 or python3 by checking whether the argument -# is both, type str and type bytes (only the case in python2) -def pdf_embedded_string(string): - if type(string) is str and type(string) is not bytes: - # py3 - pass - else: - # py2 - string = string.decode("utf8") - string = b"\xfe\xff"+string.encode("utf-16-be") - string = string.replace(b'\\', b'\\\\') - string = string.replace(b'(', b'\\(') - string = string.replace(b')', b'\\)') - return string - -parser = argparse.ArgumentParser( - description='Lossless conversion/embedding of images (in)to pdf') -parser.add_argument( - 'images', metavar='infile', type=str, - nargs='+', help='input file(s)') -parser.add_argument( - '-o', '--output', metavar='out', 
type=argparse.FileType('wb'), - default=getattr(sys.stdout, "buffer", sys.stdout), - help='output file (default: stdout)') - -sizeopts = parser.add_mutually_exclusive_group() -sizeopts.add_argument( - '-d', '--dpi', metavar='dpi', type=positive_float, - help=('dpi for pdf output. ' - 'If input image does not specify dpi the default is %.2f. ' - 'Must not be used with -s/--pagesize.') % default_dpi -) - -sizeopts.add_argument( - '-s', '--pagesize', metavar='size', type=valid_size, - default=(None, None, None), - help=('size of the pdf pages in format AuxBv#, ' - 'where A is width, B is height, u and v are units, # are options. ' - 'You may omit either width or height, but not both. ' - 'Some common page sizes, such as letter and a4, are also recognized. ' - 'Units may be specified as (in, cm, mm, pt). ' - 'Units default to pt when absent. ' - 'Available options include (! = exact ; ^ = fill ; default = into). ' - 'Must not be used with -d/--dpi.') -) - -parser.add_argument( - '-t', '--title', metavar='title', type=pdf_embedded_string, - help='title for metadata') -parser.add_argument( - '-a', '--author', metavar='author', type=pdf_embedded_string, - help='author for metadata') -parser.add_argument( - '-c', '--creator', metavar='creator', type=pdf_embedded_string, - help='creator for metadata') -parser.add_argument( - '-p', '--producer', metavar='producer', type=pdf_embedded_string, - help='producer for metadata') -parser.add_argument( - '-r', '--creationdate', metavar='creationdate', type=valid_date, - help='UTC creation date for metadata in YYYY-MM-DD or YYYY-MM-DDTHH:MM or YYYY-MM-DDTHH:MM:SS format or any format understood by python dateutil module or any format understood by `date --date`') -parser.add_argument( - '-m', '--moddate', metavar='moddate', type=valid_date, - help='UTC modification date for metadata in YYYY-MM-DD or YYYY-MM-DDTHH:MM or YYYY-MM-DDTHH:MM:SS format or any format understood by python dateutil module or any format understood by `date 
--date`') -parser.add_argument( - '-S', '--subject', metavar='subject', type=pdf_embedded_string, - help='subject for metadata') -parser.add_argument( - '-k', '--keywords', metavar='kw', type=pdf_embedded_string, nargs='+', - help='keywords for metadata') -parser.add_argument( - '-C', '--colorspace', metavar='colorspace', type=pdf_embedded_string, - help='force PIL colorspace (one of: RGB, L, 1, CMYK, CMYK;I)') -parser.add_argument( - '-D', '--nodate', help='do not add timestamps', action="store_true") -parser.add_argument( - '-v', '--verbose', help='verbose mode', action="store_true") -parser.add_argument( - '-V', '--version', action='version', version='%(prog)s '+__version__, - help="Print version information and exit") - -def main(args=None): - if args is None: - args = sys.argv[1:] - args = parser.parse_args(args) + # with the number of pages being equal to the number of images, the + # value passed to --viewer-initial-page must be between 1 and that number + if args.viewer_initial_page is not None: + if args.viewer_initial_page < 1: + parser.print_usage(file=sys.stderr) + logging.error("%s: error: argument --viewer-initial-page: must be " + "greater than zero" % parser.prog) + exit(2) + if args.viewer_initial_page > len(args.images): + parser.print_usage(file=sys.stderr) + logging.error("%s: error: argument --viewer-initial-page: must be " + "less than or equal to the total number of pages" % + parser.prog) + exit(2) args.output.write( convert( - args.images, args.dpi, args.pagesize, args.title, args.author, - args.creator, args.producer, args.creationdate, args.moddate, - args.subject, args.keywords, args.colorspace, args.nodate, - args.verbose)) + *args.images, title=args.title, author=args.author, + creator=args.creator, producer=args.producer, + creationdate=args.creationdate, moddate=args.moddate, + subject=args.subject, keywords=args.keywords, + colorspace=args.colorspace, nodate=args.nodate, + layout_fun=layout_fun, viewer_panes=args.viewer_panes, + 
viewer_initial_page=args.viewer_initial_page, + viewer_magnification=args.viewer_magnification, + viewer_page_layout=args.viewer_page_layout, + viewer_fit_window=args.viewer_fit_window, + viewer_center_window=args.viewer_center_window, + viewer_fullscreen=args.viewer_fullscreen, + with_pdfrw=not args.without_pdfrw)) if __name__ == '__main__': main() diff --git a/src/jp2.py b/src/jp2.py index c897e5f..644cb12 100644 --- a/src/jp2.py +++ b/src/jp2.py @@ -21,6 +21,7 @@ import struct + def getBox(data, byteStart, noBytes): boxLengthValue = struct.unpack(">I", data[byteStart:byteStart+4])[0] boxType = data[byteStart+4:byteStart+8] @@ -34,11 +35,13 @@ def getBox(data, byteStart, noBytes): boxContents = data[byteStart+contentsStartOffset:byteEnd] return (boxLengthValue, boxType, byteEnd, boxContents) + def parse_ihdr(data): height = struct.unpack(">I", data[0:4])[0] width = struct.unpack(">I", data[4:8])[0] return width, height + def parse_colr(data): meth = struct.unpack(">B", data[0:1])[0] if meth != 1: @@ -49,15 +52,18 @@ def parse_colr(data): elif enumCS == 17: return "L" else: - raise Exception("only sRGB and greyscale color space is supported, got %d"%enumCS) + raise Exception("only sRGB and greyscale color space is supported, " + "got %d" % enumCS) + def parse_jp2h(data): width, height, colorspace = None, None, None - noBytes=len(data) - byteStart=0 - boxLengthValue=1 # dummy value for while loop condition + noBytes = len(data) + byteStart = 0 + boxLengthValue = 1 # dummy value for while loop condition while byteStart < noBytes and boxLengthValue != 0: - boxLengthValue, boxType, byteEnd, boxContents = getBox(data, byteStart, noBytes) + boxLengthValue, boxType, byteEnd, boxContents = \ + getBox(data, byteStart, noBytes) if boxType == 'ihdr': width, height = parse_ihdr(boxContents) elif boxType == 'colr': @@ -65,12 +71,14 @@ def parse_jp2h(data): byteStart = byteEnd return (width, height, colorspace) + def parsejp2(data): - noBytes=len(data) - byteStart=0 - 
boxLengthValue=1 # dummy value for while loop condition + noBytes = len(data) + byteStart = 0 + boxLengthValue = 1 # dummy value for while loop condition while byteStart < noBytes and boxLengthValue != 0: - boxLengthValue, boxType, byteEnd, boxContents = getBox(data, byteStart, noBytes) + boxLengthValue, boxType, byteEnd, boxContents = \ + getBox(data, byteStart, noBytes) if boxType == 'jp2h': width, height, colorspace = parse_jp2h(boxContents) byteStart = byteEnd @@ -85,6 +93,6 @@ def parsejp2(data): if __name__ == "__main__": import sys width, height, colorspace = parsejp2(open(sys.argv[1]).read()) - sys.stdout.write("width = %d"%width) - sys.stdout.write("height = %d"%height) - sys.stdout.write("colorspace = %s"%colorspace) + sys.stdout.write("width = %d" % width) + sys.stdout.write("height = %d" % height) + sys.stdout.write("colorspace = %s" % colorspace) diff --git a/src/tests/__init__.py b/src/tests/__init__.py index 15c9328..70c6949 100644 --- a/src/tests/__init__.py +++ b/src/tests/__init__.py @@ -7,102 +7,550 @@ from PIL import Image HERE = os.path.dirname(__file__) -#convert +set date:create +set date:modify -define png:exclude-chunk=time +# convert +set date:create +set date:modify -define png:exclude-chunk=time + +# we define some variables so that the table below can be narrower +psl = (972, 504) # --pagesize landscape +psp = (504, 972) # --pagesize portrait +isl = (756, 324) # --imgsize landscape +isp = (324, 756) # --imgsize portrait +border = (162, 270) # --border +# there is no need to have test cases with the same images with inverted +# orientation (landscape/portrait) because --pagesize and --imgsize are +# already inverted +im1 = (864, 288) # imgpx #1 => 648x216 +im2 = (1152, 576) # imgpx #2 => 864x432 +# shortcuts for fit modes +f_into = img2pdf.FitMode.into +f_fill = img2pdf.FitMode.fill +f_exact = img2pdf.FitMode.exact +f_shrink = img2pdf.FitMode.shrink +f_enlarge = img2pdf.FitMode.enlarge +layout_test_cases = [ + # psl=972x504, psp=504x972,
isl=756x324, isp=324x756, border=162:270 + # --pagesize --border -a pagepdf imgpdf + # --imgsize --fit + (None, None, None, f_into, 0, (648, 216), (648, 216), # 000 + (864, 432), (864, 432)), + (None, None, None, f_into, 1, (648, 216), (648, 216), # 001 + (864, 432), (864, 432)), + (None, None, None, f_fill, 0, (648, 216), (648, 216), # 002 + (864, 432), (864, 432)), + (None, None, None, f_fill, 1, (648, 216), (648, 216), # 003 + (864, 432), (864, 432)), + (None, None, None, f_exact, 0, (648, 216), (648, 216), # 004 + (864, 432), (864, 432)), + (None, None, None, f_exact, 1, (648, 216), (648, 216), # 005 + (864, 432), (864, 432)), + (None, None, None, f_shrink, 0, (648, 216), (648, 216), # 006 + (864, 432), (864, 432)), + (None, None, None, f_shrink, 1, (648, 216), (648, 216), # 007 + (864, 432), (864, 432)), + (None, None, None, f_enlarge, 0, (648, 216), (648, 216), # 008 + (864, 432), (864, 432)), + (None, None, None, f_enlarge, 1, (648, 216), (648, 216), # 009 + (864, 432), (864, 432)), + (None, None, border, f_into, 0, (1188, 540), (648, 216), # 010 + (1404, 756), (864, 432)), + (None, None, border, f_into, 1, (1188, 540), (648, 216), # 011 + (1404, 756), (864, 432)), + (None, None, border, f_fill, 0, (1188, 540), (648, 216), # 012 + (1404, 756), (864, 432)), + (None, None, border, f_fill, 1, (1188, 540), (648, 216), # 013 + (1404, 756), (864, 432)), + (None, None, border, f_exact, 0, (1188, 540), (648, 216), # 014 + (1404, 756), (864, 432)), + (None, None, border, f_exact, 1, (1188, 540), (648, 216), # 015 + (1404, 756), (864, 432)), + (None, None, border, f_shrink, 0, (1188, 540), (648, 216), # 016 + (1404, 756), (864, 432)), + (None, None, border, f_shrink, 1, (1188, 540), (648, 216), # 017 + (1404, 756), (864, 432)), + (None, None, border, f_enlarge, 0, (1188, 540), (648, 216), # 018 + (1404, 756), (864, 432)), + (None, None, border, f_enlarge, 1, (1188, 540), (648, 216), # 019 + (1404, 756), (864, 432)), + (None, isp, None, f_into, 0, (324, 108), (324, 
108), # 020 + (324, 162), (324, 162)), + (None, isp, None, f_into, 1, (324, 108), (324, 108), # 021 + (324, 162), (324, 162)), + (None, isp, None, f_fill, 0, (2268, 756), (2268, 756), # 022 + (1512, 756), (1512, 756)), + (None, isp, None, f_fill, 1, (2268, 756), (2268, 756), # 023 + (1512, 756), (1512, 756)), + (None, isp, None, f_exact, 0, (324, 756), (324, 756), # 024 + (324, 756), (324, 756)), + (None, isp, None, f_exact, 1, (324, 756), (324, 756), # 025 + (324, 756), (324, 756)), + (None, isp, None, f_shrink, 0, (324, 108), (324, 108), # 026 + (324, 162), (324, 162)), + (None, isp, None, f_shrink, 1, (324, 108), (324, 108), # 027 + (324, 162), (324, 162)), + (None, isp, None, f_enlarge, 0, (648, 216), (648, 216), # 028 + (864, 432), (864, 432)), + (None, isp, None, f_enlarge, 1, (648, 216), (648, 216), # 029 + (864, 432), (864, 432)), + (None, isp, border, f_into, 0, (864, 432), (324, 108), # 030 + (864, 486), (324, 162)), + (None, isp, border, f_into, 1, (864, 432), (324, 108), # 031 + (864, 486), (324, 162)), + (None, isp, border, f_fill, 0, (2808, 1080), (2268, 756), # 032 + (2052, 1080), (1512, 756)), + (None, isp, border, f_fill, 1, (2808, 1080), (2268, 756), # 033 + (2052, 1080), (1512, 756)), + (None, isp, border, f_exact, 0, (864, 1080), (324, 756), # 034 + (864, 1080), (324, 756)), + (None, isp, border, f_exact, 1, (864, 1080), (324, 756), # 035 + (864, 1080), (324, 756)), + (None, isp, border, f_shrink, 0, (864, 432), (324, 108), # 036 + (864, 486), (324, 162)), + (None, isp, border, f_shrink, 1, (864, 432), (324, 108), # 037 + (864, 486), (324, 162)), + (None, isp, border, f_enlarge, 0, (1188, 540), (648, 216), # 038 + (1404, 756), (864, 432)), + (None, isp, border, f_enlarge, 1, (1188, 540), (648, 216), # 039 + (1404, 756), (864, 432)), + (None, isl, None, f_into, 0, (756, 252), (756, 252), # 040 + (648, 324), (648, 324)), + (None, isl, None, f_into, 1, (756, 252), (756, 252), # 041 + (648, 324), (648, 324)), + (None, isl, None, f_fill, 0, (972, 
324), (972, 324), # 042 + (756, 378), (756, 378)), + (None, isl, None, f_fill, 1, (972, 324), (972, 324), # 043 + (756, 378), (756, 378)), + (None, isl, None, f_exact, 0, (756, 324), (756, 324), # 044 + (756, 324), (756, 324)), + (None, isl, None, f_exact, 1, (756, 324), (756, 324), # 045 + (756, 324), (756, 324)), + (None, isl, None, f_shrink, 0, (648, 216), (648, 216), # 046 + (648, 324), (648, 324)), + (None, isl, None, f_shrink, 1, (648, 216), (648, 216), # 047 + (648, 324), (648, 324)), + (None, isl, None, f_enlarge, 0, (756, 252), (756, 252), # 048 + (864, 432), (864, 432)), + (None, isl, None, f_enlarge, 1, (756, 252), (756, 252), # 049 + (864, 432), (864, 432)), + # psl=972x504, psp=504x972, isl=756x324, isp=324x756, border=162:270 + # --pagesize --border -a pagepdf imgpdf + # --imgsize --fit imgpx + (None, isl, border, f_into, 0, (1296, 576), (756, 252), # 050 + (1188, 648), (648, 324)), + (None, isl, border, f_into, 1, (1296, 576), (756, 252), # 051 + (1188, 648), (648, 324)), + (None, isl, border, f_fill, 0, (1512, 648), (972, 324), # 052 + (1296, 702), (756, 378)), + (None, isl, border, f_fill, 1, (1512, 648), (972, 324), # 053 + (1296, 702), (756, 378)), + (None, isl, border, f_exact, 0, (1296, 648), (756, 324), # 054 + (1296, 648), (756, 324)), + (None, isl, border, f_exact, 1, (1296, 648), (756, 324), # 055 + (1296, 648), (756, 324)), + (None, isl, border, f_shrink, 0, (1188, 540), (648, 216), # 056 + (1188, 648), (648, 324)), + (None, isl, border, f_shrink, 1, (1188, 540), (648, 216), # 057 + (1188, 648), (648, 324)), + (None, isl, border, f_enlarge, 0, (1296, 576), (756, 252), # 058 + (1404, 756), (864, 432)), + (None, isl, border, f_enlarge, 1, (1296, 576), (756, 252), # 059 + (1404, 756), (864, 432)), + (psp, None, None, f_into, 0, (504, 972), (504, 168), # 060 + (504, 972), (504, 252)), + (psp, None, None, f_into, 1, (972, 504), (972, 324), # 061 + (972, 504), (972, 486)), + (psp, None, None, f_fill, 0, (504, 972), (2916, 972), # 062 + (504,
972), (1944, 972)), + (psp, None, None, f_fill, 1, (972, 504), (1512, 504), # 063 + (972, 504), (1008, 504)), + (psp, None, None, f_exact, 0, (504, 972), (504, 972), # 064 + (504, 972), (504, 972)), + (psp, None, None, f_exact, 1, (972, 504), (972, 504), # 065 + (972, 504), (972, 504)), + (psp, None, None, f_shrink, 0, (504, 972), (504, 168), # 066 + (504, 972), (504, 252)), + (psp, None, None, f_shrink, 1, (972, 504), (648, 216), # 067 + (972, 504), (864, 432)), + (psp, None, None, f_enlarge, 0, (504, 972), (648, 216), # 068 + (504, 972), (864, 432)), + (psp, None, None, f_enlarge, 1, (972, 504), (972, 324), # 069 + (972, 504), (972, 486)), + (psp, None, border, f_into, 0, (504, 972), (180, 60), # 070 + (504, 972), (180, 90)), + (psp, None, border, f_into, 1, (972, 504), (432, 144), # 071 + (972, 504), (360, 180)), + (psp, None, border, f_fill, 0, (504, 972), (1296, 432), # 072 + (504, 972), (864, 432)), + (psp, None, border, f_fill, 1, (972, 504), (540, 180), # 073 + (972, 504), (432, 216)), + (psp, None, border, f_exact, 0, (504, 972), (180, 432), # 074 + (504, 972), (180, 432)), + (psp, None, border, f_exact, 1, (972, 504), (432, 180), # 075 + (972, 504), (432, 180)), + (psp, None, border, f_shrink, 0, (504, 972), (180, 60), # 076 + (504, 972), (180, 90)), + (psp, None, border, f_shrink, 1, (972, 504), (432, 144), # 077 + (972, 504), (360, 180)), + (psp, None, border, f_enlarge, 0, (504, 972), (648, 216), # 078 + (504, 972), (864, 432)), + (psp, None, border, f_enlarge, 1, (972, 504), (648, 216), # 079 + (972, 504), (864, 432)), + (psp, isp, None, f_into, 0, (504, 972), (324, 108), # 080 + (504, 972), (324, 162)), + (psp, isp, None, f_into, 1, (972, 504), (324, 108), # 081 + (972, 504), (324, 162)), + (psp, isp, None, f_fill, 0, (504, 972), (2268, 756), # 082 + (504, 972), (1512, 756)), + (psp, isp, None, f_fill, 1, (972, 504), (2268, 756), # 083 + (972, 504), (1512, 756)), + (psp, isp, None, f_exact, 0, (504, 972), (324, 756), # 084 + (504, 972), (324, 756)), 
+ (psp, isp, None, f_exact, 1, (972, 504), (324, 756), # 085 + (972, 504), (324, 756)), + (psp, isp, None, f_shrink, 0, (504, 972), (324, 108), # 086 + (504, 972), (324, 162)), + (psp, isp, None, f_shrink, 1, (972, 504), (324, 108), # 087 + (972, 504), (324, 162)), + (psp, isp, None, f_enlarge, 0, (504, 972), (648, 216), # 088 + (504, 972), (864, 432)), + (psp, isp, None, f_enlarge, 1, (972, 504), (648, 216), # 089 + (972, 504), (864, 432)), + (psp, isp, border, f_into, 0, (504, 972), (324, 108), # 090 + (504, 972), (324, 162)), + (psp, isp, border, f_into, 1, (972, 504), (324, 108), # 091 + (972, 504), (324, 162)), + (psp, isp, border, f_fill, 0, (504, 972), (2268, 756), # 092 + (504, 972), (1512, 756)), + (psp, isp, border, f_fill, 1, (972, 504), (2268, 756), # 093 + (972, 504), (1512, 756)), + (psp, isp, border, f_exact, 0, (504, 972), (324, 756), # 094 + (504, 972), (324, 756)), + (psp, isp, border, f_exact, 1, (972, 504), (324, 756), # 095 + (972, 504), (324, 756)), + (psp, isp, border, f_shrink, 0, (504, 972), (324, 108), # 096 + (504, 972), (324, 162)), + (psp, isp, border, f_shrink, 1, (972, 504), (324, 108), # 097 + (972, 504), (324, 162)), + (psp, isp, border, f_enlarge, 0, (504, 972), (648, 216), # 098 + (504, 972), (864, 432)), + (psp, isp, border, f_enlarge, 1, (972, 504), (648, 216), # 099 + (972, 504), (864, 432)), + # psl=972x504, psp=504x972, isl=756x324, isp=324x756, border=162:270 + # --pagesize --border -a pagepdf imgpdf + # --imgsize --fit imgpx + (psp, isl, None, f_into, 0, (504, 972), (756, 252), # 100 + (504, 972), (648, 324)), + (psp, isl, None, f_into, 1, (972, 504), (756, 252), # 101 + (972, 504), (648, 324)), + (psp, isl, None, f_fill, 0, (504, 972), (972, 324), # 102 + (504, 972), (756, 378)), + (psp, isl, None, f_fill, 1, (972, 504), (972, 324), # 103 + (972, 504), (756, 378)), + (psp, isl, None, f_exact, 0, (504, 972), (756, 324), # 104 + (504, 972), (756, 324)), + (psp, isl, None, f_exact, 1, (972, 504), (756, 324), # 105 + (972,
504), (756, 324)), + (psp, isl, None, f_shrink, 0, (504, 972), (648, 216), # 106 + (504, 972), (648, 324)), + (psp, isl, None, f_shrink, 1, (972, 504), (648, 216), # 107 + (972, 504), (648, 324)), + (psp, isl, None, f_enlarge, 0, (504, 972), (756, 252), # 108 + (504, 972), (864, 432)), + (psp, isl, None, f_enlarge, 1, (972, 504), (756, 252), # 109 + (972, 504), (864, 432)), + (psp, isl, border, f_into, 0, (504, 972), (756, 252), # 110 + (504, 972), (648, 324)), + (psp, isl, border, f_into, 1, (972, 504), (756, 252), # 111 + (972, 504), (648, 324)), + (psp, isl, border, f_fill, 0, (504, 972), (972, 324), # 112 + (504, 972), (756, 378)), + (psp, isl, border, f_fill, 1, (972, 504), (972, 324), # 113 + (972, 504), (756, 378)), + (psp, isl, border, f_exact, 0, (504, 972), (756, 324), # 114 + (504, 972), (756, 324)), + (psp, isl, border, f_exact, 1, (972, 504), (756, 324), # 115 + (972, 504), (756, 324)), + (psp, isl, border, f_shrink, 0, (504, 972), (648, 216), # 116 + (504, 972), (648, 324)), + (psp, isl, border, f_shrink, 1, (972, 504), (648, 216), # 117 + (972, 504), (648, 324)), + (psp, isl, border, f_enlarge, 0, (504, 972), (756, 252), # 118 + (504, 972), (864, 432)), + (psp, isl, border, f_enlarge, 1, (972, 504), (756, 252), # 119 + (972, 504), (864, 432)), + (psl, None, None, f_into, 0, (972, 504), (972, 324), # 120 + (972, 504), (972, 486)), + (psl, None, None, f_into, 1, (972, 504), (972, 324), # 121 + (972, 504), (972, 486)), + (psl, None, None, f_fill, 0, (972, 504), (1512, 504), # 122 + (972, 504), (1008, 504)), + (psl, None, None, f_fill, 1, (972, 504), (1512, 504), # 123 + (972, 504), (1008, 504)), + (psl, None, None, f_exact, 0, (972, 504), (972, 504), # 124 + (972, 504), (972, 504)), + (psl, None, None, f_exact, 1, (972, 504), (972, 504), # 125 + (972, 504), (972, 504)), + (psl, None, None, f_shrink, 0, (972, 504), (648, 216), # 126 + (972, 504), (864, 432)), + (psl, None, None, f_shrink, 1, (972, 504), (648, 216), # 127 + (972, 504), (864, 432)), + 
(psl, None, None, f_enlarge, 0, (972, 504), (972, 324), # 128 + (972, 504), (972, 486)), + (psl, None, None, f_enlarge, 1, (972, 504), (972, 324), # 129 + (972, 504), (972, 486)), + (psl, None, border, f_into, 0, None, None, None, None), # 130 + (psl, None, border, f_into, 1, None, None, None, None), # 131 + (psl, None, border, f_fill, 0, (972, 504), (648, 216), # 132 + (972, 504), (648, 324)), + (psl, None, border, f_fill, 1, (972, 504), (648, 216), # 133 + (972, 504), (648, 324)), + (psl, None, border, f_exact, 0, None, None, None, None), # 134 + (psl, None, border, f_exact, 1, None, None, None, None), # 135 + (psl, None, border, f_shrink, 0, None, None, None, None), # 136 + (psl, None, border, f_shrink, 1, None, None, None, None), # 137 + (psl, None, border, f_enlarge, 0, (972, 504), (648, 216), # 138 + (972, 504), (864, 432)), + (psl, None, border, f_enlarge, 1, (972, 504), (648, 216), # 139 + (972, 504), (864, 432)), + (psl, isp, None, f_into, 0, (972, 504), (324, 108), # 140 + (972, 504), (324, 162)), + (psl, isp, None, f_into, 1, (972, 504), (324, 108), # 141 + (972, 504), (324, 162)), + (psl, isp, None, f_fill, 0, (972, 504), (2268, 756), # 142 + (972, 504), (1512, 756)), + (psl, isp, None, f_fill, 1, (972, 504), (2268, 756), # 143 + (972, 504), (1512, 756)), + (psl, isp, None, f_exact, 0, (972, 504), (324, 756), # 144 + (972, 504), (324, 756)), + (psl, isp, None, f_exact, 1, (972, 504), (324, 756), # 145 + (972, 504), (324, 756)), + (psl, isp, None, f_shrink, 0, (972, 504), (324, 108), # 146 + (972, 504), (324, 162)), + (psl, isp, None, f_shrink, 1, (972, 504), (324, 108), # 147 + (972, 504), (324, 162)), + (psl, isp, None, f_enlarge, 0, (972, 504), (648, 216), # 148 + (972, 504), (864, 432)), + (psl, isp, None, f_enlarge, 1, (972, 504), (648, 216), # 149 + (972, 504), (864, 432)), + # psl=972x504, psp=504x972, isl=756x324, isp=324x756, border=162:270 + # --pagesize --border -a pagepdf imgpdf + # --imgsize --fit imgpx + (psl, isp, border, f_into, 0, (972,
504), (324, 108), # 150 + (972, 504), (324, 162)), + (psl, isp, border, f_into, 1, (972, 504), (324, 108), # 151 + (972, 504), (324, 162)), + (psl, isp, border, f_fill, 0, (972, 504), (2268, 756), # 152 + (972, 504), (1512, 756)), + (psl, isp, border, f_fill, 1, (972, 504), (2268, 756), # 153 + (972, 504), (1512, 756)), + (psl, isp, border, f_exact, 0, (972, 504), (324, 756), # 154 + (972, 504), (324, 756)), + (psl, isp, border, f_exact, 1, (972, 504), (324, 756), # 155 + (972, 504), (324, 756)), + (psl, isp, border, f_shrink, 0, (972, 504), (324, 108), # 156 + (972, 504), (324, 162)), + (psl, isp, border, f_shrink, 1, (972, 504), (324, 108), # 157 + (972, 504), (324, 162)), + (psl, isp, border, f_enlarge, 0, (972, 504), (648, 216), # 158 + (972, 504), (864, 432)), + (psl, isp, border, f_enlarge, 1, (972, 504), (648, 216), # 159 + (972, 504), (864, 432)), + (psl, isl, None, f_into, 0, (972, 504), (756, 252), # 160 + (972, 504), (648, 324)), + (psl, isl, None, f_into, 1, (972, 504), (756, 252), # 161 + (972, 504), (648, 324)), + (psl, isl, None, f_fill, 0, (972, 504), (972, 324), # 162 + (972, 504), (756, 378)), + (psl, isl, None, f_fill, 1, (972, 504), (972, 324), # 163 + (972, 504), (756, 378)), + (psl, isl, None, f_exact, 0, (972, 504), (756, 324), # 164 + (972, 504), (756, 324)), + (psl, isl, None, f_exact, 1, (972, 504), (756, 324), # 165 + (972, 504), (756, 324)), + (psl, isl, None, f_shrink, 0, (972, 504), (648, 216), # 166 + (972, 504), (648, 324)), + (psl, isl, None, f_shrink, 1, (972, 504), (648, 216), # 167 + (972, 504), (648, 324)), + (psl, isl, None, f_enlarge, 0, (972, 504), (756, 252), # 168 + (972, 504), (864, 432)), + (psl, isl, None, f_enlarge, 1, (972, 504), (756, 252), # 169 + (972, 504), (864, 432)), + (psl, isl, border, f_into, 0, (972, 504), (756, 252), # 170 + (972, 504), (648, 324)), + (psl, isl, border, f_into, 1, (972, 504), (756, 252), # 171 + (972, 504), (648, 324)), + (psl, isl, border, f_fill, 0, (972, 504), (972, 324), # 172 + (972, 
504), (756, 378)), + (psl, isl, border, f_fill, 1, (972, 504), (972, 324), # 173 + (972, 504), (756, 378)), + (psl, isl, border, f_exact, 0, (972, 504), (756, 324), # 174 + (972, 504), (756, 324)), + (psl, isl, border, f_exact, 1, (972, 504), (756, 324), # 175 + (972, 504), (756, 324)), + (psl, isl, border, f_shrink, 0, (972, 504), (648, 216), # 176 + (972, 504), (648, 324)), + (psl, isl, border, f_shrink, 1, (972, 504), (648, 216), # 177 + (972, 504), (648, 324)), + (psl, isl, border, f_enlarge, 0, (972, 504), (756, 252), # 178 + (972, 504), (864, 432)), + (psl, isl, border, f_enlarge, 1, (972, 504), (756, 252), # 179 + (972, 504), (864, 432)), +] + def test_suite(): class TestImg2Pdf(unittest.TestCase): pass - for test_name in os.listdir(os.path.join(HERE, "input")): + for i, (psopt, isopt, border, fit, ao, pspdf1, ispdf1, + pspdf2, ispdf2) in enumerate(layout_test_cases): + if isopt is not None: + isopt = ((img2pdf.ImgSize.abs, isopt[0]), + (img2pdf.ImgSize.abs, isopt[1])) + + def layout_handler( + self, psopt, isopt, border, fit, ao, pspdf, ispdf, im): + layout_fun = img2pdf.get_layout_fun(psopt, isopt, border, fit, ao) + try: + pwpdf, phpdf, iwpdf, ihpdf = \ + layout_fun(im[0], im[1], (img2pdf.default_dpi, + img2pdf.default_dpi)) + self.assertEqual((pwpdf, phpdf), pspdf) + self.assertEqual((iwpdf, ihpdf), ispdf) + except img2pdf.NegativeDimensionException: + self.assertEqual(None, pspdf) + self.assertEqual(None, ispdf) + + def layout_handler_im1(self, psopt=psopt, isopt=isopt, border=border, + fit=fit, ao=ao, pspdf=pspdf1, ispdf=ispdf1): + layout_handler(self, psopt, isopt, border, fit, ao, pspdf, ispdf, + im1) + setattr(TestImg2Pdf, "test_layout_%03d_im1" % i, layout_handler_im1) + + def layout_handler_im2(self, psopt=psopt, isopt=isopt, border=border, + fit=fit, ao=ao, pspdf=pspdf2, ispdf=ispdf2): + layout_handler(self, psopt, isopt, border, fit, ao, pspdf, ispdf, + im2) + setattr(TestImg2Pdf, "test_layout_%03d_im2" % i, layout_handler_im2) + + files = 
os.listdir(os.path.join(HERE, "input")) + for with_pdfrw, test_name in [(a, b) for a in [True, False] + for b in files]: inputf = os.path.join(HERE, "input", test_name) if not os.path.isfile(inputf): continue outputf = os.path.join(HERE, "output", test_name+".pdf") assert os.path.isfile(outputf) - def handle(self, f=inputf, out=outputf): + + def handle(self, f=inputf, out=outputf, with_pdfrw=with_pdfrw): with open(f, "rb") as inf: orig_imgdata = inf.read() - pdf = img2pdf.convert([f], nodate=True) - imgdata = b"" - instream = False - imgobj = False - colorspace = None - imgfilter = None - width = None - height = None - length = None - # ugly workaround to parse the created pdf - for line in pdf.split(b'\n'): - if instream: - if line == b"endstream": - break - else: - imgdata += line + b'\n' - else: - if imgobj and line == b"stream": - instream = True - elif b"/Subtype /Image" in line: - imgobj = True - elif b"/Width" in line: - width = int(line.split()[-1]) - elif b"/Height" in line: - height = int(line.split()[-1]) - elif b"/Length" in line: - length = int(line.split()[-1]) - elif b"/Filter" in line: - imgfilter = line.split()[-2] - elif b"/ColorSpace" in line: - colorspace = line.split()[-1] - # remove trailing \n - imgdata = imgdata[:-1] - # test if the length field is correct - self.assertEqual(len(imgdata), length) + output = img2pdf.convert(orig_imgdata, nodate=True, + with_pdfrw=with_pdfrw) + from io import StringIO, BytesIO + from pdfrw import PdfReader, PdfName, PdfWriter + from pdfrw.py23_diffs import convert_load, convert_store + x = PdfReader(StringIO(convert_load(output))) + self.assertEqual(sorted(x.keys()), [PdfName.Info, PdfName.Root, + PdfName.Size]) + self.assertEqual(x.Size, '7') + self.assertEqual(x.Info, {}) + self.assertEqual(sorted(x.Root.keys()), [PdfName.Pages, + PdfName.Type]) + self.assertEqual(x.Root.Type, PdfName.Catalog) + self.assertEqual(sorted(x.Root.Pages.keys()), + [PdfName.Count, PdfName.Kids, PdfName.Type]) + 
self.assertEqual(x.Root.Pages.Count, '1') + self.assertEqual(x.Root.Pages.Type, PdfName.Pages) + self.assertEqual(len(x.Root.Pages.Kids), 1) + self.assertEqual(sorted(x.Root.Pages.Kids[0].keys()), + [PdfName.Contents, PdfName.MediaBox, + PdfName.Parent, PdfName.Resources, PdfName.Type]) + self.assertEqual(x.Root.Pages.Kids[0].MediaBox, + ['0', '0', '115', '48']) + self.assertEqual(x.Root.Pages.Kids[0].Parent, x.Root.Pages) + self.assertEqual(x.Root.Pages.Kids[0].Type, PdfName.Page) + self.assertEqual(x.Root.Pages.Kids[0].Resources.keys(), + [PdfName.XObject]) + self.assertEqual(x.Root.Pages.Kids[0].Resources.XObject.keys(), + [PdfName.Im0]) + self.assertEqual(x.Root.Pages.Kids[0].Contents.keys(), + [PdfName.Length]) + self.assertEqual(x.Root.Pages.Kids[0].Contents.Length, + str(len(x.Root.Pages.Kids[0].Contents.stream))) + self.assertEqual(x.Root.Pages.Kids[0].Contents.stream, + "q\n115.0000 0 0 48.0000 0.0000 0.0000 cm\n/Im0 " + "Do\nQ") + + imgprops = x.Root.Pages.Kids[0].Resources.XObject.Im0 + # test if the filter is valid: - self.assertIn(imgfilter, [b"/DCTDecode", b"/JPXDecode", b"/FlateDecode"]) + self.assertIn( + imgprops.Filter, [[PdfName.DCTDecode], [PdfName.JPXDecode], + [PdfName.FlateDecode]]) # test if the colorspace is valid - self.assertIn(colorspace, [b"/DeviceGray", b"/DeviceRGB", b"/DeviceCMYK"]) + self.assertIn( + imgprops.ColorSpace, [PdfName.DeviceGray, PdfName.DeviceRGB, + PdfName.DeviceCMYK]) # test if the image has correct size orig_img = Image.open(f) - self.assertEqual(width, orig_img.size[0]) - self.assertEqual(height, orig_img.size[1]) + self.assertEqual(imgprops.Width, str(orig_img.size[0])) + self.assertEqual(imgprops.Height, str(orig_img.size[1])) # if the input file is a jpeg then it should've been copied # verbatim into the PDF - if imgfilter in [b"/DCTDecode", b"/JPXDecode"]: - self.assertEqual(imgdata, orig_imgdata) - elif imgfilter == b"/FlateDecode": + if imgprops.Filter in [[PdfName.DCTDecode], [PdfName.JPXDecode]]: + 
self.assertEqual( + x.Root.Pages.Kids[0].Resources.XObject.Im0.stream, + convert_load(orig_imgdata)) + elif imgprops.Filter == [PdfName.FlateDecode]: # otherwise, the data is flate encoded and has to be equal to # the pixel data of the input image - imgdata = zlib.decompress(imgdata) - if colorspace == b"/DeviceGray": + imgdata = zlib.decompress( + convert_store( + x.Root.Pages.Kids[0].Resources.XObject.Im0.stream)) + colorspace = imgprops.ColorSpace + if colorspace == PdfName.DeviceGray: colorspace = 'L' - elif colorspace == b"/DeviceRGB": + elif colorspace == PdfName.DeviceRGB: colorspace = 'RGB' - elif colorspace == b"/DeviceCMYK": + elif colorspace == PdfName.DeviceCMYK: colorspace = 'CMYK' else: raise Exception("invalid colorspace") - im = Image.frombytes(colorspace, (width, height), imgdata) + im = Image.frombytes(colorspace, (int(imgprops.Width), + int(imgprops.Height)), + imgdata) if orig_img.mode == '1': orig_img = orig_img.convert("L") elif orig_img.mode not in ("RGB", "L", "CMYK", "CMYK;I"): orig_img = orig_img.convert("RGB") self.assertEqual(im.tobytes(), orig_img.tobytes()) - # the python-pil version 2.3.0-1ubuntu3 in Ubuntu does not have the close() method + # the python-pil version 2.3.0-1ubuntu3 in Ubuntu does not have + # the close() method try: im.close() except AttributeError: pass - # lastly, make sure that the generated pdf matches bit by bit the - # expected pdf - with open(out, "rb") as outf: - out = outf.read() - self.assertEqual(pdf, out) - # the python-pil version 2.3.0-1ubuntu3 in Ubuntu does not have the close() method + # now use pdfrw to parse and then write out both pdfs and check the + # result for equality + y = PdfReader(out) + outx = BytesIO() + outy = BytesIO() + xwriter = PdfWriter() + ywriter = PdfWriter() + xwriter.trailer = x + ywriter.trailer = y + xwriter.write(outx) + ywriter.write(outy) + self.assertEqual(outx.getvalue(), outy.getvalue()) + # the python-pil version 2.3.0-1ubuntu3 in Ubuntu does not have the + # close() 
method try: orig_img.close() except AttributeError: pass - setattr(TestImg2Pdf, "test_%s"%test_name, handle) + if with_pdfrw: + setattr(TestImg2Pdf, "test_%s_with_pdfrw" % test_name, handle) + else: + setattr(TestImg2Pdf, "test_%s_without_pdfrw" % test_name, handle) return unittest.TestSuite(( unittest.makeSuite(TestImg2Pdf), diff --git a/src/tests/output/CMYK.jpg.pdf b/src/tests/output/CMYK.jpg.pdf index 2a00022..bfe67f3 100644 Binary files a/src/tests/output/CMYK.jpg.pdf and b/src/tests/output/CMYK.jpg.pdf differ diff --git a/src/tests/output/CMYK.tif.pdf b/src/tests/output/CMYK.tif.pdf index 54c0b4e..b00586b 100644 Binary files a/src/tests/output/CMYK.tif.pdf and b/src/tests/output/CMYK.tif.pdf differ diff --git a/src/tests/output/normal.jpg.pdf b/src/tests/output/normal.jpg.pdf index 1b891a0..87d2645 100644 Binary files a/src/tests/output/normal.jpg.pdf and b/src/tests/output/normal.jpg.pdf differ diff --git a/src/tests/output/normal.png.pdf b/src/tests/output/normal.png.pdf index 5538634..2628c5d 100644 Binary files a/src/tests/output/normal.png.pdf and b/src/tests/output/normal.png.pdf differ
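
Note on the test-generation pattern in `test_suite()` above: the handlers are defined inside a loop and attached to the test class with `setattr`, and the loop variables are bound through default arguments (`psopt=psopt`, `f=inputf`, and so on). A minimal sketch of why that binding matters, using a hypothetical `square` function and `cases` table that are not part of img2pdf:

```python
import unittest

# Hypothetical stand-in for layout_test_cases: (input, expected) pairs.
cases = [(1, 1), (2, 4), (3, 9)]


def square(n):
    # Hypothetical function under test.
    return n * n


class TestSquare(unittest.TestCase):
    pass


for i, (value, expected) in enumerate(cases):
    # Default arguments are evaluated when the function is defined, so each
    # handler keeps the values from its own loop iteration. Without them,
    # every handler would see the values of the last iteration.
    def handler(self, value=value, expected=expected):
        self.assertEqual(square(value), expected)
    setattr(TestSquare, "test_square_%03d" % i, handler)


def run_suite():
    # Collect the generated methods and run them without printing output.
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestSquare)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Calling `run_suite()` runs one generated test per entry in `cases`; the zero-padded index in the method name mirrors the `test_layout_%03d_im1` naming used in the diff.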