Compare commits
30 commits
Author | SHA1 | Date | |
---|---|---|---|
bb188a3eaf | |||
69c3ac6b25 | |||
dffc0dbe16 | |||
b91007fef8 | |||
a8cb28ba31 | |||
c6d12d6239 | |||
59132f20f8 | |||
3ba7d17e15 | |||
43c16ac369 | |||
08c4d9beec | |||
9e6eba9f40 | |||
5aeb628506 | |||
b6dbfdb481 | |||
23436114f8 | |||
2d5e4e3cb7 | |||
5e515abb6f | |||
a2e2998fb1 | |||
14948e7ba8 | |||
bcfdf8b54e | |||
9f74740c95 | |||
cbc3d50c63 | |||
4b549592bf | |||
5540365cfd | |||
819b366bf5 | |||
cc8c708295 | |||
fb9537d8b7 | |||
7678435eb7 | |||
ba7a360866 | |||
7f0bf47ff3 | |||
5cd0918d50 | |||
8 changed files with 626 additions and 409 deletions
28
CHANGES.rst
@@ -2,6 +2,34 @@
 CHANGES
 =======

+0.6.1 (2025-04-27)
+------------------
+
+- testsuite fixes
+
+0.6.0 (2025-02-15)
+------------------
+
+- Add support for JBIG2 (generic coding)
+- Add convert_to_docobject() broken out from convert()
+- Add pil_get_dpi() broken out from get_imgmetadata()
+
+0.5.1 (2023-11-26)
+------------------
+
+- no default ICC profile location for PDF/A-1b on Windows
+- workaround for PNG input without dpi units but non-square dpi aspect ratio
+
+0.5.0 (2023-10-28)
+------------------
+
+- support MIFF for 16 bit CMYK input
+- accept pathlib.Path objects as input
+- don't store RGB ICC profiles from bilevel or grayscale TIFF, PNG and JPEG
+- thumbnails are no longer included by default and --include-thumbnails has to
+be used if you want them
+- support for pikepdf (>= 6.2.0)
+
 0.4.4 (2022-04-07)
 ------------------

55
HACKING
@@ -27,6 +27,57 @@ Making a new release

 - Build and upload to pypi:

-$ rm dist/*
+$ rm -rf dist/*
 $ python3 setup.py sdist
-$ twine upload --sign dist/*
+$ twine upload dist/*
+
+- Push everything to git forge
+
+$ git push
+
+- Push to github
+
+$ git push github
+
+- Obtain img2pdf.exe from appveyor:
+
+https://ci.appveyor.com/project/josch/img2pdf/
+
+- Create new release:
+
+https://gitlab.mister-muffin.de/josch/img2pdf/releases/new
+
+Using debbisect to find regressions
+-----------------------------------
+
+$ debbisect --cache=./cache --depends="git,ca-certificates,python3,
+ghostscript,imagemagick,mupdf-tools,poppler-utils,python3-pil,
+python3-pytest,python3-numpy,python3-scipy,python3-pikepdf" \
+--verbose 2023-09-16 2023-10-24 \
+'chroot "$1" sh -c "
+git clone https://gitlab.mister-muffin.de/josch/img2pdf.git
+&& cd img2pdf
+&& pytest 'src/img2pdf_test.py::test_jpg_2000_rgba8[internal]"'
+
+Using debbisect cache
+---------------------
+
+$ mmdebstrap --variant=apt --aptopt='Acquire::Check-Valid-Until "false"' \
+--include=git,ca-certificates,python3,ghostscript,imagemagick \
+--include=mupdf-tools,poppler-utils,python3-pil,python3-pytest \
+--include=python3-numpy,python3-scipy,python3-pikepdf \
+--hook-dir=/usr/share/mmdebstrap/hooks/file-mirror-automount \
+--setup-hook='mkdir -p "$1/home/josch/git/devscripts/cache/pool/"' \
+--setup-hook='mount -o ro,bind /home/josch/git/devscripts/cache/pool/ "$1/home/josch/git/devscripts/cache/pool/"' \
+--chrooted-customize-hook=bash
+unstable /dev/null
+file:///home/josch/git/devscripts/cache/archive/debian/20231022T090139Z/
+
+Bisecting imagemagick
+---------------------
+
+$ git clean -fdx && git reset --hard
+$ ./configure --prefix=$(pwd)/prefix
+$ make -j$(nproc)
+$ make install
+$ LD_LIBRARY_PATH=$(pwd)/prefix/lib prefix/bin/compare ...
34
README.md
@@ -27,18 +27,20 @@ software, because the raw pixel data never has to be loaded into memory.
 The following table shows how img2pdf handles different input depending on the
 input file format and image color space.

-| Format | Colorspace | Result |
-| ------------------------------------- | ------------------------------ | ------------- |
-| JPEG | any | direct |
-| JPEG2000 | any | direct |
-| PNG (non-interlaced, no transparency) | any | direct |
-| TIFF (CCITT Group 4) | monochrome | direct |
-| any | any except CMYK and monochrome | PNG Paeth |
-| any | monochrome | CCITT Group 4 |
-| any | CMYK | flate |
+| Format | Colorspace | Result |
+| ------------------------------------- | ------------------------------------ | ------------- |
+| JPEG | any | direct |
+| JPEG2000 | any | direct |
+| PNG (non-interlaced, no transparency) | any | direct |
+| TIFF (CCITT Group 4) | 1-bit monochrome | direct |
+| JBIG2 (single-page generic coding) | 1-bit monochrome | direct |
+| any | any except CMYK and 1-bit monochrome | PNG Paeth |
+| any | 1-bit monochrome | CCITT Group 4 |
+| any | CMYK | flate |

-For JPEG, JPEG2000, non-interlaced PNG and TIFF images with CCITT Group 4
-encoded data, img2pdf directly embeds the image data into the PDF without
+For JPEG, JPEG2000, non-interlaced PNG, TIFF images with CCITT Group 4
+encoded data, and JBIG2 with single-page generic coding (e.g. using `jbig2enc`),
+img2pdf directly embeds the image data into the PDF without
 re-encoding it. It thus treats the PDF format merely as a container format for
 the image data. In these cases, img2pdf only increases the filesize by the size
 of the PDF container (typically around 500 to 700 bytes). Since data is only
@@ -47,7 +49,7 @@ solutions for these input formats.

 For all other input types, img2pdf first has to transform the pixel data to
 make it compatible with PDF. In most cases, the PNG Paeth filter is applied to
-the pixel data. For monochrome input, CCITT Group 4 is used instead. Only for
+the pixel data. For 1-bit monochrome input, CCITT Group 4 is used instead. Only for
 CMYK input no filter is applied before finally applying flate compression.

 Usage
@@ -65,6 +67,12 @@ The detailed documentation can be accessed by running:

 $ img2pdf --help

+With no command line arguments supplied, img2pdf will read a single image from
+standard input and write the resulting PDF to standard output. Here is an
+example for how to scan directly to PDF using scanimage(1) from SANE:
+
+$ scanimage --mode=Color --resolution=300 | pnmtojpeg -quality 90 | img2pdf > scan.pdf
+
 Bugs
 ----

@@ -118,7 +126,7 @@ You can then test the converter using:
 $ ve/bin/img2pdf -o test.pdf src/tests/test.jpg

 If you don't want to setup Python on Windows, then head to the
-[releases](/josch/img2pdf/releases) section and download the latest
+[releases](https://gitlab.mister-muffin.de/josch/img2pdf/releases) section and download the latest
 `img2pdf.exe`.

 GUI
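The updated README table can be read as a small decision function. The following is only a sketch of that table, not code from img2pdf: `choose_encoding`, its string arguments and its keyword flags are hypothetical names, and the real code inspects the image data instead of taking labels. JBIG2 input that is not single-page generic coding is rejected in this diff, so the sketch raises an error for it rather than re-encoding.

```python
# Hypothetical helper mirroring the README table; not part of img2pdf.
MONO = "1-bit monochrome"


def choose_encoding(fmt, colorspace, *, interlaced=False, transparency=False,
                    ccitt_group4=False, generic_single_page=False):
    """Return the "Result" column of the README table for one input."""
    if fmt in ("JPEG", "JPEG2000"):
        return "direct"
    if fmt == "PNG" and not interlaced and not transparency:
        return "direct"
    if fmt == "TIFF" and ccitt_group4 and colorspace == MONO:
        return "direct"
    if fmt == "JBIG2":
        if generic_single_page and colorspace == MONO:
            return "direct"
        # the diff raises ImageOpenError here; ValueError is a stand-in
        raise ValueError("only single-page generic coding is supported")
    # everything else has its pixel data re-encoded
    if colorspace == "CMYK":
        return "flate"
    if colorspace == MONO:
        return "CCITT Group 4"
    return "PNG Paeth"
```

For example, `choose_encoding("PNG", "RGB", interlaced=True)` falls through to the re-encoding rows because only non-interlaced PNG is embedded directly.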
2
setup.py
@@ -1,7 +1,7 @@
 import sys
 from setuptools import setup

-VERSION = "0.4.4"
+VERSION = "0.6.1"

 INSTALL_REQUIRES = (
 "Pillow",
388
src/img2pdf.py
|
@ -22,7 +22,7 @@ import sys
|
|||
import os
|
||||
import zlib
|
||||
import argparse
|
||||
from PIL import Image, TiffImagePlugin, GifImagePlugin, ImageCms
|
||||
from PIL import Image, TiffImagePlugin, GifImagePlugin, ImageCms, ExifTags
|
||||
|
||||
if hasattr(GifImagePlugin, "LoadingStrategy"):
|
||||
# Pillow 9.0.0 started emitting all frames but the first as RGB instead of
|
||||
|
@ -62,7 +62,7 @@ try:
|
|||
except ImportError:
|
||||
have_pikepdf = False
|
||||
|
||||
__version__ = "0.4.4"
|
||||
__version__ = "0.6.1"
|
||||
default_dpi = 96.0
|
||||
papersizes = {
|
||||
"letter": "8.5inx11in",
|
||||
|
@ -128,7 +128,7 @@ PageOrientation = Enum("PageOrientation", "portrait landscape")
|
|||
Colorspace = Enum("Colorspace", "RGB RGBA L LA 1 CMYK CMYK;I P PA other")
|
||||
|
||||
ImageFormat = Enum(
|
||||
"ImageFormat", "JPEG JPEG2000 CCITTGroup4 PNG GIF TIFF MPO MIFF other"
|
||||
"ImageFormat", "JPEG JPEG2000 CCITTGroup4 PNG GIF TIFF MPO MIFF JBIG2 other"
|
||||
)
|
||||
|
||||
PageMode = Enum("PageMode", "none outlines thumbs")
|
||||
|
@ -918,6 +918,11 @@ class pdfdoc(object):
|
|||
self.output_version = "1.5" # jpeg2000 needs pdf 1.5
|
||||
elif imgformat is ImageFormat.CCITTGroup4:
|
||||
ofilter = [PdfName.CCITTFaxDecode]
|
||||
elif imgformat is ImageFormat.JBIG2:
|
||||
ofilter = PdfName.JBIG2Decode
|
||||
# JBIG2Decode requires PDF 1.4
|
||||
if self.output_version < "1.4":
|
||||
self.output_version = "1.4"
|
||||
else:
|
||||
ofilter = PdfName.FlateDecode
|
||||
|
||||
|
@ -1075,7 +1080,7 @@ class pdfdoc(object):
|
|||
self.tostream(stream)
|
||||
return stream.getvalue()
|
||||
|
||||
def tostream(self, outputstream):
|
||||
def finalize(self):
|
||||
if self.engine == Engine.pikepdf:
|
||||
PdfArray = pikepdf.Array
|
||||
PdfDict = pikepdf.Dictionary
|
||||
|
@ -1267,7 +1272,9 @@ class pdfdoc(object):
|
|||
self.writer.addobj(metadata)
|
||||
self.writer.addobj(iccstream)
|
||||
|
||||
# now write out the PDF
|
||||
def tostream(self, outputstream):
|
||||
# write out the PDF
|
||||
# this assumes that finalize() has been invoked beforehand by the caller
|
||||
if self.engine == Engine.pikepdf:
|
||||
kwargs = {}
|
||||
if pikepdf.__version__ >= "6.2.0":
|
||||
|
@ -1276,6 +1283,8 @@ class pdfdoc(object):
|
|||
outputstream, min_version=self.output_version, linearize=True, **kwargs
|
||||
)
|
||||
elif self.engine == Engine.pdfrw:
|
||||
from pdfrw import PdfName, PdfArray
|
||||
|
||||
self.writer.trailer.Info = self.writer.docinfo
|
||||
# setting the version attribute of the pdfrw PdfWriter object will
|
||||
# influence the behaviour of the write() function
|
||||
|
@ -1295,51 +1304,27 @@ class pdfdoc(object):
|
|||
raise ValueError("unknown engine: %s" % self.engine)
|
||||
|
||||
|
||||
def get_imgmetadata(
|
||||
imgdata, imgformat, default_dpi, colorspace, rawdata=None, rotreq=None
|
||||
):
|
||||
if imgformat == ImageFormat.JPEG2000 and rawdata is not None and imgdata is None:
|
||||
# this codepath gets called if the PIL installation is not able to
|
||||
# handle JPEG2000 files
|
||||
imgwidthpx, imgheightpx, ics, hdpi, vdpi, channels, bpp = jp2.parse(rawdata)
|
||||
def pil_get_dpi(imgdata, imgformat, default_dpi):
|
||||
ndpi = imgdata.info.get("dpi")
|
||||
if ndpi is None:
|
||||
# the PNG plugin of PIL adds the undocumented "aspect" field instead of
|
||||
# the "dpi" field if the PNG pHYs chunk unit is not set to meters
|
||||
if imgformat == ImageFormat.PNG and imgdata.info.get("aspect") is not None:
|
||||
aspect = imgdata.info["aspect"]
|
||||
# make sure not to go below the default dpi
|
||||
if aspect[0] > aspect[1]:
|
||||
ndpi = (default_dpi * aspect[0] / aspect[1], default_dpi)
|
||||
else:
|
||||
ndpi = (default_dpi, default_dpi * aspect[1] / aspect[0])
|
||||
else:
|
||||
ndpi = (default_dpi, default_dpi)
|
||||
|
||||
if hdpi is None:
|
||||
hdpi = default_dpi
|
||||
if vdpi is None:
|
||||
vdpi = default_dpi
|
||||
ndpi = (hdpi, vdpi)
|
||||
else:
|
||||
imgwidthpx, imgheightpx = imgdata.size
|
||||
|
||||
ndpi = imgdata.info.get("dpi", (default_dpi, default_dpi))
|
||||
# In python3, the returned dpi value for some tiff images will
|
||||
# not be an integer but a float. To make the behaviour of
|
||||
# img2pdf the same between python2 and python3, we convert that
|
||||
# float into an integer by rounding.
|
||||
# Search online for the 72.009 dpi problem for more info.
|
||||
ndpi = (int(round(ndpi[0])), int(round(ndpi[1])))
|
||||
ics = imgdata.mode
|
||||
|
||||
# GIF and PNG files with transparency are supported
|
||||
if imgformat in [ImageFormat.PNG, ImageFormat.GIF, ImageFormat.JPEG2000] and (
|
||||
ics in ["RGBA", "LA"] or "transparency" in imgdata.info
|
||||
):
|
||||
# Must check the IHDR chunk for the bit depth, because PIL would lossily
|
||||
# convert 16-bit RGBA/LA images to 8-bit.
|
||||
if imgformat == ImageFormat.PNG and rawdata is not None:
|
||||
depth = rawdata[24]
|
||||
if depth > 8:
|
||||
logger.warning("Image with transparency and a bit depth of %d." % depth)
|
||||
logger.warning("This is unsupported due to PIL limitations.")
|
||||
logger.warning(
|
||||
"If you accept a lossy conversion, you can manually convert "
|
||||
"your images to 8 bit using `convert -depth 8` from imagemagick"
|
||||
)
|
||||
raise AlphaChannelError(
|
||||
"Refusing to work with multiple >8bit channels."
|
||||
)
|
||||
elif ics in ["LA", "PA", "RGBA"] or "transparency" in imgdata.info:
|
||||
raise AlphaChannelError("This function must not be called on images with alpha")
|
||||
# In python3, the returned dpi value for some tiff images will
|
||||
# not be an integer but a float. To make the behaviour of
|
||||
# img2pdf the same between python2 and python3, we convert that
|
||||
# float into an integer by rounding.
|
||||
# Search online for the 72.009 dpi problem for more info.
|
||||
ndpi = (int(round(ndpi[0])), int(round(ndpi[1])))
|
||||
|
||||
# Since commit 07a96209597c5e8dfe785c757d7051ce67a980fb or release 4.1.0
|
||||
# Pillow retrieves the DPI from EXIF if it cannot find the DPI in the JPEG
|
||||
|
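The PNG "aspect" fallback in `pil_get_dpi()` can be tried in isolation. This is a minimal sketch under the assumption that `aspect` is the two-element tuple the comment in the hunk describes. `dpi_from_png_aspect` is a hypothetical name, and the rounding step mirrors the `int(round(...))` the hunk applies afterwards.

```python
def dpi_from_png_aspect(aspect, default_dpi=96.0):
    """Derive (hdpi, vdpi) from a unitless PNG pHYs aspect ratio.

    The smaller axis keeps default_dpi and the other one is scaled up,
    so neither value drops below the default.
    """
    ax, ay = aspect
    if ax > ay:
        ndpi = (default_dpi * ax / ay, default_dpi)
    else:
        ndpi = (default_dpi, default_dpi * ay / ax)
    # round to integers, as the hunk does for PIL-derived dpi values
    return (int(round(ndpi[0])), int(round(ndpi[1])))
```

With the default of 96 dpi, an aspect of `(2, 1)` gives `(192, 96)` and `(1, 2)` gives `(96, 192)`.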
@ -1356,11 +1341,112 @@ def get_imgmetadata(
|
|||
imgdata.tag_v2.get(TiffImagePlugin.Y_RESOLUTION, default_dpi),
|
||||
)
|
||||
|
||||
return ndpi
|
||||
|
||||
|
||||
def get_imgmetadata(
|
||||
imgdata, imgformat, default_dpi, colorspace, rawdata=None, rotreq=None
|
||||
):
|
||||
if imgformat == ImageFormat.JPEG2000 and rawdata is not None and imgdata is None:
|
||||
# this codepath gets called if the PIL installation is not able to
|
||||
# handle JPEG2000 files
|
||||
imgwidthpx, imgheightpx, ics, hdpi, vdpi, channels, bpp = jp2.parse(rawdata)
|
||||
|
||||
if hdpi is None:
|
||||
hdpi = default_dpi
|
||||
if vdpi is None:
|
||||
vdpi = default_dpi
|
||||
ndpi = (hdpi, vdpi)
|
||||
elif imgformat == ImageFormat.JBIG2:
|
||||
imgwidthpx, imgheightpx, xres, yres = struct.unpack(">IIII", rawdata[24:40])
|
||||
INCH_PER_METER = 39.370079
|
||||
if xres == 0:
|
||||
hdpi = default_dpi
|
||||
elif xres < 1000:
|
||||
# If xres is very small, it's likely accidentally expressed in dpi instead
|
||||
# of dpm. See e.g. https://github.com/agl/jbig2enc/issues/86
|
||||
hdpi = xres
|
||||
else:
|
||||
hdpi = int(float(xres) / INCH_PER_METER)
|
||||
if yres == 0:
|
||||
vdpi = default_dpi
|
||||
elif yres < 1000:
|
||||
vdpi = yres
|
||||
else:
|
||||
vdpi = int(float(yres) / INCH_PER_METER)
|
||||
ndpi = (hdpi, vdpi)
|
||||
ics = "1"
|
||||
else:
|
||||
imgwidthpx, imgheightpx = imgdata.size
|
||||
ndpi = pil_get_dpi(imgdata, imgformat, default_dpi)
|
||||
ics = imgdata.mode
|
||||
|
||||
logger.debug("input dpi = %d x %d", *ndpi)
|
||||
|
||||
# GIF and PNG files with transparency are supported
|
||||
if imgformat in [ImageFormat.PNG, ImageFormat.GIF, ImageFormat.JPEG2000] and (
|
||||
ics in ["RGBA", "LA"]
|
||||
or (imgdata is not None and "transparency" in imgdata.info)
|
||||
):
|
||||
# Must check the IHDR chunk for the bit depth, because PIL would lossily
|
||||
# convert 16-bit RGBA/LA images to 8-bit.
|
||||
if imgformat == ImageFormat.PNG and rawdata is not None:
|
||||
depth = rawdata[24]
|
||||
if depth > 8:
|
||||
logger.warning("Image with transparency and a bit depth of %d." % depth)
|
||||
logger.warning("This is unsupported due to PIL limitations.")
|
||||
logger.warning(
|
||||
"If you accept a lossy conversion, you can manually convert "
|
||||
"your images to 8 bit using `convert -depth 8` from imagemagick"
|
||||
)
|
||||
raise AlphaChannelError(
|
||||
"Refusing to work with multiple >8bit channels."
|
||||
)
|
||||
elif ics in ["LA", "PA", "RGBA"] or (
|
||||
imgdata is not None and "transparency" in imgdata.info
|
||||
):
|
||||
raise AlphaChannelError("This function must not be called on images with alpha")
|
||||
|
||||
rotation = 0
|
||||
if rotreq in (None, Rotation.auto, Rotation.ifvalid):
|
||||
if hasattr(imgdata, "_getexif") and imgdata._getexif() is not None:
|
||||
if hasattr(imgdata, "getexif") and imgdata.getexif() is not None:
|
||||
exif_dict = imgdata.getexif()
|
||||
o_key = ExifTags.Base.Orientation.value # 274 rsp. 0x112
|
||||
if exif_dict and o_key in exif_dict:
|
||||
# Detailed information on EXIF rotation tags:
|
||||
# http://impulseadventure.com/photo/exif-orientation.html
|
||||
value = exif_dict[o_key]
|
||||
if value == 1:
|
||||
rotation = 0
|
||||
elif value == 6:
|
||||
rotation = 90
|
||||
elif value == 3:
|
||||
rotation = 180
|
||||
elif value == 8:
|
||||
rotation = 270
|
||||
elif value in (2, 4, 5, 7):
|
||||
if rotreq == Rotation.ifvalid:
|
||||
logger.warning(
|
||||
"Unsupported flipped rotation mode (%d): use "
|
||||
"--rotation=ifvalid or "
|
||||
"rotation=img2pdf.Rotation.ifvalid to ignore",
|
||||
value,
|
||||
)
|
||||
else:
|
||||
raise ExifOrientationError(
|
||||
"Unsupported flipped rotation mode (%d): use "
|
||||
"--rotation=ifvalid or "
|
||||
"rotation=img2pdf.Rotation.ifvalid to ignore" % value
|
||||
)
|
||||
else:
|
||||
if rotreq == Rotation.ifvalid:
|
||||
logger.warning("Invalid rotation (%d)", value)
|
||||
else:
|
||||
raise ExifOrientationError(
|
||||
"Invalid rotation (%d): use --rotation=ifvalid "
|
||||
"or rotation=img2pdf.Rotation.ifvalid to ignore" % value
|
||||
)
|
||||
elif hasattr(imgdata, "_getexif") and imgdata._getexif() is not None:
|
||||
for tag, value in imgdata._getexif().items():
|
||||
if TAGS.get(tag, tag) == "Orientation":
|
||||
# Detailed information on EXIF rotation tags:
|
||||
|
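The EXIF orientation branch above amounts to a lookup table plus two error cases. The sketch below restates it without Pillow. `rotation_from_orientation` and its `ifvalid` flag are hypothetical stand-ins for the `Rotation.ifvalid` handling, and `ValueError` replaces `ExifOrientationError` so that the snippet is self-contained.

```python
# Orientation values 1, 6, 3 and 8 are plain rotations; 2, 4, 5 and 7 involve a flip.
ROTATION_BY_ORIENTATION = {1: 0, 6: 90, 3: 180, 8: 270}
FLIPPED = (2, 4, 5, 7)


def rotation_from_orientation(value, ifvalid=False):
    """Map an EXIF Orientation value to a rotation in degrees."""
    if value in ROTATION_BY_ORIENTATION:
        return ROTATION_BY_ORIENTATION[value]
    if ifvalid:
        # the diff only logs a warning here and leaves the rotation at 0
        return 0
    if value in FLIPPED:
        kind = "Unsupported flipped rotation mode"
    else:
        kind = "Invalid rotation"
    raise ValueError("%s (%d)" % (kind, value))
```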
@ -1395,6 +1481,7 @@ def get_imgmetadata(
|
|||
"Invalid rotation (%d): use --rotation=ifvalid "
|
||||
"or rotation=img2pdf.Rotation.ifvalid to ignore" % value
|
||||
)
|
||||
|
||||
elif rotreq in (Rotation.none, Rotation["0"]):
|
||||
rotation = 0
|
||||
elif rotreq == Rotation["90"]:
|
||||
|
@ -1443,7 +1530,7 @@ def get_imgmetadata(
|
|||
logger.debug("input colorspace = %s", color.name)
|
||||
|
||||
iccp = None
|
||||
if "icc_profile" in imgdata.info:
|
||||
if imgdata is not None and "icc_profile" in imgdata.info:
|
||||
iccp = imgdata.info.get("icc_profile")
|
||||
# GIMP saves bilevel TIFF images and palette PNG images with only black and
|
||||
# white in the palette with an RGB ICC profile which is useless
|
||||
|
@ -1481,22 +1568,16 @@ def get_imgmetadata(
|
|||
# SmartAlbums old version (found 2.2.6) exports JPG with only 1 compone
|
||||
# with an RGB ICC profile which is useless.
|
||||
# This produces an error in Adobe Acrobat, so we ignore it with a warning.
|
||||
# Update: Found another case, the JPG is created by Adobe PhotoShop, so we
|
||||
# don't check software anymore.
|
||||
if iccp is not None and (
|
||||
(color == Colorspace["L"] and imgformat == ImageFormat.JPEG)
|
||||
):
|
||||
exifsoft = None
|
||||
if hasattr(imgdata, "_getexif") and imgdata._getexif() is not None:
|
||||
for tag, value in imgdata._getexif().items():
|
||||
if TAGS.get(tag, tag) == "Software":
|
||||
exifsoft = value
|
||||
with io.BytesIO(iccp) as f:
|
||||
prf = ImageCms.ImageCmsProfile(f)
|
||||
if (prf.profile.model and "sRGB" in prf.profile.model) and (
|
||||
exifsoft and "SmartAlbums" in exifsoft
|
||||
):
|
||||
logger.warning(
|
||||
"Ignoring RGB ICC profile in Grayscale JPG created by SmartAlbums"
|
||||
)
|
||||
|
||||
if prf.profile.xcolor_space not in ("GRAY"):
|
||||
logger.warning("Ignoring non-GRAY ICC profile in Grayscale JPG")
|
||||
iccp = None
|
||||
|
||||
logger.debug("width x height = %dpx x %dpx", imgwidthpx, imgheightpx)
|
||||
|
@ -1618,6 +1699,7 @@ miff_re = re.compile(
|
|||
re.VERBOSE,
|
||||
)
|
||||
|
||||
|
||||
# https://imagemagick.org/script/miff.php
|
||||
# turn off black formatting until python 3.10 is available on more platforms
|
||||
# and we can use match/case
|
||||
|
@ -1799,8 +1881,6 @@ def parse_miff(data):
|
|||
results.extend(parse_miff(rest[lenpal + lenimgdata :]))
|
||||
return results
|
||||
# fmt: on
|
||||
|
||||
|
||||
def read_images(
|
||||
rawdata, colorspace, first_frame_only=False, rot=None, include_thumbnails=False
|
||||
):
|
||||
|
@ -1814,7 +1894,51 @@ def read_images(
|
|||
if rawdata[:12] == b"\x00\x00\x00\x0C\x6A\x50\x20\x20\x0D\x0A\x87\x0A":
|
||||
# image is jpeg2000
|
||||
imgformat = ImageFormat.JPEG2000
|
||||
if rawdata[:14].lower() == b"id=imagemagick":
|
||||
elif rawdata[:8] == b"\x97\x4a\x42\x32\x0d\x0a\x1a\x0a":
|
||||
# For now we only support single-page generic coding of JBIG2, for example as generated by
|
||||
# https://github.com/agl/jbig2enc
|
||||
#
|
||||
# In fact, you can pipe an example image `like src/tests/input/mono.png` directly into img2pdf:
|
||||
# jbig2 src/tests/input/mono.png | img2pdf -o src/tests/output/mono.png.pdf
|
||||
#
|
||||
# For this we assume that the first 13 bytes are the JBIG file header describing a document with one page,
|
||||
# followed by a "page information" segment describing the dimensions of that page.
|
||||
#
|
||||
# The following annotated `hexdump -C 042.jb2` shows the first 40 bytes that we inspect directly.
|
||||
# The first 24 bytes (until "||") have to match exactly, while the following 16 bytes are read by get_imgmetadata.
|
||||
#
|
||||
# 97 4a 42 32 0d 0a 1a 0a 01 00 00 00 01 00 00 00
|
||||
# \_____________________/ | \_________/ \______
|
||||
# magic-bytes org/unk pages seg-num
|
||||
#
|
||||
# 00 30 00 01 00 00 00 13 || 00 00 00 73 00 00 00 30
|
||||
# _/ | | | \_________/ || \_________/ \_________/
|
||||
# type refs page seg-size || width-px height-px
|
||||
#
|
||||
# 00 00 00 48 00 00 00 48
|
||||
# \_________/ \_________/
|
||||
# xres yres
|
||||
#
|
||||
# For more information on the data format, see:
|
||||
# * https://github.com/agl/jbig2enc/blob/ea05019/fcd14492.pdf
|
||||
# For more information about the generic coding, see:
|
||||
# * https://github.com/agl/jbig2enc/blob/ea05019/src/jbig2enc.cc#L898
|
||||
imgformat = ImageFormat.JBIG2
|
||||
if (
|
||||
rawdata[:24]
|
||||
!= b"\x97\x4a\x42\x32\x0d\x0a\x1a\x0a\x01\x00\x00\x00\x01\x00\x00\x00\x00\x30\x00\x01\x00\x00\x00\x13"
|
||||
):
|
||||
raise ImageOpenError(
|
||||
"Unsupported JBIG2 format; only single-page generic coding is supported (e.g. from `jbig2enc`)."
|
||||
)
|
||||
if (
|
||||
rawdata[-22:]
|
||||
!= b"\x00\x00\x00\x021\x00\x01\x00\x00\x00\x00\x00\x00\x00\x033\x00\x01\x00\x00\x00\x00"
|
||||
):
|
||||
raise ImageOpenError(
|
||||
"Unsupported JBIG2 format; we expect end-of-page and end-of-file segments at the end (e.g. from `jbig2enc`)."
|
||||
)
|
||||
elif rawdata[:14].lower() == b"id=imagemagick":
|
||||
# image is in MIFF format
|
||||
# this is useful for 16 bit CMYK because PNG cannot do CMYK and thus
|
||||
# we need PIL but PIL cannot do 16 bit
|
||||
|
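The 40 header bytes described in the comment above can be parsed with `struct` alone. This is a sketch rather than the code from the diff: `parse_jbig2_page_info` and `to_dpi` are hypothetical names, and `ValueError` stands in for `ImageOpenError`. The byte values and the dpi heuristic (0 means unset, values below 1000 are taken as dpi, anything else as dots per meter) are copied from the hunks in this comparison.

```python
import struct

# The 24 bytes the diff requires verbatim, split as in its annotated hexdump.
EXPECTED_PREFIX = (
    b"\x97\x4a\x42\x32\x0d\x0a\x1a\x0a"  # magic bytes
    b"\x01"                              # org/unk flags
    b"\x00\x00\x00\x01"                  # number of pages
    b"\x00\x00\x00\x00"                  # segment number
    b"\x30\x00\x01"                      # segment type, refs, page
    b"\x00\x00\x00\x13"                  # segment size
)
INCH_PER_METER = 39.370079


def to_dpi(res, default_dpi=96.0):
    if res == 0:
        return default_dpi
    if res < 1000:
        return res  # very small: probably dpi already, not dots per meter
    return int(float(res) / INCH_PER_METER)


def parse_jbig2_page_info(rawdata, default_dpi=96.0):
    """Return (width_px, height_px, (hdpi, vdpi)) from the first 40 bytes."""
    if rawdata[:24] != EXPECTED_PREFIX:
        raise ValueError("only single-page generic coding is supported")
    width, height, xres, yres = struct.unpack(">IIII", rawdata[24:40])
    return width, height, (to_dpi(xres, default_dpi), to_dpi(yres, default_dpi))


# The 16 bytes following the prefix, taken from the hexdump in the comment.
example = EXPECTED_PREFIX + bytes.fromhex("00000073 00000030 00000048 00000048")
```

For the hexdump in the comment this yields a 115 x 48 pixel page at 72 x 72 dpi, because both resolution fields are below 1000 and are therefore taken as dpi.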
@ -1826,12 +1950,7 @@ def read_images(
|
|||
)
|
||||
else:
|
||||
logger.debug("PIL format = %s", imgdata.format)
|
||||
imgformat = None
|
||||
for f in ImageFormat:
|
||||
if f.name == imgdata.format:
|
||||
imgformat = f
|
||||
if imgformat is None:
|
||||
imgformat = ImageFormat.other
|
||||
imgformat = getattr(ImageFormat, imgdata.format, ImageFormat.other)
|
||||
|
||||
def cleanup():
|
||||
if imgdata is not None:
|
||||
|
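The one-line `getattr` lookup replaces the removed loop over the enum members. A minimal sketch of the same idea follows, using the enum definition from this diff. `lookup_format` is a hypothetical wrapper name, and the sketch assumes the format is passed as a string, as PIL's `format` attribute normally is.

```python
from enum import Enum

ImageFormat = Enum(
    "ImageFormat", "JPEG JPEG2000 CCITTGroup4 PNG GIF TIFF MPO MIFF JBIG2 other"
)


def lookup_format(pil_format):
    # unknown names fall back to ImageFormat.other instead of raising
    return getattr(ImageFormat, pil_format, ImageFormat.other)
```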
@ -2060,6 +2179,28 @@ def read_images(
|
|||
)
|
||||
]
|
||||
|
||||
if imgformat == ImageFormat.JBIG2:
|
||||
color, ndpi, imgwidthpx, imgheightpx, rotation, iccp = get_imgmetadata(
|
||||
imgdata, imgformat, default_dpi, colorspace, rawdata, rot
|
||||
)
|
||||
streamdata = rawdata[13:-22] # Strip file header and footer
|
||||
return [
|
||||
(
|
||||
color,
|
||||
ndpi,
|
||||
imgformat,
|
||||
streamdata,
|
||||
None,
|
||||
imgwidthpx,
|
||||
imgheightpx,
|
||||
[],
|
||||
False,
|
||||
1,
|
||||
rotation,
|
||||
iccp,
|
||||
)
|
||||
]
|
||||
|
||||
if imgformat == ImageFormat.MIFF:
|
||||
return parse_miff(rawdata)
|
||||
|
||||
|
@ -2599,14 +2740,11 @@ def find_scale(pagewidth, pageheight):
|
|||
return 10 ** ceil(log10(oversized))
|
||||
|
||||
|
||||
# given one or more input image, depending on outputstream, either return a
|
||||
# string containing the whole PDF if outputstream is None or write the PDF
|
||||
# data to the given file-like object and return None
|
||||
#
|
||||
# Input images can be given as file like objects (they must implement read()),
|
||||
# as a binary string representing the image content or as filenames to the
|
||||
# images.
|
||||
def convert(*images, **kwargs):
|
||||
# Convert the image(s) to a `pdfdoc` object.
|
||||
# The `.writer` attribute holds the underlying engine document handle, and
|
||||
# `.output_version` the minimum version the caller should use when saving.
|
||||
# The main convert() wraps this implementation function.
|
||||
def convert_to_docobject(*images, **kwargs):
|
||||
_default_kwargs = dict(
|
||||
engine=None,
|
||||
title=None,
|
||||
|
@ -2627,7 +2765,6 @@ def convert(*images, **kwargs):
|
|||
viewer_fit_window=False,
|
||||
viewer_center_window=False,
|
||||
viewer_fullscreen=False,
|
||||
outputstream=None,
|
||||
first_frame_only=False,
|
||||
allow_oversized=True,
|
||||
cropborder=None,
|
||||
|
@ -2790,10 +2927,22 @@ def convert(*images, **kwargs):
|
|||
iccp,
|
||||
)
|
||||
|
||||
if kwargs["outputstream"]:
|
||||
pdf.tostream(kwargs["outputstream"])
|
||||
return
|
||||
pdf.finalize()
|
||||
return pdf
|
||||
|
||||
|
||||
# given one or more input image, depending on outputstream, either return a
|
||||
# string containing the whole PDF if outputstream is None or write the PDF
|
||||
# data to the given file-like object and return None
|
||||
#
|
||||
# Input images can be given as file like objects (they must implement read()),
|
||||
# as a binary string representing the image content or as filenames to the
|
||||
# images.
|
||||
def convert(*images, outputstream=None, **kwargs):
|
||||
pdf = convert_to_docobject(*images, **kwargs)
|
||||
if outputstream:
|
||||
pdf.tostream(outputstream)
|
||||
return
|
||||
return pdf.tostring()
|
||||
|
||||
|
||||
|
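The split into `convert_to_docobject()` and `convert()` changes only the call order, which can be illustrated with a toy stand-in. Everything in this sketch is hypothetical: `ToyDoc` models just the `finalize()` / `tostream()` / `tostring()` sequence of `pdfdoc` and writes a placeholder byte string, not a PDF.

```python
import io


class ToyDoc:
    """Stand-in for pdfdoc; only the call order is modelled."""

    def __init__(self):
        self.finalized = False

    def finalize(self):
        self.finalized = True

    def tostream(self, outputstream):
        # like the diff, this assumes finalize() has been invoked beforehand
        assert self.finalized
        outputstream.write(b"%PDF-toy")

    def tostring(self):
        stream = io.BytesIO()
        self.tostream(stream)
        return stream.getvalue()


def convert_to_docobject():
    pdf = ToyDoc()
    pdf.finalize()
    return pdf


def convert(outputstream=None):
    pdf = convert_to_docobject()
    if outputstream:
        pdf.tostream(outputstream)
        return None
    return pdf.tostring()
```

A caller that wants the document handle rather than bytes would stop at `convert_to_docobject()`, which is the use case the comment in the hunk describes.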
@ -3316,10 +3465,10 @@ def gui():
|
|||
f.seek(0)
|
||||
doc = fitz.open(stream=f, filetype="pdf")
|
||||
for page in doc:
|
||||
if page.getDisplayList().rect.width > maxpagewidth:
|
||||
maxpagewidth = page.getDisplayList().rect.width
|
||||
if page.getDisplayList().rect.height > maxpageheight:
|
||||
maxpageheight = page.getDisplayList().rect.height
|
||||
if page.get_displaylist().rect.width > maxpagewidth:
|
||||
maxpagewidth = page.get_displaylist().rect.width
|
||||
if page.get_displaylist().rect.height > maxpageheight:
|
||||
maxpageheight = page.get_displaylist().rect.height
|
||||
draw()
|
||||
|
||||
def save_pdf(stream):
|
||||
|
@ -3471,9 +3620,9 @@ def gui():
|
|||
mat_0 = fitz.Matrix(zoom, zoom)
|
||||
canvas.image = tkinter.PhotoImage(
|
||||
data=doc[pagenum]
|
||||
.getDisplayList()
|
||||
.getPixmap(matrix=mat_0, alpha=False)
|
||||
.getImageData("ppm")
|
||||
.get_displaylist()
|
||||
.get_pixmap(matrix=mat_0, alpha=False)
|
||||
.tobytes("ppm")
|
||||
)
|
||||
canvas.create_image(
|
||||
(canvas.size[0] - maxpagewidth * zoom) / 2,
|
||||
|
@ -3820,14 +3969,31 @@ def gui():
|
|||
app.mainloop()
|
||||
|
||||
|
||||
def file_is_icc(fname):
|
||||
with open(fname, "rb") as f:
|
||||
data = f.read(40)
|
||||
if len(data) < 40:
|
||||
return False
|
||||
return data[36:] == b"acsp"
|
||||
|
||||
|
||||
def validate_icc(fname):
|
||||
if not file_is_icc(fname):
|
||||
raise argparse.ArgumentTypeError('"%s" is not an ICC profile' % fname)
|
||||
return fname
|
||||
|
||||
|
||||
def get_default_icc_profile():
|
||||
for profile in [
|
||||
"/usr/share/color/icc/sRGB.icc",
|
||||
"/usr/share/color/icc/OpenICC/sRGB.icc",
|
||||
"/usr/share/color/icc/colord/sRGB.icc",
|
||||
]:
|
||||
if os.path.exists(profile):
|
||||
return profile
|
||||
if not os.path.exists(profile):
|
||||
continue
|
||||
if not file_is_icc(profile):
|
||||
continue
|
||||
return profile
|
||||
return "/usr/share/color/icc/sRGB.icc"
|
||||
|
||||
|
||||
|
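The check in `file_is_icc()` relies on the `acsp` signature at byte offset 36 of an ICC profile header. The sketch below applies the same test to bytes already in memory, so it can be tried without a profile on disk. `data_is_icc` is a hypothetical name that does not appear in the diff.

```python
def data_is_icc(data):
    """Return True if data carries the 'acsp' signature at offset 36."""
    if len(data) < 40:
        return False
    return data[36:40] == b"acsp"
```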
@ -3936,6 +4102,10 @@ Examples:
|
|||
|
||||
$ img2pdf --output out.pdf page1.jpg page2.jpg
|
||||
|
||||
Use a custom dpi value for the input images:
|
||||
|
||||
$ img2pdf --output out.pdf --imgsize 300dpi page1.jpg page2.jpg
|
||||
|
||||
Convert a directory of JPEG images into a PDF with printable A4 pages in
|
||||
landscape mode. On each page, the photo takes the maximum amount of space
|
||||
while preserving its aspect ratio and a print border of 2 cm on the top and
|
||||
|
@ -4098,17 +4268,29 @@ RGB.""",
|
|||
% Image.MAX_IMAGE_PIXELS,
|
||||
)
|
||||
|
||||
outargs.add_argument(
|
||||
"--pdfa",
|
||||
nargs="?",
|
||||
const=get_default_icc_profile(),
|
||||
default=None,
|
||||
help="Output a PDF/A-1b compliant document. By default, this will "
|
||||
"embed either /usr/share/color/icc/sRGB.icc, "
|
||||
"/usr/share/color/icc/OpenICC/sRGB.icc or "
|
||||
"/usr/share/color/icc/colord/sRGB.icc as the color profile, whichever "
|
||||
"is found to exist first.",
|
||||
)
|
||||
if sys.platform == "win32":
|
||||
# on Windows, there are no default paths to search for an ICC profile
|
||||
# so make the argument required instead of optional
|
||||
outargs.add_argument(
|
||||
"--pdfa",
|
||||
type=validate_icc,
|
||||
help="Output a PDF/A-1b compliant document. The argument to this "
|
||||
"option is the path to the ICC profile that will be embedded into "
|
||||
"the resulting PDF.",
|
||||
)
|
||||
else:
|
||||
outargs.add_argument(
|
||||
"--pdfa",
|
||||
nargs="?",
|
||||
const=get_default_icc_profile(),
|
||||
default=None,
|
||||
type=validate_icc,
|
||||
help="Output a PDF/A-1b compliant document. By default, this will "
|
||||
"embed either /usr/share/color/icc/sRGB.icc, "
|
||||
"/usr/share/color/icc/OpenICC/sRGB.icc or "
|
||||
"/usr/share/color/icc/colord/sRGB.icc as the color profile, whichever "
|
||||
"is found to exist first.",
|
||||
)
|
||||
|
||||
sizeargs = parser.add_argument_group(
|
||||
title="Image and page size and layout arguments",
|
||||
|
|
File diff suppressed because it is too large
BIN
src/tests/input/mono.jb2
Normal file
Binary file not shown.
BIN
src/tests/output/mono.jb2.pdf
Normal file
Binary file not shown.