Compare commits

89 commits

Author SHA1 Message Date
819b366bf5
release version 0.5.1 2023-11-26 06:33:10 +01:00
cc8c708295
HACKING: how to bisect 2023-11-25 09:47:53 +01:00
fb9537d8b7
src/img2pdf.py: allow PNG input without dpi units but non-square dpi aspect ratio
Closes: #181
2023-11-25 09:47:52 +01:00
7678435eb7
validate icc profile and no default location on windows
closes: #179
2023-11-07 18:50:07 +01:00
ba7a360866
release version 0.5.0 2023-10-28 08:35:54 +02:00
7f0bf47ff3
src/img2pdf.py: reformat with black 2023-10-28 08:35:53 +02:00
Leo
5cd0918d50 Issue #175 related. The original was SmartAlbums, but another case with 'Adobe PS', so delete the exif_software check part 2023-10-18 13:33:44 +08:00
Leo
f157ced05d
ignore RGB icc profile for grayscale jpegs produced by SmartAlbums
closes: #175
2023-10-17 11:32:25 +02:00
09064e8e70
jp2: rudimentary support for raw jpeg2000 without jp2 boxes 2023-08-08 07:40:38 +02:00
2f736d7891
allow 'matte' to be missing in MIFF 2023-08-06 19:43:19 +02:00
e05580a49a
src/img2pdf_test.py: IM7 dropped 'baseType' in json output, so use 'type' instead which works for both IM6 and IM7 2023-08-06 19:27:01 +02:00
acc25a4926
Support JPEG2000 images with transparency
Closes: #173
2023-08-05 16:06:30 +02:00
f597887088
The GIMP ICC bug does not only apply to 1-bit tiff but also to black/white palette PNG
https://gitlab.gnome.org/GNOME/gimp/-/issues/3438

Closes: #159
2023-08-05 14:43:18 +02:00
3e832fbcc2
add information about how to convert images to 8 bit (closes: #170) 2023-08-05 14:43:07 +02:00
1e8557cef1
src/img2pdf_test.py: drop check for endianness for tests where it does not matter
IM7 defaults to big-endian on architectures other than x86 even if they
are little endian: https://github.com/ImageMagick/ImageMagick/issues/6300

Closes: #152
2023-08-05 14:42:48 +02:00
29921eeabd
the default PDF/A icc profile is /usr/share/color/icc/sRGB.icc, /usr/share/color/icc/OpenICC/sRGB.icc or /usr/share/color/icc/colord/sRGB.icc depending on which one exists 2023-06-11 21:56:21 +02:00
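A minimal sketch of the lookup this commit message describes, assuming the three paths are tried in the order listed and the first existing one wins. The function name `find_default_icc_profile` and the injectable `exists` parameter are hypothetical and not taken from img2pdf.

```python
import os

# Candidate locations, copied from the commit message above.
CANDIDATE_ICC_PATHS = [
    "/usr/share/color/icc/sRGB.icc",
    "/usr/share/color/icc/OpenICC/sRGB.icc",
    "/usr/share/color/icc/colord/sRGB.icc",
]


def find_default_icc_profile(candidates=CANDIDATE_ICC_PATHS, exists=os.path.exists):
    """Return the first candidate path that exists, or None if none does.

    The order of preference is an assumption; the commit message only says
    the result depends on which file exists.
    """
    for path in candidates:
        if exists(path):
            return path
    return None


if __name__ == "__main__":
    print(find_default_icc_profile())
```

Passing `exists` as a parameter keeps the sketch testable on systems where none of these files are installed.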
33139612f8
src/img2pdf_test.py: make endianness dependant on sys.byteorder (closes: #152) 2023-06-11 14:45:09 +02:00
64d27f4a8b
src/img2pdf_test.py: allow Bilevel as well as Grayscale type for png_gray1_img (closes: #161) 2023-06-11 13:24:30 +02:00
85cbe1d128
factor out argparse.ArgumentParser to allow for generating completions via shtab 2023-06-11 08:09:46 +02:00
b25429a4c1
src/img2pdf_test.py: add tests for timestamps 2023-06-11 08:01:36 +02:00
c703e9df06
fix date(1) based timestamp parser 2023-06-11 07:48:23 +02:00
79e9985f35
src/img2pdf_test.py: black 2023-06-11 07:47:22 +02:00
cb2644c34f
do not include thumbnails in the output by default unless --include-thumbnails is used
This is relevant for the MPO format which otherwise would result in PDF
files containing the same image in different sizes multiple times. With
this change, the default is to only have a single page containing the
full MPO. This means that extracting that MPO also gets the thumbnails
back.

With the --include-thumbnails option, each frame gets stored on its own
page as it is done for multi-frame GIF, for example.

Closes: #135
2023-06-11 07:31:07 +02:00
81502f21af Convert creation/modification dates to UTC (fixes #155)
Ensure that timezones are correctly interpreted in the input by calling
`.astimezone()` as appropriate on datetime objects, and store the
resulting date fields as UTC.

One could argue that datetimes in the local timezone be stored in the
PDF, but then the date string handling becomes more complicated; the PDF
and XMP date specs both use the `Z` suffix to indicate UTC time, but
other +/- offsets require different syntax between the two specs.
2023-06-10 17:53:03 -07:00
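The conversion described in this commit message can be sketched with the standard library alone. The helper names `to_utc`, `pdf_date` and `xmp_date` are hypothetical, and the exact format strings are an assumption based on the `Z` suffix the message mentions, not a copy of img2pdf's code.

```python
from datetime import datetime, timedelta, timezone


def to_utc(dt):
    """Convert a datetime to UTC.

    astimezone() treats a naive datetime as local time, so both naive and
    timezone-aware inputs end up as aware UTC datetimes.
    """
    return dt.astimezone(timezone.utc)


def pdf_date(dt):
    """Format as a PDF date string with the Z suffix for UTC."""
    return to_utc(dt).strftime("D:%Y%m%d%H%M%SZ")


def xmp_date(dt):
    """Format as an XMP (ISO 8601 style) date string with the Z suffix."""
    return to_utc(dt).strftime("%Y-%m-%dT%H:%M:%SZ")


if __name__ == "__main__":
    # 17:53:03 at UTC-7 is 00:53:03 on the next day in UTC.
    local = datetime(2023, 6, 10, 17, 53, 3, tzinfo=timezone(timedelta(hours=-7)))
    print(pdf_date(local))  # D:20230611005303Z
    print(xmp_date(local))  # 2023-06-11T00:53:03Z
```

Storing UTC sidesteps the differing offset syntaxes: both formats accept a plain `Z`, so one conversion serves both fields.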
0cbcb8fa12
avoid converting palette PNG with alpha to RGB (closes: #158) 2023-06-08 08:54:37 +02:00
e9e04b6dd9
extend comments around dropping ICC profile stored by GIMP for bilevel input 2023-06-08 08:53:22 +02:00
fc059ee471
use quotes around caret in examples for windows users
Closes: #167
2023-06-08 07:14:17 +02:00
25466113e9
another small fixup for the last commit 2023-05-30 08:06:36 +02:00
7405635b72
only check whether icc profile can be dropped if there is any 2023-05-30 07:10:32 +02:00
aea472101b
strip off RGB color profile from bilevel TIFF images produced by gimp (closes: #164) 2023-05-30 06:25:26 +02:00
7fa67bb337
demote print() to logger.debug() 2023-05-29 09:25:21 +02:00
7d40569aa1
Inform the user what is happening when running without any arguments and suggest using --help to get the help text (closes: #156) 2023-05-28 15:25:28 +02:00
83f9c32328
appveyor.yml: try out --console --nowindowed 2023-05-28 15:25:28 +02:00
be8369373f
pass deterministic_id=True to writer.save() for pikepdf >= 6.2.0
Closes: #150
2022-10-16 14:13:35 +02:00
10c6901fa3
src/img2pdf_test.py: do not test the depth attribute and rely on baseDepth
closes: #119
2022-09-23 23:10:53 +02:00
57d7e07e6b
Support imagemagick 7.1.0-48
- the output of -metric PSNR changed
 - CMYK output can now be exactly compared

closes: #148
2022-09-15 04:36:16 +02:00
272fe0433f allow pathlib.Path objects by allowing objects implementing read_bytes function 2022-07-02 21:19:34 +02:00
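The duck-typing idea in this commit message (accept anything that implements `read_bytes`) might look roughly like the sketch below. `read_image_bytes` is a hypothetical helper written for illustration; it is not img2pdf's actual input handling, and the order of the checks is an assumption.

```python
import io
import pathlib


def read_image_bytes(source):
    """Return raw bytes for several kinds of input (illustrative only)."""
    if isinstance(source, bytes):
        # Already in-memory image data.
        return source
    if hasattr(source, "read_bytes"):
        # pathlib.Path and any other object offering read_bytes().
        return source.read_bytes()
    if hasattr(source, "read"):
        # File-like objects opened in binary mode.
        return source.read()
    # Fall back to treating the input as a file name.
    with open(source, "rb") as f:
        return f.read()


if __name__ == "__main__":
    print(read_image_bytes(io.BytesIO(b"\x89PNG")))
    print(read_image_bytes(pathlib.Path(__file__))[:10])
```

Checking for the method rather than for `pathlib.Path` itself means third-party path-like classes work without special cases.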
ef7b9e739d add miff tests for cmyk8 and rgb8 2022-07-02 20:39:18 +02:00
af6fe27d53 avoid match/case for now until python 3.10 is available on more platforms 2022-06-28 14:22:14 +01:00
bad6fcae39 support for MIFF which allows 16 bit CMYK images
closes: #144
2022-06-27 13:22:07 +01:00
d9b90499f3
README.md: compare to econvert (closes: #143) 2022-05-18 13:08:05 +02:00
edb0d29a14
README.md: fix link 2022-05-13 21:27:12 +02:00
bb3e8b0098
README.md: document that img2pdf.exe can now be downloaded via release 2022-05-13 21:25:37 +02:00
f454ebc6a6
release version 0.4.4 2022-04-07 22:40:36 +02:00
c3db273e23
Remove outdated readme entry concerning JP2 colorspace
If I understood the code in `jp2.py` correctly, this should now work.
Moreover, Pillow should usually be able to open JP2 files, so `jp2.py` is only a fallback.
2022-04-07 22:08:41 +02:00
87afabd3cf
add .mailmap 2022-04-07 22:08:18 +02:00
homocomputeris
5045282cc2
Add B and JB paper sizes 2022-04-07 22:02:16 +02:00
fb4b96452a
reformat with black 2022-04-07 21:58:34 +02:00
c553e169a4
use TiffImagePlugin.STRIP_SIZE for Pillow >= 8.4.0 2022-04-07 21:58:23 +02:00
d9345ac767
set GifImagePlugin.LOADING_STRATEGY to RGB_AFTER_DIFFERENT_PALETTE_ONLY in Pillow >= 9.1.0 (closes: #130) 2022-04-07 21:57:37 +02:00
1d52530229
support new pikepdf.Page object when making indirect objects (closes: #132) 2022-04-07 21:55:45 +02:00
3b117e674b
clarify that input with an alpha channel requires additional computation (closes: #142) 2022-04-07 17:13:16 +02:00
e8ca53738f
--viewer-page-layout support for twopageright and twopageleft 2022-03-25 14:01:18 +01:00
7c48bfb868
find_scale(): better document why we use powers of 10 2022-03-24 12:43:49 +01:00
244f034a2e
src/img2pdf_test.py: format with black 2022-03-22 17:57:12 +01:00
3da370d3bd
add more information about how to ignore invalid rotation values in input images 2022-02-06 20:33:30 +01:00
6cff2931e4
ensure that gif test input uses the same palette for all frames 2022-02-06 20:25:25 +01:00
6a55258804
appveyor.yml: rename pil to Pillow 2022-01-16 23:38:22 +01:00
3cdeab08ab
appveyor.yml: also install pil so that maybe pyinstaller picks it up 2022-01-16 23:36:37 +01:00
cea7c9120b
tox.ini: python 3.5 and 3.6 are not supported anymore 2022-01-16 22:18:22 +01:00
9eacfdaa76
appveyor.yml: don't run tests because we don't have imagemagick 2021-12-11 11:22:59 +01:00
95a313f437
tox.ini: add python 3.10 to envlist 2021-12-11 11:22:31 +01:00
30d705f020
src/img2pdf.py: the /OpenAction array must contain the page as an indirect object 2021-11-28 16:55:42 +01:00
dc926b2cf2
release version 0.4.3 2021-10-24 16:51:02 +02:00
a8fdbd0038
fix --viewer-initial-page (broken in last release) 2021-10-24 16:15:46 +02:00
6ff175d637
update my name 2021-10-13 09:17:44 +02:00
0732dff0be
src/img2pdf_test.py: make img2pdf path configurable 2021-10-13 09:17:17 +02:00
50b7145f64
release version 0.4.2 2021-10-11 17:16:03 +02:00
e522ec14d9
remove pdfrw from tests (closes: #108) 2021-10-09 15:29:44 +02:00
9c9e5ece19
src/img2pdf_test.py: add support for imagemagick 7 2021-10-03 22:54:33 +02:00
354fd7c264
src/img2pdf.py: format with black 2021-10-03 21:03:22 +02:00
392d4a665e
adapt AlphaChannelError messages 2021-10-03 21:03:08 +02:00
09ad147d97
support PNG palette images with icc profile (closes: #107) 2021-10-03 20:59:59 +02:00
80393b6efa
also support palette+alpha 2021-10-03 20:56:53 +02:00
e265738ac2
Revert "allow palette PNG images with ICC profile (closes: #107)"
This reverts commit 3d7e0e6812.
2021-10-03 16:09:54 +02:00
1ffb160453
Readme: remove alpha channels section 2021-10-03 16:04:53 +02:00
cde7472d15 Update documentation: img2pdf now supports alpha channels 2021-10-03 11:48:24 +00:00
6eec05c11c
workaround for Pillow >= 8.3.0 which limits CCITT Group4 strip size to 64 KB (closes: #122) 2021-10-03 13:24:53 +02:00
Tamás Zahola
f483638b17 Test cases for transparency 2021-08-22 08:08:22 +00:00
Tamás Zahola
7f216a8848 Test support on macOS 2021-08-22 08:08:21 +00:00
Tamás Zahola
2476215f39 Always initialize smask 2021-08-22 08:08:20 +00:00
Tamás Zahola
f62858c245 Formatting 2021-08-22 08:08:20 +00:00
Tamás Zahola
a5e4da5755 Use PNG predictor for /SMask too 2021-08-22 08:08:19 +00:00
Tamás Zahola
64db7909ec Added transparency support for GIFs, palette-based PNGs and grayscale PNGs 2021-08-22 08:08:19 +00:00
Tamás Zahola
af5ae5b9b6 Convert 8-bit PNG alpha channels to /SMasks in PDF 2021-08-22 08:08:19 +00:00
d03f331521
Remove ineffective description-file parameter from setup.cfg
Otherwise, we get a warning:
"Usage of dash-separated 'description-file' will not be supported in
future versions."

This parameter was only used when setup.py was used with pbr=True, which is
not the case here, and setup.py loads the right file into long_description.
2021-08-22 03:15:20 +02:00
635b08c321
README: fix CI URLs 2021-08-21 20:57:15 +02:00
152f6fb629
tests: look for sRGB.icc in several paths, skip related tests if not found
Always use Ghostscript's sRGB.icc.
2021-08-19 19:56:01 +02:00
1f3b456ac9
tests: don't expect ImageMagick to compress TIFF files by default
This is no longer the case with ImageMagick 7.
2021-08-19 13:17:06 +02:00
14 changed files with 1725 additions and 663 deletions

.mailmap (new file, 3 changed lines)

@@ -0,0 +1,3 @@
+Johannes Schauer Marin Rodrigues <josch@mister-muffin.de>
+Johannes Schauer Marin Rodrigues <josch@mister-muffin.de> <j.schauer@email.de>
+Johannes Schauer Marin Rodrigues <josch@mister-muffin.de> <josch@pyneo.org>

@@ -13,7 +13,6 @@ matrix:
 - netpbm
 - ghostscript
 - mupdf-tools
-- icc-profiles-free
 - name: "python 3.9 Windows"
 os: windows
 language: shell # 'language: python' is an error on Travis CI Windows

@@ -2,7 +2,44 @@
 CHANGES
 =======

-0.4.1 (2020-05-09)
+0.5.1 (2023-11-26)
+------------------
+
+- no default ICC profile location for PDF/A-1b on Windows
+- workaround for PNG input without dpi units but non-square dpi aspect ratio
+
+0.5.0 (2023-10-28)
+------------------
+
+- support MIFF for 16 bit CMYK input
+- accept pathlib.Path objects as input
+- don't store RGB ICC profiles from bilevel or grayscale TIFF, PNG and JPEG
+- thumbnails are no longer included by default and --include-thumbnails has to
+be used if you want them
+- support for pikepdf (>= 6.2.0)
+
+0.4.4 (2022-04-07)
+------------------
+
+- --viewer-page-layout support for twopageright and twopageleft
+- Add B and JB paper sizes
+- support for pikepdf (>= 5.0.0) and Pillow (>= 9.1.0)
+
+0.4.3 (2021-10-24)
+------------------
+
+- fix --viewer-initial-page (broken in last release)
+
+0.4.2 (2021-10-11)
+------------------
+
+- add --rotation
+- allow palette PNG images with ICC profile
+- sort globbing result on windows
+- convert 8-bit PNG alpha channels to /SMasks in PDF
+- remove pdfrw from tests
+
+0.4.1 (2021-05-09)
 ------------------

 - support wildcards in paths on windows

HACKING (39 changed lines)

@@ -27,6 +27,41 @@ Making a new release

 - Build and upload to pypi:

-$ rm dist/*
+$ rm -rf dist/*
 $ python3 setup.py sdist
-$ twine upload --sign dist/*
+$ twine upload dist/*
+
+Using debbisect to find regressions
+-----------------------------------
+
+$ debbisect --cache=./cache --depends="git,ca-certificates,python3,
+ghostscript,imagemagick,mupdf-tools,poppler-utils,python3-pil,
+python3-pytest,python3-numpy,python3-scipy,python3-pikepdf" \
+--verbose 2023-09-16 2023-10-24 \
+'chroot "$1" sh -c "
+git clone https://gitlab.mister-muffin.de/josch/img2pdf.git
+&& cd img2pdf
+&& pytest 'src/img2pdf_test.py::test_jpg_2000_rgba8[internal]"'
+
+Using debbisect cache
+---------------------
+
+$ mmdebstrap --variant=apt --aptopt='Acquire::Check-Valid-Until "false"' \
+--include=git,ca-certificates,python3,ghostscript,imagemagick \
+--include=mupdf-tools,poppler-utils,python3-pil,python3-pytest \
+--include=python3-numpy,python3-scipy,python3-pikepdf \
+--hook-dir=/usr/share/mmdebstrap/hooks/file-mirror-automount \
+--setup-hook='mkdir -p "$1/home/josch/git/devscripts/cache/pool/"' \
+--setup-hook='mount -o ro,bind /home/josch/git/devscripts/cache/pool/ "$1/home/josch/git/devscripts/cache/pool/"' \
+--chrooted-customize-hook=bash
+unstable /dev/null
+file:///home/josch/git/devscripts/cache/archive/debian/20231022T090139Z/
+
+Bisecting imagemagick
+---------------------
+
+$ git clean -fdx && git reset --hard
+$ ./configure --prefix=$(pwd)/prefix
+$ make -j$(nproc)
+$ make install
+$ LD_LIBRARY_PATH=$(pwd)/prefix/lib prefix/bin/compare ...

@@ -1,5 +1,5 @@
-[![Travis Status](https://travis-ci.org/josch/img2pdf.svg?branch=master)](https://travis-ci.org/josch/img2pdf)
-[![Appveyor Status](https://ci.appveyor.com/api/projects/status/2kws3wkqvi526llj/branch/master?svg=true)](https://ci.appveyor.com/project/josch/img2pdf/branch/master)
+[![Travis Status](https://travis-ci.com/josch/img2pdf.svg?branch=main)](https://app.travis-ci.com/josch/img2pdf)
+[![Appveyor Status](https://ci.appveyor.com/api/projects/status/2kws3wkqvi526llj/branch/main?svg=true)](https://ci.appveyor.com/project/josch/img2pdf/branch/main)

 img2pdf
 =======
@@ -27,15 +27,15 @@ software, because the raw pixel data never has to be loaded into memory.
 The following table shows how img2pdf handles different input depending on the
 input file format and image color space.

-| Format               | Colorspace                     | Result        |
-| -------------------- | ------------------------------ | ------------- |
-| JPEG                 | any                            | direct        |
-| JPEG2000             | any                            | direct        |
-| PNG (non-interlaced) | any                            | direct        |
-| TIFF (CCITT Group 4) | monochrome                     | direct        |
-| any                  | any except CMYK and monochrome | PNG Paeth     |
-| any                  | monochrome                     | CCITT Group 4 |
-| any                  | CMYK                           | flate         |
+| Format                                | Colorspace                     | Result        |
+| ------------------------------------- | ------------------------------ | ------------- |
+| JPEG                                  | any                            | direct        |
+| JPEG2000                              | any                            | direct        |
+| PNG (non-interlaced, no transparency) | any                            | direct        |
+| TIFF (CCITT Group 4)                  | monochrome                     | direct        |
+| any                                   | any except CMYK and monochrome | PNG Paeth     |
+| any                                   | monochrome                     | CCITT Group 4 |
+| any                                   | CMYK                           | flate         |

 For JPEG, JPEG2000, non-interlaced PNG and TIFF images with CCITT Group 4
 encoded data, img2pdf directly embeds the image data into the PDF without
@@ -72,25 +72,15 @@ Bugs
 when embedded into the PDF cannot be read by the Adobe Acrobat Reader,
 please contact me.

-- I have not yet figured out how to determine the colorspace of JPEG2000
-files. Therefore JPEG2000 files use DeviceRGB by default. For JPEG2000
-files with other colorspaces, you must explicitly specify it using the
-`--colorspace` option.
-
-- Input images with alpha channels are not allowed. PDF only supports
-transparency using binary masks but is unable to store 8-bit transparency
-information as part of the image itself. But img2pdf will always be lossless
-and thus, input images must not carry transparency information. You can
-remove the alpha channel for example with imagemagick:
-
-convert input.png -background white -alpha remove -alpha off output.png
-
 - An error is produced if the input image is broken. This commonly happens if
 the input image has an invalid EXIF Orientation value of zero. Even though
 only nine different values from 1 to 9 are permitted, Anroid phones and
 Canon DSLR cameras produce JPEG images with the invalid value of zero.
 Either fix your input images with `exiftool` or similar software before
-passing the JPEG to `img2pdf` or run `img2pdf` with `--rotation=ifvalid`.
+passing the JPEG to `img2pdf` or run `img2pdf` with `--rotation=ifvalid`
+(if you run img2pdf from the commandline) or by passing
+`rotation=img2pdf.Rotation.ifvalid` as an argument to `convert()` when using
+img2pdf as a library.

 - img2pdf uses PIL (or Pillow) to obtain image meta data and to convert the
 input if necessary. To prevent decompression bomb denial of service attacks,
@@ -127,10 +117,9 @@ You can then test the converter using:

 $ ve/bin/img2pdf -o test.pdf src/tests/test.jpg

-For Microsoft Windows users, PyInstaller based .exe files are produced by
-appveyor. If you don't want to install Python before using img2pdf you can head
-to appveyor and click on "Artifacts" to download the latest version:
-https://ci.appveyor.com/project/josch/img2pdf
+If you don't want to setup Python on Windows, then head to the
+[releases](/josch/img2pdf/releases) section and download the latest
+`img2pdf.exe`.

 GUI
 ---
@@ -157,6 +146,10 @@ The package can also be used as a library:
 with open("name.pdf","wb") as f1, open("test.jpg") as f2:
     f1.write(img2pdf.convert(f2))

+# opening using pathlib
+with open("name.pdf","wb") as f:
+    f.write(img2pdf.convert(pathlib.Path('test.jpg')))
+
 # using in-memory image data
 with open("name.pdf","wb") as f:
     f.write(img2pdf.convert("\x89PNG...")
@@ -199,6 +192,15 @@ The package can also be used as a library:
 with open("name.pdf","wb") as f:
     f.write(img2pdf.convert(glob.glob("/path/to/*.jpg")))

+# convert all files matching a glob using pathlib.Path
+from pathlib import Path
+with open("name.pdf","wb") as f:
+    f.write(img2pdf.convert(*Path("/path").glob("**/*.jpg")))
+
+# ignore invalid rotation values in the input images
+with open("name.pdf","wb") as f:
+    f.write(img2pdf.convert('test.jpg'), rotation=img2pdf.Rotation.ifvalid)
+
 # writing to file descriptor
 with open("name.pdf","wb") as f1, open("test.jpg") as f2:
     img2pdf.convert(f2, outputstream=f1)
@@ -306,3 +308,14 @@ Tesseract might not do a lossless conversion. For example it converts CMYK
 input to RGB and removes the alpha channel from images with transparency. For
 multipage TIFF or animated GIF, it will only convert the first frame.

+Comparison to econvert from ExactImage
+--------------------------------------
+
+Like pdflatex and podofoimg2pf, econvert is able to embed JPEG images into PDF
+directly without re-encoding but when given other file formats, it stores them
+just using flate compressen, which unnecessarily increases the filesize.
+Furthermore, it throws an error with CMYK TIF input. It also doesn't store CMYK
+jpeg files as CMYK but converts them to RGB, so it's not lossless. When trying
+to feed it 16bit files, it errors out with Unhandled bps/spp combination. It
+also seems to choose JPEG encoding when using it on some file types (like
+palette images) making it again not lossless for that input as well.

@@ -16,16 +16,18 @@ environment:
 - PYTHON: "C:\\Python37-x64"

 install:
-- "%PYTHON%\\python.exe -m pip install tox wheel pyinstaller"
+- "%PYTHON%\\python.exe -m pip install tox wheel pyinstaller Pillow"

 build: off

-test_script:
-- "%PYTHON%\\python.exe -m tox"
+# don't run tests on windows because we don't have imagemagick
+#test_script:
+# - "%PYTHON%\\python.exe -m tox"

 after_test:
 - "%PYTHON%\\python.exe setup.py bdist_wheel"
-- "%PYTHON%\\python.exe -m PyInstaller --clean --onefile --noconsole src/img2pdf.py"
+- "%PYTHON%\\python.exe -m PyInstaller --clean --onefile --console --nowindowed --name img2pdf src/img2pdf.py"
+#- "%PYTHON%\\python.exe -m PyInstaller --clean --onefile --noconsole --windowed --name img2pdf_windowed src/img2pdf.py"

 artifacts:
 - path: dist\*

@@ -1,2 +0,0 @@
-[metadata]
-description-file = README.md

@@ -1,7 +1,7 @@
 import sys
 from setuptools import setup

-VERSION = "0.4.1"
+VERSION = "0.5.1"

 INSTALL_REQUIRES = (
     "Pillow",
@@ -11,7 +11,7 @@ INSTALL_REQUIRES = (
 setup(
     name="img2pdf",
     version=VERSION,
-    author="Johannes 'josch' Schauer",
+    author="Johannes Schauer Marin Rodrigues",
     author_email="josch@mister-muffin.de",
     description="Convert images to PDF via direct JPEG inclusion.",
     long_description=open("README.md").read(),

File diff suppressed because it is too large

File diff suppressed because it is too large

@@ -1,6 +1,6 @@
 #!/usr/bin/env python
 #
-# Copyright (C) 2013 Johannes 'josch' Schauer <j.schauer at email.de>
+# Copyright (C) 2013 Johannes Schauer Marin Rodrigues <j.schauer at email.de>
 #
 # this module is heavily based upon jpylyzer which is
 # KB / National Library of the Netherlands, Open Planets Foundation
@@ -37,9 +37,8 @@ def getBox(data, byteStart, noBytes):


 def parse_ihdr(data):
-    height = struct.unpack(">I", data[0:4])[0]
-    width = struct.unpack(">I", data[4:8])[0]
-    return width, height
+    height, width, channels, bpp = struct.unpack(">IIHB", data[:11])
+    return width, height, channels, bpp + 1


 def parse_colr(data):
@@ -59,8 +58,8 @@ def parse_colr(data):

 def parse_resc(data):
     hnum, hden, vnum, vden, hexp, vexp = struct.unpack(">HHHHBB", data)
-    hdpi = ((hnum / hden) * (10 ** hexp) * 100) / 2.54
-    vdpi = ((vnum / vden) * (10 ** vexp) * 100) / 2.54
+    hdpi = ((hnum / hden) * (10**hexp) * 100) / 2.54
+    vdpi = ((vnum / vden) * (10**vexp) * 100) / 2.54
     return hdpi, vdpi
@@ -85,13 +84,13 @@ def parse_jp2h(data):
     while byteStart < noBytes and boxLengthValue != 0:
         boxLengthValue, boxType, byteEnd, boxContents = getBox(data, byteStart, noBytes)
         if boxType == b"ihdr":
-            width, height = parse_ihdr(boxContents)
+            width, height, channels, bpp = parse_ihdr(boxContents)
         elif boxType == b"colr":
             colorspace = parse_colr(boxContents)
         elif boxType == b"res ":
             hdpi, vdpi = parse_res(boxContents)
         byteStart = byteEnd
-    return (width, height, colorspace, hdpi, vdpi)
+    return (width, height, colorspace, hdpi, vdpi, channels, bpp)


 def parsejp2(data):
def parsejp2(data): def parsejp2(data):
@ -102,7 +101,9 @@ def parsejp2(data):
while byteStart < noBytes and boxLengthValue != 0: while byteStart < noBytes and boxLengthValue != 0:
boxLengthValue, boxType, byteEnd, boxContents = getBox(data, byteStart, noBytes) boxLengthValue, boxType, byteEnd, boxContents = getBox(data, byteStart, noBytes)
if boxType == b"jp2h": if boxType == b"jp2h":
width, height, colorspace, hdpi, vdpi = parse_jp2h(boxContents) width, height, colorspace, hdpi, vdpi, channels, bpp = parse_jp2h(
boxContents
)
break break
byteStart = byteEnd byteStart = byteEnd
if not width: if not width:
@ -112,13 +113,41 @@ def parsejp2(data):
if not colorspace: if not colorspace:
raise Exception("no colorspace in jp2 header") raise Exception("no colorspace in jp2 header")
# retrieving the dpi is optional so we do not error out if not present # retrieving the dpi is optional so we do not error out if not present
return (width, height, colorspace, hdpi, vdpi) return (width, height, colorspace, hdpi, vdpi, channels, bpp)
def parsej2k(data):
lsiz, rsiz, xsiz, ysiz, xosiz, yosiz, _, _, _, _, csiz = struct.unpack(
">HHIIIIIIIIH", data[4:42]
)
ssiz = [None] * csiz
xrsiz = [None] * csiz
yrsiz = [None] * csiz
for i in range(csiz):
ssiz[i], xrsiz[i], yrsiz[i] = struct.unpack(
"BBB", data[42 + 3 * i : 42 + 3 * (i + 1)]
)
assert ssiz == [7, 7, 7]
return xsiz - xosiz, ysiz - yosiz, None, None, None, csiz, 8
def parse(data):
if data[:4] == b"\xff\x4f\xff\x51":
return parsej2k(data)
else:
return parsejp2(data)
if __name__ == "__main__": if __name__ == "__main__":
import sys import sys
width, height, colorspace = parsejp2(open(sys.argv[1]).read()) width, height, colorspace, hdpi, vdpi, channels, bpp = parse(
sys.stdout.write("width = %d" % width) open(sys.argv[1], "rb").read()
sys.stdout.write("height = %d" % height) )
sys.stdout.write("colorspace = %s" % colorspace) print("width = %d" % width)
print("height = %d" % height)
print("colorspace = %s" % colorspace)
print("hdpi = %s" % hdpi)
print("vdpi = %s" % vdpi)
print("channels = %s" % channels)
print("bpp = %s" % bpp)
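To see what the new `parse_ihdr` extracts, here is a standalone sketch that builds a synthetic 14-byte image header box payload and unpacks it the same way. The sample values (640x480, 3 channels, 8 bits) are made up for illustration. The stored depth field holds the bit depth minus one, which is why the diff adds 1; that the sign flag in its top bit can be ignored is an assumption that only holds for unsigned samples.

```python
import struct


def parse_ihdr(data):
    # Big-endian layout: height (u32), width (u32), channel count (u16),
    # bits per component minus one (u8). 4 + 4 + 2 + 1 = 11 bytes.
    height, width, channels, bpc = struct.unpack(">IIHB", data[:11])
    return width, height, channels, bpc + 1


# Synthetic payload: the 11 parsed bytes followed by the three remaining
# one-byte fields of the box (compression type, colourspace-unknown flag,
# intellectual-property flag), which parse_ihdr ignores.
sample = struct.pack(">IIHB", 480, 640, 3, 7) + b"\x07\x01\x00"

if __name__ == "__main__":
    print(parse_ihdr(sample))  # (640, 480, 3, 8)
```

A single `struct.unpack` call replaces the two separate reads of the old code and yields the channel count and bit depth at the same time.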

Binary file not shown (image; size 1.9 KiB before and after).

Binary file not shown.

@@ -4,7 +4,7 @@
 # and then run "tox" from this directory.

 [tox]
-envlist = py35, py36, py37, py38, py39
+envlist = py37, py38, py39, py310
 skip_missing_interpreters = true

 [testenv]