josch/img2pdf

Adding an "imgposition" option #53

Closed

opened 2021-04-25 19:58:16 +00:00 by josch · 1 comment

josch commented

2021-04-25 19:58:16 +00:00

Owner

By Ben on 2018-12-02T21:47:11.928Z

Scenario:

Printing a letter image (8.5inx11in) on a legal (8.5inx14in) page format to get extra space for handwriting notes.

Actual result:

The image is positioned in the center, splitting the extra space.

Expected option:

Position the image using a kind of "--gravity" option "à la" ImageMagick.

Example:

img2pdf --output out.pdf --pagesize Legal --imgposition North *.jpg

ImageMagick > Image Positioning using Gravity


Here is an example of using gravity to position images on a background.


  composite label:Default                      rings.jpg gravity_default.jpg
  composite label:Center    -gravity center    rings.jpg gravity_center.jpg
  composite label:South     -gravity south     rings.jpg gravity_south.jpg
  composite label:East      -gravity east      rings.jpg gravity_east.jpg
  composite label:NorthEast -gravity northeast rings.jpg gravity_northeast.jpg

So --imgposition should support:

NorthWest
North
NorthEast
West
Center
East
SouthWest
South
SouthEast

http://www.imagemagick.org/Usage/annotating/#gravity

http://www.imagemagick.org/script/command-line-options.php?#gravity

By josch on 2018-12-02T22:50:34.484Z

img2pdf is primarily supposed to be a simple converter from image to PDF. It is not supposed to be a competitor to powerful tools like ImageMagick or other image manipulation tools. As it is, img2pdf already contains a number of convenience options that go far beyond the capabilities of a "simple converter". According to the UNIX philosophy, each tool should do only one thing and do that one thing well. Thus I recommend that you either edit your image with a tool like ImageMagick, in the way you described in your initial post, before sending it to img2pdf, or use another tool like LaTeX to convert a Letter-sized PDF into a Legal-sized PDF with the border that you want it to have.

By Ben on 2018-12-03T01:38:41.235Z

I completely agree with you about the importance of sticking to the UNIX philosophy. img2pdf is fast / light and gets the job done perfectly based on my preliminary tests; that IS the reason why I want to use it over other solutions. Also, I think that --imgposition would respect the "rules" governing the UNIX philosophy. The proposal is based on the following logic:

The image is centered by default. That implies that calculations are already done to position the image inside the borders, so the function, conceptually at least, already exists. I was so convinced of this that I quickly reviewed the source code to check whether the option was buried somewhere and simply not documented. From that angle, the --imgposition option would only allow the user / developer to customize the behavior of a function that already exists (Rule of Least Surprise?).
Using ImageMagick or LaTeX would add an extra dependency to projects that use img2pdf and would imply heavier resource usage. We must consider cloud deployments or mobile applications, where memory and CPU can be limited (Rule of Parsimony?).
The extra code and commands required to position the image could be needed in multiple projects by multiple developers / users, while --imgposition would only take an extra option each time, saving time (Rule of Economy?).
Not all scanned materials respect "standard" formats. In the education sector, for example, PDFs are created from book excerpts, and books come in all formats. The "black padding" resulting from the scanning is usually removed (to save ink) and the aspect ratio is kept. Consequently, images are often smaller than the page size. Placing the image in the center isn't always optimal for students (Rule of Diversity?).

In summary, not having this option could create, in some circumstances, unintended and negative "externalities" to other users / developers that would be quite "inexpensive" to fix right away in img2pdf...

By josch on 2018-12-03T06:51:24.082Z

Thanks for your extensive reasoning about this issue.

I think I have a solution that could make both of us happy.

Instead of adding yet another option and thus increasing the complexity of img2pdf command line arguments, the same functionality could also be implemented by improving the --border option. That option so far only allows either one or two values to specify the border size. If it also allowed three or four values, then it could be used to position the image on the page in the way that you requested, like so:

img2pdf --output out.pdf --pagesize Legal --border 0:0:3in *.jpg

The order of border values could be the same as used in the CSS border-width property, which might be familiar to some:

When one value is specified, it applies the same width to all four sides.
When two values are specified, the first width applies to the top and bottom, the second to the left and right.
When three values are specified, the first width applies to the top, the second to the left and right, the third to the bottom.
When four values are specified, the widths apply to the top, right, bottom, and left in that order (clockwise).

The behaviour with one and two values is already what img2pdf is doing right now. So the example command I gave above would put the image on a Legal sized paper such that there are 3 inches of space at the bottom of the page.
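The expansion rules above can be sketched in a few lines of Python. This is only an illustration of the proposal, not code from img2pdf: the function name `expand_border` is hypothetical, the colon separator is taken from the example command in this comment, and the values are left as unparsed strings so that no unit handling is implied.

```python
def expand_border(spec):
    """Expand a colon-separated border spec into (top, right, bottom, left).

    Hypothetical helper: follows the CSS-style ordering described in the
    thread. Values stay as strings (e.g. "3in"); unit parsing is out of scope.
    """
    values = spec.split(":")
    if any(v == "" for v in values):
        raise ValueError("empty border value in %r" % spec)

    if len(values) == 1:
        # one value: same width on all four sides
        top = right = bottom = left = values[0]
    elif len(values) == 2:
        # two values: top/bottom, then left/right
        top = bottom = values[0]
        right = left = values[1]
    elif len(values) == 3:
        # three values: top, then left/right, then bottom
        top, right, bottom = values
        left = right
    elif len(values) == 4:
        # four values: top, right, bottom, left (clockwise)
        top, right, bottom, left = values
    else:
        raise ValueError("expected 1 to 4 border values, got %d" % len(values))
    return top, right, bottom, left


if __name__ == "__main__":
    # The example from the thread: only the bottom border is non-zero.
    print(expand_border("0:0:3in"))
```

With the spec `0:0:3in` only the bottom border is non-zero, which is how a Letter image (11in tall) would end up flush with the top of a Legal page (14in tall) with the 3 inches of free space below it.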

By Ben on 2018-12-03T08:08:10.224Z

If I understand your proposition right, it would work well in the case of images with a known and standardized size for each batch of images.

But what about book excerpts with variable sizes, where we want the images to be aligned vertically at the top and horizontally at the center of the page in a single PDF file? Would each image position need to be adjusted using the borders?

I brought up the ImageMagick example because I find the concept of "gravity" elegant. You calculate the remaining area after applying the borders, apply the "fit" function to the image and apply the gravity as requested. This allows automating the positioning of images with random sizes and keeps the options to a bare minimum. That probably respects the philosophy behind img2pdf.

Now, building on existing option parameters, why not use the circumflex to our advantage...

img2pdf --output out.pdf --pagesize Legal --border 0:0:3in^North *.jpg

or more CSS style:

img2pdf --output out.pdf --pagesize Legal --border 0:0:3in^Top:Center *.jpg

That would generate a page with the image on a Legal sized paper such that there are 3 inches of space at the bottom of the page and the image is top / center.

That would be quite elegant and powerful!

By josch on 2018-12-03T12:06:00.995Z

Would your specific use-case (letter -> legal) be covered by extending the --border option?

Or are you also using img2pdf on scans from books as you describe?

By Ben on 2018-12-03T15:39:37.841Z

Yes and yes.

Book excerpts and receipts (some scanned, faxed or coming from smartphone cameras and cropped by the user), all end up with some random sizes.

Something like --border 0:0:3in^Top:Center would cover all the scenarios the pipelines I'm working on must handle.

Also, your idea of a more powerful "border" option would indeed be handy, for example on paper with standardized pre-printed content, like a letterhead with content on three sides. That would make it quite easy to create a "protected area" and push the image directly where it should go... very nice!

By josch on 2018-12-03T20:11:39.307Z

Hrm... I guess I could be persuaded. I have to sleep on this.

Though I fear I will not have time to implement this myself. And I also lack motivation because I personally have no need for this feature. But pull requests are welcome!

By Ben on 2018-12-04T04:58:34.384Z

Great! That was probably the hardest part ;-)

I'm no Python programmer but I will see what I can do. After a very quick review:

# the image is always centered on the page
imgxpdf = (pagewidth - imgwidthpdf)/2.0
imgypdf = (pageheight - imgheightpdf)/2.0
 
<snip>

text = ("q\n%0.4f 0 0 %0.4f %0.4f %0.4f cm\n/Im0 Do\nQ" %  
(imgwidthpdf, imgheightpdf, imgxpdf, imgypdf)).encode("ascii")

We know / have the following:

Coordinates 0,0 is the top/left corner of the page
Page size
Image size
Border(s) width
Coordinates of the center of the page, border(s) corrected.

Right?

If so, we could apply an offset based on the center coordinates to get a gravity function.

The +/- offset(s) (x, y or both) would be the length of the space between the image edge(s) and the inside edge(s) of the border(s) in the direction of the "gravity", if greater than 0.

The offset length would be ((pagesize/2)-border) - (imgsize/2)

Make sense?

The borders "à la" CSS would require more work, but the gravity function could be written so that it supports four border widths in the future.
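A minimal sketch of such a gravity function follows. The names `GRAVITY` and `gravity_position` are hypothetical and nothing here is taken from img2pdf. One assumption matters for the signs: in the default PDF user space the origin is the bottom-left corner of the page with y growing upwards, and the `cm` matrix in the snippet above places the lower-left corner of the image at `(imgxpdf, imgypdf)`. "North" therefore means a larger y value. Instead of offsetting from the center, the sketch distributes the free space inside the borders by a fraction per axis, which gives the same result as the offset formula above.

```python
# Fraction of the free space placed to the left of / below the image.
# Hypothetical table using the ImageMagick-style names from this thread.
GRAVITY = {
    "NorthWest": (0.0, 1.0), "North": (0.5, 1.0), "NorthEast": (1.0, 1.0),
    "West": (0.0, 0.5), "Center": (0.5, 0.5), "East": (1.0, 0.5),
    "SouthWest": (0.0, 0.0), "South": (0.5, 0.0), "SouthEast": (1.0, 0.0),
}


def gravity_position(pagewidth, pageheight, imgwidth, imgheight,
                     gravity="Center", border=(0.0, 0.0, 0.0, 0.0)):
    """Return (x, y) of the image's lower-left corner in PDF points.

    border is (top, right, bottom, left), all in the same unit as the
    page and image sizes. Assumes a bottom-left origin with y pointing up.
    """
    top, right, bottom, left = border
    fx, fy = GRAVITY[gravity]

    # Space left over inside the borders once the image is placed.
    free_x = pagewidth - left - right - imgwidth
    free_y = pageheight - top - bottom - imgheight
    if free_x < 0 or free_y < 0:
        raise ValueError("image does not fit inside the borders")

    return left + fx * free_x, bottom + fy * free_y


if __name__ == "__main__":
    # Letter image (612x792 pt) on a Legal page (612x1008 pt).
    for name in ("Center", "North", "South"):
        print(name, gravity_position(612.0, 1008.0, 612.0, 792.0, name))
```

For a Letter image on a Legal page, "Center" reproduces the current behaviour (108 pt of space above and below), while "North" moves all 216 pt (3 inches) of free space below the image.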

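For reference, the ImageMagick GravityAdjustGeometry() function:

https://github.com/ImageMagick/ImageMagick/blob/master/MagickCore/geometry.c

https://www.imagemagick.org/include/api/geometry.php#GravityAdjustGeometry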

By josch on 2018-12-04T07:32:37.437Z

Whatever you implement, your pull request must contain tests!

Have a look at git/img2pdf/src/tests/__init__.py, where at the top you can find an extensive test of nearly all possible combinations of the layout options. Naturally, none of these tests may be broken by your changes (check with setup.py test), but you also have to add a sufficient number of your own tests if you implement the gravity function.

By Alba on 2019-01-24T02:56:38.400Z

I'm also in search of this! I modified the code so that the layout function can also (optionally) return imgxpdf and imgypdf, which are then used instead of the centered position being calculated. This change allows one to implement weird positioning as needed by using img2pdf as a library. And it doesn't break compatibility.

--- a/src/img2pdf.py
+++ b/src/img2pdf.py
@@ -1827,9 +1827,12 @@ def convert(*images, **kwargs):
             depth,
             rotation,
         ) in read_images(rawdata, kwargs["colorspace"], kwargs["first_frame_only"]):
-            pagewidth, pageheight, imgwidthpdf, imgheightpdf = kwargs["layout_fun"](
-                imgwidthpx, imgheightpx, ndpi
-            )
+            layout = kwargs["layout_fun"](imgwidthpx, imgheightpx, ndpi)
+            pagewidth, pageheight, imgwidthpdf, imgheightpdf = layout[:4]
+
+            # the image is by default centered on the page
+            imgxpdf = layout[4] if 4 < len(layout) else (pagewidth - imgwidthpdf) / 2.0
+            imgypdf = layout[5] if 5 < len(layout) else (pageheight - imgheightpdf) / 2.0
 
             userunit = None
             if pagewidth < 3.00 or pageheight < 3.00:
@@ -1843,13 +1846,12 @@ def convert(*images, **kwargs):
                     pageheight /= userunit
                     imgwidthpdf /= userunit
                     imgheightpdf /= userunit
+                    imgxpdf /= userunit
+                    imgypdf /= userunit
                 else:
                     raise PdfTooLargeError(
                         "pdf width or height must not exceed 200 inches."
                     )
-            # the image is always centered on the page
-            imgxpdf = (pagewidth - imgwidthpdf) / 2.0
-            imgypdf = (pageheight - imgheightpdf) / 2.0
             pdf.add_imagepage(
                 color,
                 imgwidthpx,

If an option is desired, the next change would be to modify get_layout_fun so it handles asymmetric borders.
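To illustrate how the patched interface could be used from library code, here is a sketch of a wrapper that turns any four-value layout function into a six-value one with the image aligned to the top of the page. It assumes the patch above is applied (an unpatched `convert` expects exactly four return values), that positions are in PDF points with a bottom-left origin, and that `ndpi` is an `(x, y)` pair. `top_aligned` and `fixed_legal_layout` are hypothetical names, and the latter is only a stand-in so the example runs without img2pdf installed.

```python
def top_aligned(layout_fun):
    """Wrap a layout function so the image sits at the top, centered horizontally.

    The wrapped function returns six values, as the patch above allows:
    (pagewidth, pageheight, imgwidthpdf, imgheightpdf, imgxpdf, imgypdf).
    """
    def wrapper(imgwidthpx, imgheightpx, ndpi):
        layout = layout_fun(imgwidthpx, imgheightpx, ndpi)
        pagewidth, pageheight, imgwidthpdf, imgheightpdf = layout[:4]
        # Horizontal centering is unchanged from the default behaviour.
        imgxpdf = (pagewidth - imgwidthpdf) / 2.0
        # Bottom-left origin: the top of the page is the largest y value.
        imgypdf = pageheight - imgheightpdf
        return (pagewidth, pageheight, imgwidthpdf, imgheightpdf,
                imgxpdf, imgypdf)
    return wrapper


def fixed_legal_layout(imgwidthpx, imgheightpx, ndpi):
    """Stand-in layout function: Legal page, image at its natural size."""
    pagewidth, pageheight = 612.0, 1008.0  # 8.5in x 14in in points
    imgwidthpdf = imgwidthpx * 72.0 / ndpi[0]
    imgheightpdf = imgheightpx * 72.0 / ndpi[1]
    return pagewidth, pageheight, imgwidthpdf, imgheightpdf


if __name__ == "__main__":
    layout = top_aligned(fixed_legal_layout)
    # A Letter-sized scan at 300 dpi (2550 x 3300 pixels).
    print(layout(2550, 3300, (300, 300)))
```

With the patched library, the wrapped function would be passed as the `layout_fun` keyword argument of `convert`, wrapping whatever `get_layout_fun` returns instead of the stand-in used here.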

By josch on 2020-04-05T13:25:03.461Z

Hi @mildsunrise, sorry for taking so long to come back to this. The change you propose would indeed not break anything. Do you still need this functionality? Is the interface useful to you the way you propose it? If you send a merge request or a full git format-patch, or just tell me the name and email you would like to be credited with, then I can include your changes. Thanks!


josch commented

2022-04-11 06:24:58 +00:00

Author

Owner

Closing because of inactivity.