Question on UserUnit handling #136

Closed
opened 2022-03-23 20:54:54 +00:00 by mara0004 · 5 comments
Contributor

I've been trying to study the code of img2pdf a bit, and am surprised by the function find_scale() to determine a UserUnit for oversized pages:

def find_scale(pagewidth, pageheight):
    """Find the power of 10 (10, 100, 1000...) that will reduce the scale
    below the PDF specification limit of 14400 PDF units (=200 inches)"""
    from math import log10, ceil

    major = max(pagewidth, pageheight)
    oversized = major / 14400.0

    return 10 ** ceil(log10(oversized))

(This was implemented with commit a826939)
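
To make the behaviour concrete, here is a small sketch that runs the function quoted above on a few page sizes. The sample dimensions are made up for illustration; only find_scale() itself comes from img2pdf.

```python
from math import ceil, log10


def find_scale(pagewidth, pageheight):
    """Copy of the img2pdf function quoted above."""
    major = max(pagewidth, pageheight)
    oversized = major / 14400.0
    return 10 ** ceil(log10(oversized))


# Hypothetical oversized pages, given in PDF units (1/72 inch).
samples = [(20000, 10000), (2_000_000, 1_000_000)]

for width, height in samples:
    scale = find_scale(width, height)
    # After dividing by the scale, the longer side fits within 14400 units.
    print(width, height, "->", scale, width / scale, height / scale)
```

The first page gets a scale of 10 and the second a scale of 1000; in both cases the longer side ends up at or below 14400 units.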

I was wondering why you are only using powers of 10? I've been skimming the PDF 1.7 Manual and am unable to find a statement that would require this. Table 30 on page 79 only says:

| Key | Type | Value |
|-----|------|-------|
| UserUnit | number | (Optional; PDF 1.6) A positive number that shall give the size of default user space units, in multiples of 1 ⁄ 72 inch. The range of supported values shall be implementation-dependent. Default value: 1.0 (user space unit is 1 ⁄ 72 inch). |

This makes me think that UserUnit may basically be any positive int or float, as far as the PDF reference is concerned. I'd like to know if there is a specific reason why you are using only powers of 10? Are there any PDF libraries or apps you know of which strictly require this condition?

Thanks!

Owner

As you can see from the commit, it wasn't me but @j_barlow who authored a826939.

As far as my understanding goes, you are correct: we could use any multiplier. I think there is a simple reason to choose powers of 10: it makes reading the PDF source (not the rendering) easier for a human, because all numbers in the PDF format are written in base 10.
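
A rough illustration of that readability point, as a sketch rather than anything img2pdf does: the page dimensions below are invented, and the "tight" scale is a hypothetical alternative that shrinks the longer side to roughly 14400 units.

```python
from decimal import Decimal

LIMIT = Decimal(14400)  # maximum page dimension in PDF units

# Hypothetical oversized page, in units of 1/72 inch.
width = Decimal("123456.7")
height = Decimal("50000")

pow10 = Decimal(10)    # the kind of UserUnit find_scale() would choose here
tight = width / LIMIT  # assumed alternative: smallest scale that fits

# Power of 10: the original digits survive, only the decimal point moves.
print(width / pow10, height / pow10)  # 12345.67 5000

# Arbitrary scale: still usable as a UserUnit, but the written numbers
# no longer resemble the original dimensions.
print(tight, width / tight, height / tight)
```

Both scales describe the same physical page; the difference is only in how recognisable the numbers are when reading the PDF source.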

Is there any reason you ask?


What @josch wrote is correct in all respects.

Author
Contributor

Thanks for the clarification. I see that powers of 10 are useful because the original digits are kept and only the decimal point shifts; that makes sense.
(The reason I ask is that I have some other code that deals with UserUnit, so I was wondering if it always has to be a power of 10.)
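
For code that reads UserUnit, the spec wording quoted earlier suggests treating it as an arbitrary positive number. A minimal sketch under that assumption follows; page_size_inches() and its arguments are hypothetical names, not part of img2pdf or any particular PDF library.

```python
def page_size_inches(mediabox, user_unit=1.0):
    """Return (width, height) in inches for a page.

    mediabox is (llx, lly, urx, ury) in user space units;
    user_unit is the size of one unit in multiples of 1/72 inch
    (1.0 when the page has no UserUnit entry).
    """
    if user_unit <= 0:
        raise ValueError("UserUnit must be a positive number")
    llx, lly, urx, ury = mediabox
    width = abs(urx - llx) * user_unit / 72
    height = abs(ury - lly) * user_unit / 72
    return width, height


# A power of 10 and a non-power of 10 are handled the same way.
print(page_size_inches((0, 0, 14400, 7200), user_unit=10))
print(page_size_inches((0, 0, 14400, 7200), user_unit=2.5))
```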

Owner

Thanks for bringing this up. In case you are not the last one to read the img2pdf source to make sense of how PDF works, I recorded the result of this issue in a code comment: 7c48bfb868

Author
Contributor

Great, thanks!

Reference: josch/img2pdf#136