I've been trying to study the code of `img2pdf` a bit, and am surprised by the function [`find_scale()`](https://gitlab.mister-muffin.de/josch/img2pdf/src/commit/3da370d3bd4a04822bf0a7611a55ab7eb0beff48/src/img2pdf.py#L2148) to determine a UserUnit for oversized pages:
```python3
def find_scale(pagewidth, pageheight):
    """Find the power of 10 (10, 100, 1000...) that will reduce the scale
    below the PDF specification limit of 14400 PDF units (=200 inches)"""
    from math import log10, ceil

    major = max(pagewidth, pageheight)
    oversized = major / 14400.0
    return 10 ** ceil(log10(oversized))
```
(This was implemented with commit a826939)
I was wondering why you are only using powers of 10? I've been skimming the PDF 1.7 Manual and am unable to find a statement that would require this. Table 30 on page 79 only says:
| Key | Type | Value |
|-----|------|-------|
| UserUnit | number | (Optional; PDF 1.6) A positive number that shall give the size of default user space units, in multiples of 1 ⁄ 72 inch. The range of supported values shall be implementation-dependent. Default value: 1.0 (user space unit is 1 ⁄ 72 inch). |
This makes me think that UserUnit may basically be any positive int or float, as far as the PDF reference is concerned. I'd like to know if there is a specific reason why you are using only powers of 10? Are there any PDF libraries or apps you know of which strictly require this condition?
Thanks!
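To make the question concrete, here is a minimal sketch that compares the power-of-10 result with the smallest factor that would also bring the page under the limit. `smallest_user_unit` and `PDF_MAX_UNITS` are hypothetical names used only for illustration; they are not part of img2pdf, and the 14400 limit is taken from the docstring quoted above.

```python
from math import ceil, log10

# Limit quoted in the find_scale() docstring (14400 units = 200 inches).
PDF_MAX_UNITS = 14400.0


def find_scale(pagewidth, pageheight):
    """Power-of-10 scale, same logic as the snippet quoted above."""
    major = max(pagewidth, pageheight)
    oversized = major / PDF_MAX_UNITS
    return 10 ** ceil(log10(oversized))


def smallest_user_unit(pagewidth, pageheight):
    """Hypothetical alternative: the smallest factor that meets the limit."""
    return max(pagewidth, pageheight) / PDF_MAX_UNITS


# An oversized page of 30000 x 20000 units (1/72 inch each).
width, height = 30000.0, 20000.0
print(find_scale(width, height))          # a power of 10
print(smallest_user_unit(width, height))  # an arbitrary positive number
```

Both values bring the longer side down to at most 14400 units; as far as the quoted table goes, either would be a valid UserUnit.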
As you can see from the commit, it wasn't me but @j_barlow who authored a826939.
As far as my understanding goes, you are correct: we could use any multiplier. I think there is a simple reason to choose powers of 10: it makes reading the PDF source (not the rendering) easier for a human, because all numbers in the PDF format are written in base 10.
Is there any reason you ask?
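The readability argument can be illustrated with a short sketch. The page width and the non-power-of-10 factor below are made-up values, and `digits_of` is a hypothetical helper; `decimal.Decimal` is used only so that the printed digits are exact.

```python
from decimal import Decimal


def digits_of(number):
    """Return the digits of a Decimal without the decimal point."""
    return str(number).replace(".", "")


# Made-up page width in units of 1/72 inch.
width = Decimal("123456.7")

# Dividing by a power of 10 keeps the digits; only the decimal point moves.
scaled_pow10 = width / 10
print(width, "->", scaled_pow10)

# Dividing by an arbitrary factor produces unrelated digits.
scaled_other = width / Decimal("8.6")
print(width, "->", scaled_other)

print(digits_of(width) == digits_of(scaled_pow10))
print(digits_of(width) == digits_of(scaled_other))
```

With a power of 10, someone reading the raw PDF can still recognise the original dimensions in the scaled numbers.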
What @josch wrote is correct in all respects.
Thanks for the clarification. I see it is useful to use powers of 10 so that the original digits are kept and only the decimal point shifts. That makes sense.
(The reason I ask is that I have some other code that deals with UserUnit, so I was wondering if it always has to be a power of 10.)
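For code on the reading side, a minimal sketch of handling an arbitrary UserUnit might look like this. It assumes the page is available as a plain dict with `MediaBox` and an optional `UserUnit` entry; `page_size_inches` is a hypothetical helper, not the API of any particular PDF library. The default of 1.0 and the 1/72 inch unit come from the table quoted above.

```python
def page_size_inches(page):
    """Return (width, height) in inches for a dict-like page (assumed shape)."""
    # UserUnit is optional; the spec's default is 1.0 (one unit = 1/72 inch).
    user_unit = float(page.get("UserUnit", 1.0))
    if user_unit <= 0:
        raise ValueError("UserUnit must be a positive number")
    llx, lly, urx, ury = page["MediaBox"]
    width = (urx - llx) * user_unit / 72.0
    height = (ury - lly) * user_unit / 72.0
    return width, height


# No power-of-10 assumption: any positive number is accepted.
print(page_size_inches({"MediaBox": [0, 0, 14400, 7200], "UserUnit": 2.5}))
print(page_size_inches({"MediaBox": [0, 0, 612, 792]}))
```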
Thanks for bringing this up. In case you are not the last one to read the img2pdf source to make sense of how PDF works, I recorded the result of this issue in a code comment: 7c48bfb86813bd0ede3fae2d0873e1efd022b79a
Great, thanks!