Re-evaluating Default DPI and handling DPI-less source images #189
I write a lot of little programs and the last thing I want is for them to not work by default, or not do what's intended quickly and with the fewest arguments given.
And I know I'm some stranger coming out of nowhere, but I think it might be better to not have a default dpi, than to have a default dpi.
Or alternately (assuming the goal is to produce an accurate pdf) img2pdf should generate a clear visible warning on stderr when used on source files missing dpi information because:
Working with inaccurate data coming in is unfortunately common, even expected. But allowing inaccurate data out, without even a warning, feels wrong.
I have a basically modern scanner that scans at 300dpi. I convert to pdf with img2pdf and all of a sudden the dimensions are roughly 3x larger than they should be. Weird... but sort of okay... whatever... I can zoom out, scale on print. I'm still super thankful this tool exists at all, Josch!
I realize that if the source image doesn't specify a dpi, it may be necessary to use SOMETHING to do the conversion to pdf. But if that something is picked out of the air, it is most likely not correct. Right? So why are we doing that?
e.g.: DPIs range from 50ish to 1200-ish today, so the chance that 96 is correct is about 1/1150. Maybe better than that, because it sounds like a common-ish number. But scanners are usually 150 300 600 1200 and I assume img2pdf is often used with scanned images but perhaps that's my confirmation bias. Presumably the optical element has 1200 sensors per inch and the common options wisely halve that to avoid interpolation. 96 48 32 sounds like icon sizes to me or like sizes for computer generated content.
So I propose that when no dpi is specified and the source image lacks that information, img2pdf should:
Note on #2a: It could be slightly more helpful to tell the user PNM files don't SUPPORT dpi. vs JPG files support it but your files don't have it. (mine don't)
One other idea. Maybe it's possible to produce output pdfs that lack any DPI or size information. That might be slightly better than producing output pdfs that contain wrong information. It would cause less mystery to the user "Where in the heck is 96 coming from?" And it's probably better if whatever consumes the PDFs knows "there is no DPI" vs "this pdf is certain that it's nine feet long"
I think exiting with a non-zero exit status is a bad idea. I got lots of complaints in the past when I insisted that, if the input is garbage, img2pdf should just refuse to work.
But I'm all in favor of showing a big fat warning if the image did not come with dpi values or only broken values and thus a default was used. The warning can then also include instructions of what to do to specify the correct dpi by hand.
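To make the "warn and fall back" idea concrete, here is a minimal sketch. It is not img2pdf's actual code: the function name `resolve_dpi`, the shape of the `info` dictionary (modelled loosely on the metadata dictionary an image library might hand back) and the wording of the warning are all assumptions for illustration.

```python
import sys

DEFAULT_DPI = 96.0  # the default value discussed in this thread


def resolve_dpi(info, default=DEFAULT_DPI, stream=sys.stderr):
    """Return (xdpi, ydpi) from image metadata, warning if a default is used.

    `info` is a hypothetical metadata dict; a usable entry looks like
    {"dpi": (300, 300)}. Missing, malformed or non-positive values all
    count as "no dpi information".
    """
    dpi = info.get("dpi")
    try:
        xdpi, ydpi = float(dpi[0]), float(dpi[1])
        valid = xdpi > 0 and ydpi > 0
    except (TypeError, ValueError, IndexError):
        valid = False

    if valid:
        return xdpi, ydpi

    # Warn, but keep going: the process still produces output and exits 0.
    print(
        "WARNING: input image has no usable dpi information; "
        f"assuming {default:g} dpi. The page size of the output is probably "
        "wrong. Pass the correct resolution explicitly to fix this.",
        file=stream,
    )
    return default, default
```

The point of the design is that garbage input no longer passes silently, while people who never look at stderr still get a pdf.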
I'm afraid that "no dpi" doesn't work because the input image has pixels as its size and the output pdf has a physical size. So we somehow need to decide what physical size the pdf pages should have, based on the number of pixels width and height of the image. That is what dots per inch or pixel per centimeter decide.
It sounds like you can set a pdf's unit to "pixels per inch=x" or "pixels per centimeter=x". Too bad it can't be tricked into "pixels per pixel=1"!
If you are certain there's absolutely no way to generate a pdf without declaring physical (non-pixel) dimensions, I'd default the output pdf to be 8.5x11 inches (because yay America). Having PDFs that are too large to come out of a printer is unhelpful.
And once you know what output page size you want to hit, scale the image as large as possible to fit within that, centered. Maybe instead of outputting 8.5x11 you output a maximum of 8.5 by a maximum of 11. So the images won't be matted, but are guaranteed to fit on one page.
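A rough sketch of that "maximum 8.5 by maximum 11" idea, assuming sizes are expressed in PDF points (1/72 inch). The function name `fit_within` is made up for this example, and this is only the proposal from this comment, not something img2pdf does by default.

```python
LETTER_PT = (612.0, 792.0)  # 8.5 x 11 inches at 72 points per inch


def fit_within(width_px, height_px, max_size_pt=LETTER_PT):
    """Scale an image as large as possible inside max_size_pt.

    The aspect ratio is preserved, so at most one dimension reaches its
    maximum and the page is exactly as large as the scaled image (no mat).
    Returns the resulting page size (width_pt, height_pt).
    """
    if width_px <= 0 or height_px <= 0:
        raise ValueError("image dimensions must be positive")
    max_w, max_h = max_size_pt
    # The smaller of the two ratios is the largest scale that still fits.
    scale = min(max_w / width_px, max_h / height_px)
    return width_px * scale, height_px * scale


if __name__ == "__main__":
    # A wide 1000x500 pixel image is limited by the page width.
    print(fit_within(1000, 500))
```

Note that this throws away any notion of true physical size, which is exactly the gas pump receipt problem mentioned in this comment.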
If you have to assume a DPI, any assumption is probably going to be wrong. So maximizing the utility of the common printable size is probably the next best thing. Even if I'd feel really stupid carrying around a gas pump receipt or a UPS label enlarged to 8.5x11. I've done that before...
I feel like this level of decision making should be left to the user's --args, but if they complain when they don't give --args and don't get a pdf out, you're in a tough position.
Some folks like A4, or legal over letter. More relevantly, A4 is the standard for much of the world outside the US. It's hard for me to say whether A4 or Letter are more common in humanity or which are more common amongst people with computers. I do concede the A system is more intelligent. But we have more nukes and moon rocks. so....
Maybe there's a system default paper-size variable you could read, like /etc/timezone but I would be quite surprised.
Big fat warning is also great because it will increase user awareness of poor input data, while preventing the complaints that no output file would lead to.
I'm not an expert in pdf or jpg or pnm. But I switched my scanning from pnm to jpg when I learned that pnm cannot contain dpi/dimensions, only to learn that scanimage doesn't save dimensions in the jpg format either!
What would "pixel per pixel=1" mean?
A pdf is not a raster image. It is more like a vector graphic and its native unit is 1/72 inch. So you have to convert your measurement into inches one way or another. You cannot give measurements in pixels ever. That's not how pdf works.
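As a small worked example of that conversion (a sketch only; the function name `page_size_pt` is invented here): the page size in PDF units is pixels divided by dpi, times 72. The numbers below show a letter-sized scan at 300 dpi and what happens when the same pixels are interpreted at 96 dpi.

```python
PDF_UNITS_PER_INCH = 72.0  # the native pdf unit is 1/72 inch


def page_size_pt(width_px, height_px, dpi):
    """Convert pixel dimensions to a page size in pdf units (1/72 inch)."""
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    return (
        width_px / dpi * PDF_UNITS_PER_INCH,
        height_px / dpi * PDF_UNITS_PER_INCH,
    )


if __name__ == "__main__":
    # A letter page scanned at 300 dpi is 2550 x 3300 pixels.
    print(page_size_pt(2550, 3300, 300))  # (612.0, 792.0), i.e. 8.5 x 11 in
    # The same pixels read at 96 dpi come out 300/96 = 3.125 times larger.
    print(page_size_pt(2550, 3300, 96))   # (1912.5, 2475.0)
```

That factor of 3.125 matches the "roughly 3x larger" pages reported at the top of this thread.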
You are making the assumption that people use img2pdf to print things. I know some people do. I have personally never used it for that. I think defaulting to some dpi value and printing a warning if that value gets used is the right thing to do.
There is a way to figure out whether the user might prefer letter or A4: by using the locale the user has set. If you want to send a patch that enables this, I'd happily review it.
Yes pnm can store nothing else but pixel data. It does not contain any metadata.
I will not default to the letter format. You can compute the number of people who use letter versus those who use the ISO 216 sizes using this handy graphic:
https://upload.wikimedia.org/wikipedia/commons/1/1a/Prevalent_default_paper_size.svg
I seldom print. But I scan and use img2pdf and ocrmypdf and other tools to preserve paper documents in a reproducible form.
pixel per pixel=1 is me looking for a way to have a pdf without physical dimensions when such dimensions are unknown to begin with. hacking the format. not ideal
I think the Big Fat Warning on stderr when input is without dpi is better though.
I found there IS an /etc/papersize and it's set to (gasp) a4 on my system!
#to set it to letter:
paperconfig -p letter
x360:~# paperconf
letter
x360:~# paperconf -s
612 792
x360:~# paperconf -s -m
215.9 mm 279.4 mm
There's a libpaper library apparently for this.
supposedly /etc/papersize is used by groff, troff, and cups. LC_PAPER exists, but it's just set to en_US.UTF-8 which would need another table to look up from the country code to a papersize.
The "table lookup" can be done by the locale utility. Try running this on your system:
It will give you width and height of the default paper format according to your LC_PAPER setting in mm. If this doesn't work, img2pdf could fall back to using the information in /etc/papersize, for example by running paperconf -s.
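For the paperconf fallback, something along these lines might work. This is a sketch, not a patch: the function names are invented, and it assumes `paperconf -s` prints width and height in points separated by whitespace, as in the "612 792" output shown earlier in this thread. If paperconf is not installed or fails, the lookup returns None so the caller can fall back to something else.

```python
import subprocess


def parse_paperconf_size(output):
    """Parse the 'WIDTH HEIGHT' line printed by `paperconf -s` (points)."""
    fields = output.split()
    if len(fields) != 2:
        raise ValueError(f"unexpected paperconf output: {output!r}")
    return float(fields[0]), float(fields[1])


def system_paper_size_pt():
    """Return the system default paper size in points, or None if unknown."""
    try:
        result = subprocess.run(
            ["paperconf", "-s"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # paperconf is missing or exited with an error
        return None
    try:
        return parse_paperconf_size(result.stdout)
    except ValueError:
        return None


if __name__ == "__main__":
    print(system_paper_size_pt())
```

Keeping the parsing separate from the subprocess call makes it easy to test without libpaper installed.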