PDF timestamp handling issue #155
By default, it seems that img2pdf treats the "current" time as UTC time without taking the configured system timezone into account. A workaround is to use the `--creationdate` and `--moddate` flags and pass them timespecs that include the timezone info.

The img2pdf version is 0.4.4, running on Arch Linux.
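To make the problem concrete: a PDF date string carries its own UTC offset, so that offset has to come from a timezone-aware `datetime`. The sketch below uses a hypothetical helper `to_pdf_date()` (not img2pdf's actual `datetime_to_pdfdate`) and assumes the `D:YYYYMMDDHHmmSS+HH'mm'` layout. A naive `datetime`, such as the one returned by a bare `datetime.now()`, is rejected because it has no offset to write.

```python
from datetime import datetime, timedelta, timezone


def to_pdf_date(dt: datetime) -> str:
    """Format an aware datetime as a PDF date string (hypothetical helper).

    Assumes the D:YYYYMMDDHHmmSS+HH'mm' layout; illustrative only.
    """
    offset = dt.utcoffset()
    if offset is None:
        # A naive datetime has no offset; writing one would be a guess.
        raise ValueError("datetime must be timezone-aware")
    # Split the UTC offset into sign, hours and minutes.
    total_minutes = int(offset.total_seconds()) // 60
    sign = "+" if total_minutes >= 0 else "-"
    hours, minutes = divmod(abs(total_minutes), 60)
    return dt.strftime("D:%Y%m%d%H%M%S") + f"{sign}{hours:02d}'{minutes:02d}'"


cest = timezone(timedelta(hours=2))
print(to_pdf_date(datetime(2023, 4, 1, 12, 0, 0, tzinfo=cest)))
# → D:20230401120000+02'00'
```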
My test script:
And its output on my system:
(I was the user (phmccarty) that created this issue, but for some reason my account was deleted... Hoping that it will stay around after creating a fresh account today.)
Interestingly, re-running my test script on Arch Linux with Python 3.11.3, I see different results, so behavior must have changed between Python 3.10 and 3.11.
Current results (without my MR 168) are below. Notice that the `--creationdate` and `--moddate` flags are no longer taking effect as a workaround.

Regardless, the changes in MR 168 appear to help, at least on Arch Linux, the only distro I've tested. Hopefully that approach works for older Python versions and on other distros/platforms.
Sorry for that. There are hundreds of bots per month creating multiple accounts and spamming my gitea installation with several repos, issues and comments to post their spam per day. I have an automated script that uses some heuristics to find these bots and that script must've wrongly flagged your account and removed it. I'm sorry. Thank you for coming back! I put your username on the allow-list so that this cannot happen again in the future.
Thanks for checking in on this!
I'm definitely coming back to this issue before the next img2pdf release. First though I have to understand the problem better because my brain long discarded any knowledge I used to have about date/time representations in PDF. From what you write this sounds indeed like a bug in img2pdf that needs fixing.
I poked around a little bit more. I saw that you implemented some more code to represent the timezone offset properly. Would it not be easier to just store the timestamp in UTC instead of the local time? This would only inconvenience people reading the PDF in a text editor as any viewer would be able to display the timestamp in local time.
This would first make `datetime.now()` aware of the local timezone and then convert that datetime to UTC.

I'm still having problems with this, though, with evince, which is not showing the correct time no matter what I do. This shows another problem with implementing this: you currently rely on the `pdfinfo` tool doing the right thing with respect to timezones. Is there another tool that does the same thing, so that we can check that the implementation is indeed correct and that maybe evince is buggy?

This works for me both with pdfinfo and with evince:
Can you confirm?
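The approach described in this comment (make `datetime.now()` timezone-aware, then convert it to UTC) can be sketched with the standard library's `datetime.astimezone()`: called with no argument it attaches the system's local timezone, and called with `timezone.utc` it converts to UTC. This is an illustration of the idea, not the patch referred to in the thread.

```python
from datetime import datetime, timezone

# A bare now() is naive: it has no tzinfo, so its offset is unknown.
naive_now = datetime.now()

# astimezone() with no argument interprets a naive datetime as local time
# and attaches the system's local timezone.
local_now = naive_now.astimezone()

# Converting to UTC keeps the same instant but changes the wall-clock fields.
utc_now = local_now.astimezone(timezone.utc)

print(naive_now.tzinfo)      # None
print(local_now.isoformat())
print(utc_now.isoformat())   # always ends in +00:00
```

If only UTC is needed, `datetime.now(timezone.utc)` returns an aware UTC datetime directly, without the intermediate local step.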
Your patch works for the default case, but it does not work when specifying `--creationdate` and/or `--moddate` strings that have non-UTC timezones.

The output of my script with your patch:
Edit: In order to correct the `--creationdate` and `--moddate` offsets, you would need to convert those datetime objects to UTC as well before calling `datetime_to_pdfdate` and `datetime_to_xmpdate`.

Thanks for the hint about the `astimezone()` method, by the way; I had somehow missed it while researching a solution to this issue. I originally wanted to add the timezone info to `now()`, which is exactly what `astimezone()` is for. Not knowing about `astimezone()` at first, I opted to convert the `now()` time to UTC instead.

I agree that it would be easier to store UTC timestamps with the hardcoded offset in the string, as you did in your patch, as long as the `--creationdate` and `--moddate` strings are converted before that point (see my edited previous comment). Not needing to extract the offset substring from the `%z` specifier would certainly make the code simpler to understand.

I'm wondering if we are interpreting evince's document properties the same way... With my merge request applied, I checked evince and saw that it printed the timestamps according to the "raw" offset (not converted to the local timezone). But evince does report the correct times with respect to the timezone offset in the date string.
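As discussed in the thread, user-supplied `--creationdate`/`--moddate` values with a non-UTC offset must also be converted to UTC before formatting; once everything is in UTC, the offset can be hard-coded instead of being sliced out of a `%z` string. A rough sketch, where `normalize_to_utc()`, `pdf_date_utc()` and `xmp_date_utc()` are hypothetical names, the input is assumed to be an ISO 8601 string parsed with `datetime.fromisoformat()` (img2pdf's own timespec parsing may differ), naive input is assumed to mean local time, and the `Z` designator in the PDF date is an assumed output format:

```python
from datetime import datetime, timezone


def normalize_to_utc(dt: datetime) -> datetime:
    """Convert any datetime to an aware UTC datetime (hypothetical helper)."""
    # astimezone() treats a naive datetime as local time; an aware one keeps
    # its own offset. Either way the result is the same instant in UTC.
    return dt.astimezone(timezone.utc)


def pdf_date_utc(dt: datetime) -> str:
    """PDF date with a hard-coded UTC designator (assumed format)."""
    return normalize_to_utc(dt).strftime("D:%Y%m%d%H%M%SZ")


def xmp_date_utc(dt: datetime) -> str:
    """ISO 8601 date for XMP metadata, also normalized to UTC."""
    return normalize_to_utc(dt).isoformat(timespec="seconds")


# A timespec with a non-UTC offset, as one might pass to --creationdate.
user_value = datetime.fromisoformat("2023-04-01T12:00:00+02:00")
print(pdf_date_utc(user_value))  # → D:20230401100000Z
print(xmp_date_utc(user_value))  # → 2023-04-01T10:00:00+00:00
```

Normalizing in one place means the default `now()` timestamp and any user-supplied timestamp go through the same conversion, so they cannot drift apart.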
Note that, by default, `pdfinfo` will display the date timestamps according to the local timezone, but you can use its `-rawdates` flag to view the raw format.

Thank you!
I also added a bunch of test cases for this in b25429a4c1 so that this problem doesn't happen again. Could you try running `pytest` with the latest git HEAD to see if the tests succeed for you as well?

The new tests pass for me, though I had to apply a patch to override the default value for `--pdfa` so that img2pdf can find the expected color profile. This is on Arch Linux, and that `sRGB.icc` file is provided by the `colord` package.

Thank you for finding this! But this is the wrong fix. The file `/usr/share/color/icc/colord/sRGB.icc` also exists in Debian but is different from `/usr/share/color/icc/sRGB.icc` (they come from different source packages with different licenses). I think the correct solution is either to let the `--pdfa` option try both locations by default and choose the first one that exists, or …

I discovered that `/usr/share/color/icc/sRGB.icc` is provided by `icc-profiles-free` in Debian, and that file appears to be sourced from the upstream openicc project, if I am reading the packaging metadata correctly. In the Arch Linux AUR, there is an openicc package, but its `sRGB.icc` is installed to `/usr/share/color/icc/OpenICC/sRGB.icc` instead. I checked Fedora too, and it provides an `icc-profiles-openicc` package that installs `/usr/share/color/icc/OpenICC/sRGB.icc` as well.

Nice, thank you for that research!
Then I propose to check for the existence of profiles in the following order and use the first one that exists as the default:
/usr/share/color/icc/sRGB.icc
/usr/share/color/icc/OpenICC/sRGB.icc
/usr/share/color/icc/colord/sRGB.icc
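The proposed lookup order can be expressed as a small helper that returns the first existing candidate. `default_icc_profile()` and `ICC_CANDIDATES` are hypothetical names used only for illustration (this is not the fix that was committed); the paths are the ones listed above, and returning `None` when none of them exists is an assumption about how a caller would fall back.

```python
import os
from typing import Iterable, Optional

# Candidate sRGB profiles, in the order proposed in this thread.
ICC_CANDIDATES = (
    "/usr/share/color/icc/sRGB.icc",
    "/usr/share/color/icc/OpenICC/sRGB.icc",
    "/usr/share/color/icc/colord/sRGB.icc",
)


def default_icc_profile(candidates: Iterable[str] = ICC_CANDIDATES) -> Optional[str]:
    """Return the first candidate path that is an existing file, or None."""
    for path in candidates:
        # isfile() rather than exists() so that a directory is never picked.
        if os.path.isfile(path):
            return path
    return None


print(default_icc_profile())
```

Because the Debian `colord` copy differs from the openicc one, the order matters: the helper deliberately stops at the first hit rather than preferring any particular package.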
@phmccarty would this patch fix your problem:
@josch Yep, that patch works for me, thanks!
Nice! Fixed in 29921eeabd.
Please file issues for any other problems you find.
Thanks!