Is there any value to using pdfrw? #74
By Michał Górny on 2020-04-20T10:31:15.600Z
I'm trying to figure out whether using pdfrw over the built-in PDF writer has any advantages. In fact, both the help output and a quick glance at the code suggest that it has at least a few limitations which require using --without-pdfrw. However, are there any cases when you should use pdfrw over --without-pdfrw?
In a wider context, I'd like to remove pdfrw from Gentoo since it's dead and stinking (and broken with Python 3.7+) and img2pdf is the last program using it. I'm wondering whether it'd be feasible to just rely on non-pdfrw behavior unconditionally.
By josch on 2020-04-20T13:30:03.745Z
Hi!
The reason for including pdfrw is that I think it's a good idea to rely on existing functionality instead of re-implementing the world. Writing PDF files is tricky and I would like somebody else to take care of all the corner cases. Yes, pdfrw cannot do some things that img2pdf needs, so I opened issues about the missing functionality with pdfrw in the hope that those would be fixed in the future and I could then drop the --without-pdfrw flag. If pdfrw is really dead as you say (last commit was more than two years ago) then that's really shitty indeed. I am not aware of a good alternative that I could switch to.
What you can do in Gentoo is to just patch img2pdf so that it unconditionally operates without pdfrw.
What is the problem with python 3.7+? I'm on python 3.8 and didn't find any problems yet.
Thanks for packaging the software in gentoo!
By Michał Górny on 2020-04-20T14:12:09.737Z
I agree that it's better to reuse existing code and let somebody else maintain it. Except that in this case nobody is maintaining pdfrw, so effectively you're either forced to fork it (which could be a good idea if you're willing to maintain it and the code has a low F-factor) or duplicate it. Right now you do the latter, so you not only have to reinvent the wheel but also maintain two code branches, which IMHO doesn't serve your goal much.
What I'm wondering is whether you'd be interested in removing pdfrw support altogether. That would save us the work, and possibly save you some too in the future ;-).
I've fought pdfrw's tests today and of all the tests that passed with py2.7, 1 fails with py3.6 and 5 fail with py3.7. The latter are cases of raise StopIteration in a generator, which is forbidden in py3.7. I've found one explicit case in the code that's trivial to fix but it apparently isn't tested at all. The actual failure comes from some indirect call which I can't really figure out, and it seems that PDF files that somehow worked with earlier Python versions start exploding on newer versions, and the code lacks proper error handling.
I'm not an expert on pdfrw's code and I don't really have the time to fix it or even figure out what's really wrong, I'm afraid. However, the evidence so far suggests it's badly written and makes a lot of untested assumptions.
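For readers unfamiliar with the py3.7 change mentioned above, here is a minimal sketch of the pattern. It is not taken from pdfrw's code; the function names are hypothetical and only illustrate the general behaviour described in PEP 479, where a StopIteration raised inside a generator body is turned into a RuntimeError.

```python
def first_items_broken(items, limit):
    """Hypothetical generator that ends itself the pre-3.7 way."""
    for index, item in enumerate(items):
        if index >= limit:
            # On Python 3.7+ this surfaces to the caller as RuntimeError
            # instead of silently ending the iteration.
            raise StopIteration
        yield item


def first_items_fixed(items, limit):
    """Same logic, ending the generator with a plain return."""
    for index, item in enumerate(items):
        if index >= limit:
            return
        yield item
```

The explicit case is usually a one-line fix like this; the harder cases are indirect ones, such as a bare next() call inside a generator whose StopIteration used to end the outer generator quietly.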
By josch on 2020-04-20T18:42:03.354Z
Did you report the bugs you found in their github issue tracker? I would like to refer to actual problems when I remove pdfrw support.
What I can do for you for now is to make it such that pdfrw support is automatically off if pdfrw is not installed. That should work for you, right?
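A rough sketch of what "automatically off if pdfrw is not installed" could look like follows. This is an assumption about the general approach, not the actual img2pdf change; the names have_pdfrw and choose_backend are hypothetical.

```python
try:
    import pdfrw  # optional dependency, may be absent

    have_pdfrw = True
except ImportError:
    have_pdfrw = False


def choose_backend(without_pdfrw=False):
    """Pick a PDF writer backend name (hypothetical helper).

    Falls back to the internal writer when the user asked for it
    or when pdfrw could not be imported.
    """
    if without_pdfrw or not have_pdfrw:
        return "internal"
    return "pdfrw"
```

With this shape, a distribution that does not ship pdfrw gets the internal writer without needing to patch anything or pass an extra flag.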
By Michał Górny on 2020-04-20T20:20:15.449Z
That one happens with all Python versions I've tested: https://github.com/pmaupin/pdfrw/issues/197
This one with py3.6+: https://github.com/pmaupin/pdfrw/issues/198
This one with py3.7+: https://github.com/pmaupin/pdfrw/issues/199
The StopIteration problem has been reported already as https://github.com/pmaupin/pdfrw/issues/145
Yes, that would be very helpful. Also please make sure tests skip it gracefully. Thank you.
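One way tests could skip gracefully is sketched below using the standard unittest skip decorator. The test class and helper names are hypothetical and this is not img2pdf's actual test suite.

```python
import importlib.util
import unittest

# True only when the optional pdfrw module can be found.
HAVE_PDFRW = importlib.util.find_spec("pdfrw") is not None


class PdfrwBackendTest(unittest.TestCase):
    @unittest.skipUnless(HAVE_PDFRW, "pdfrw is not installed")
    def test_pdfrw_is_importable(self):
        import pdfrw

        self.assertTrue(hasattr(pdfrw, "PdfWriter"))


def run_suite():
    """Run the hypothetical test case and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(PdfrwBackendTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

A skipped test is reported as skipped rather than failed, so the overall run still counts as successful on systems without pdfrw.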
By josch on 2020-04-20T21:36:06.946Z
Thank you for your input. I also marked pdfrw for removal in Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=95836
By Michał Górny on 2020-04-21T05:36:22.041Z
Thank you. I presume you meant https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=958362. Hopefully this will get some attention and the package can be revived.
The only other consumer we have (besides img2pdf) is rst2pdf, which I've queued for removal since it was needed only to build the PDF manual for mpv (no clue why we did that, to be honest).
By josch on 2020-04-23T04:59:11.107Z
Status changed to closed by commit 0bbbc7a31a