Encoding error when title contains a curved apostrophe character #38
By Ryan C. Thompson on 2017-11-14T15:49:32.574Z
When I run:
I get:
Removing the curved apostrophe character from the title eliminates the problem. I'm using img2pdf 0.2.4 on Python 3.5.2.
By josch on 2017-11-14T15:53:13.695Z
I cannot reproduce your finding. Please provide more information about your system.
By Ryan C. Thompson on 2017-11-14T15:58:45.025Z
I think I've found the cause of the error. After looking at the img2pdf code, I realized that the metadata encoding was actually being done by another module, pdfrw. After updating pdfrw from 0.3 to 0.4, the error went away. Checking the changelog for pdfrw 0.4, I see "Proper unicode support for text strings in PDFs added", so it seems this was indeed fixed in that version.
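For anyone hitting the same error, a rough way to tell whether the installed pdfrw predates the fix is sketched below. This is only an illustration, not part of img2pdf or pdfrw: the helper names (`parse_version`, `has_unicode_fix`, `installed_pdfrw_version`) are made up for this example, the 0.4 threshold is taken from the changelog entry quoted above, and the lookup relies on `importlib.metadata`, which needs Python 3.8 or newer (so it would not work on the Python 3.5.2 mentioned in this report).

```python
import re

# Assumption: the Unicode fix first shipped in pdfrw 0.4, per the changelog quoted above.
FIXED_IN = (0, 4)


def parse_version(text):
    """Turn '0.4' or '0.4.1' into a tuple of ints, stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def has_unicode_fix(version_text):
    """Return True if the given pdfrw version string is 0.4 or newer."""
    return parse_version(version_text) >= FIXED_IN


def installed_pdfrw_version():
    """Return the installed pdfrw version string, or None if it cannot be determined."""
    try:
        from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
    except ImportError:
        return None
    try:
        return version("pdfrw")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_pdfrw_version()
    if found is None:
        print("pdfrw not found (or version lookup unavailable)")
    elif has_unicode_fix(found):
        print("pdfrw", found, "should include the Unicode fix")
    else:
        print("pdfrw", found, "predates 0.4; try: pip install -U pdfrw")
```

If the check reports an older version, upgrading pdfrw by name is what resolved the problem in my case.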
I had previously done
pip install -U img2pdf
to make sure I had the latest version, and it updated both img2pdf and several dependencies (e.g. Pillow). However, this command did not update pdfrw, possibly because it is an optional dependency. Hence, even after updating, I still had the old pdfrw with the Unicode bug until I explicitly updated it.
By Ryan C. Thompson on 2017-11-14T15:58:45.073Z
Status changed to closed
By josch on 2017-11-14T16:01:44.351Z
pdfrw is indeed optional. You can disable it by passing
--without-pdfrw