Encoding error when title contains a curved apostrophe character #38

Closed
opened 2021-04-25 19:57:59 +00:00 by josch · 0 comments
Owner

By Ryan C. Thompson on 2017-11-14T15:49:32.574Z

When I run:

img2pdf -o /tmp/temp.pdf --title 'That’s a funny looking apostrophe' temp.jpg

I get:

ERROR:root:error: 'latin-1' codec can't encode character '\u2019' in position 113: ordinal not in range(256)

Removing the curved apostrophe character from the title eliminates the problem. I'm using img2pdf 0.2.4 on Python 3.5.2.
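
For reference, the same codec error can be reproduced in plain Python without img2pdf. This is only a minimal sketch: the helper name `try_latin1` is hypothetical, and the UTF-16BE fallback is an assumption about one way to represent the character (PDF text strings permit UTF-16BE with a leading byte order mark), not a description of what img2pdf or pdfrw actually do.

```python
# U+2019 (right single quotation mark) is outside Latin-1's 0-255 range.
title = "That\u2019s a funny looking apostrophe"


def try_latin1(text):
    """Return the Latin-1 bytes, or the error message if encoding fails."""
    try:
        return text.encode("latin-1")
    except UnicodeEncodeError as exc:
        return str(exc)


message = try_latin1(title)
print(message)

# Assumption for illustration: UTF-16BE prefixed with the FE FF byte order
# mark can hold any Unicode character, so this encoding succeeds.
encoded = b"\xfe\xff" + title.encode("utf-16-be")
print(encoded[:12])
```

The reported position differs from the 113 in the error above because here only the bare title is encoded, not the larger string that the tool was building.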


By josch on 2017-11-14T15:53:13.695Z


I cannot reproduce your finding. Please provide more information about your system.


By Ryan C. Thompson on 2017-11-14T15:58:45.025Z


I think I've found the cause of the error. After looking at the img2pdf code, I realized that the metadata encoding was actually being done by another module, pdfrw. After updating pdfrw from 0.3 to 0.4, the error went away. Checking the changelog for pdfrw 0.4, I see "Proper unicode support for text strings in PDFs added", so it seems this was indeed fixed in that version.

I had previously done pip install -U img2pdf to make sure I had the latest version, and it updated both img2pdf and several dependencies (e.g. Pillow). However, for some reason this command did not update pdfrw, maybe because it is an optional dependency. Hence, even after updating, I still had the old pdfrw with the Unicode bug until I explicitly updated it.
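
A quick way to check whether a stale pdfrw is still installed is to query the package metadata. The sketch below assumes Python 3.8 or newer (for `importlib.metadata`, which is not available on the Python 3.5.2 mentioned above); the helper names `installed_version` and `version_tuple` are hypothetical, and the version comparison is deliberately naive.

```python
from importlib import metadata  # standard library from Python 3.8 onward


def installed_version(package):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(version):
    """Naive parse: keep the leading numeric components, e.g. '0.4' -> (0, 4)."""
    parts = []
    for part in version.split("."):
        if not part.isdigit():
            break
        parts.append(int(part))
    return tuple(parts)


version = installed_version("pdfrw")
if version is None:
    print("pdfrw is not installed")
elif version_tuple(version) < version_tuple("0.4"):
    print("pdfrw", version, "predates 0.4; consider upgrading it explicitly")
else:
    print("pdfrw", version, "is 0.4 or newer")
```

If the check reports an old version, upgrading the package by name (for example `pip install -U pdfrw`) is presumably the explicit update described above.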


By Ryan C. Thompson on 2017-11-14T15:58:45.073Z


Status changed to closed


By josch on 2017-11-14T16:01:44.351Z


pdfrw is indeed optional. You can disable it by passing --without-pdfrw

josch closed this issue 2021-04-25 19:57:59 +00:00
Reference: josch/img2pdf#38