Encoding error when title contains a curved apostrophe character #38

Closed

opened 2021-04-25 19:57:59 +00:00 by josch · 0 comments

josch commented

2021-04-25 19:57:59 +00:00

Owner

By Ryan C. Thompson on 2017-11-14T15:49:32.574Z

When I run:

img2pdf -o /tmp/temp.pdf --title 'That’s a funny looking apostrophe' temp.jpg

I get:

ERROR:root:error: 'latin-1' codec can't encode character '\u2019' in position 113: ordinal not in range(256)

Removing the curved apostrophe character from the title eliminates the problem. I'm using img2pdf 0.2.4 on Python 3.5.2.
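
The same kind of error can be reproduced in plain Python, without img2pdf. This is a minimal sketch, assuming the title is at some point encoded as Latin-1 somewhere in the toolchain; the helper name `try_latin1` is made up for illustration and is not part of img2pdf or pdfrw.

```python
# U+2019 (right single quotation mark) has no Latin-1 representation,
# so encoding a string that contains it raises UnicodeEncodeError.
title = "That\u2019s a funny looking apostrophe"


def try_latin1(text):
    """Return the Latin-1 bytes, or the error message if encoding fails.

    Hypothetical helper, used only to demonstrate the failure.
    """
    try:
        return text.encode("latin-1")
    except UnicodeEncodeError as exc:
        return str(exc)


result = try_latin1(title)
print(result)
```

The reported position will differ from the 113 in the original message, presumably because img2pdf encodes a longer string that contains the title.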

By josch on 2017-11-14T15:53:13.695Z

I cannot reproduce your finding. Please provide more information about your system.

By Ryan C. Thompson on 2017-11-14T15:58:45.025Z

I think I've found the cause of the error. After looking at the img2pdf code, I realized that the metadata encoding was actually being done by another module, pdfrw. After updating pdfrw from 0.3 to 0.4, the error went away. Checking the changelog for pdfrw 0.4, I see "Proper unicode support for text strings in PDFs added", so it seems this was indeed fixed in that version.
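
For context, the PDF specification allows a text string to be stored as UTF-16BE when it is prefixed with the byte order mark FE FF, which is how characters such as U+2019 can appear in metadata. The sketch below only illustrates that rule. It is not taken from pdfrw, whose 0.4 implementation has not been checked here, and `encode_pdf_text_string` is a hypothetical name.

```python
import codecs


def encode_pdf_text_string(text):
    """Encode text as the raw bytes of a PDF text string (illustrative only).

    Plain ASCII is kept as-is; anything else falls back to UTF-16BE with a
    leading byte order mark. Escaping of parentheses and backslashes, which a
    real PDF writer needs for literal strings, is deliberately left out.
    """
    try:
        return text.encode("ascii")
    except UnicodeEncodeError:
        return codecs.BOM_UTF16_BE + text.encode("utf-16-be")


print(encode_pdf_text_string("That\u2019s a funny looking apostrophe"))
```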

I had previously done pip install -U img2pdf to make sure I had the latest version, and it updated both img2pdf and several dependencies (e.g. Pillow). However, for some reason this command did not update pdfrw, maybe because it is an optional dependency. Hence, even after updating, I still had the old pdfrw with the Unicode bug until I explicitly updated it.
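
A quick way to see which versions are actually installed is to ask the package metadata. This is a sketch that assumes Python 3.8 or newer, where `importlib.metadata` is available; it would not run on the Python 3.5.2 mentioned above. The helper name `installed_version` is made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Print the versions of img2pdf and its optional pdfrw dependency.
for name in ("img2pdf", "pdfrw"):
    print(name, installed_version(name) or "not installed")
```

If pdfrw shows up as older than 0.4, updating it explicitly with pip install -U pdfrw matches what resolved the problem here.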

By Ryan C. Thompson on 2017-11-14T15:58:45.073Z

Status changed to closed

By josch on 2017-11-14T16:01:44.351Z

pdfrw is indeed optional. You can disable it by passing --without-pdfrw
