support for pdf 1.4 and above #33

Closed
opened 2021-04-25 19:57:53 +00:00 by josch · 0 comments
Owner

By shirish on 2017-04-25T08:37:05.228Z

I am on Debian stretch and using img2pdf 0.2.3-1.

[$] apt-cache policy img2pdf                                                                                                      
img2pdf:
  Installed: 0.2.3-1
  Candidate: 0.2.3-1
  Version table:
 *** 0.2.3-1 600
        600 http://httpredir.debian.org/debian stretch/main amd64 Packages
        600 http://httpredir.debian.org/debian stretch/main i386 Packages
          1 http://httpredir.debian.org/debian unstable/main amd64 Packages
          1 http://httpredir.debian.org/debian unstable/main i386 Packages
        100 /var/lib/dpkg/status

I made a pdf file with it, and the resulting file shows pdf 1.3 in its metadata. Looking at the history of pdf versioning, 1.3 was released in April 1999 (https://www.prepressure.com/pdf/basics/history/3), so a lot of water has flowed under the bridge since then. Is there a roadmap for supporting 1.4 or later? The latest upstream version is 1.7, but even 1.4 would be a vast improvement.

You can see some discussion on the same topic at https://unix.stackexchange.com/questions/360860/what-is-the-highest-version-of-pdf-that-free-software-operates-at

Looking forward to your reply.


By josch on 2017-04-25T08:41:05.074Z


I do not understand what the bug that you want to report is. Can you elaborate on that?

The answer on stackexchange already says most of what I was about to say: "Tools producing PDF files should ideally specify the oldest version which provides the features they need, for each file they produce. The PDF you’re producing with img2pdf only uses features available in PDF 1.3 (if that), so it’s correct in specifying version 1.3 in its metadata."

Why, in your use case, is it a problem that the pdf version is 1.3? What benefit would it have for you if the version was higher?


By shirish on 2017-04-25T12:30:53.734Z


I was trying to complete the form at https://www.onlineservices.nsdl.com/paam/endUserRegisterContact.html, which requires two documents in pdf format. The only stated requirements are that the pdf must not be blurry and must be below 300 kb. I met both, but when I uploaded the two documents the system would not accept them.

This is how I went about it; original.jpg was shot with my mobile camera:

[$] mediainfo original.jpg                                                                                 
General
Complete name                            : original.jpg
Format                                   : JPEG
File size                                : 2.82 MiB


Image
Format                                   : JPEG
Width                                    : 4 128 pixels
Height                                   : 2 322 pixels
Color space                              : YUV
Chroma subsampling                       : 4:2:0
Bit depth                                : 8 bits
Compression mode                         : Lossy
Stream size                              : 2.82 MiB (100%)
$ convert original.jpg -adaptive-resize 1000x1000 original2.jpg
[$] ll -h original2.jpg                                                                                                       
-rw-r--r-- 1 shirish shirish 220K 2017-04-24 06:15 original2.jpg
[$] mediainfo original2.jpg                                                                                                  
General
Complete name                            : original2.jpg
Format                                   : JPEG
File size                                : 220 KiB

Image
Format                                   : JPEG
Width                                    : 1 000 pixels
Height                                   : 563 pixels
Color space                              : YUV
Chroma subsampling                       : 4:2:0
Bit depth                                : 8 bits
Compression mode                         : Lossy
Stream size                              : 220 KiB (100%)
$ img2pdf --output original3.pdf  original2.jpg 

At my end I did everything I could to make sure the file was good, and it still got rejected. The only reason I can think of is that the pdf version was too old. I wish there was something else I could try, but it seems not.

I cannot share the offending file, because it contains government identity proofs, but I can share other images shot with my phone that illustrate the same thing.


By josch on 2017-04-25T13:19:08.137Z


Okay, but that's not how pdf versions work. Stephen Kitt already explained this to you in the stackexchange question you linked to. Just to repeat the same thing: the pdf version only indicates which feature set is used in the pdf file at hand, and these features are backwards compatible. This means that every new pdf version only adds features but doesn't remove them. Thus, software that can read pdf version 1.7 can read all pdf versions before that as well. On the other hand (and hopefully obviously) software that can only read pdf version 1.3 cannot read any higher versions. So when img2pdf claims that the pdf is of version 1.3, this does not prevent newer software from reading it. It's the other way round: the pdf being of version 1.3 allows old software to read it, and any newer software supports it automatically.
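The compatibility rule described above can be sketched in a few lines of Python. This is only an illustration and not part of img2pdf: the function names pdf_header_version and reader_can_open are made up for this example, and the sketch looks only at the "%PDF-x.y" header line. From PDF 1.4 onwards a file may also carry a /Version entry in its catalog that overrides the header, which this sketch ignores.

```python
import re


def pdf_header_version(data):
    """Return (major, minor) parsed from a '%PDF-x.y' header, or None."""
    # The header normally sits at byte 0. Searching the first 1024 bytes
    # is a lenient choice made for this sketch, not a rule taken from
    # the thread.
    match = re.search(rb"%PDF-(\d+)\.(\d+)", data[:1024])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def reader_can_open(file_version, reader_max_version):
    """A reader handles every version up to the newest one it knows."""
    # Tuples compare element by element, so (1, 3) <= (1, 7) is True.
    return file_version <= reader_max_version


# A hand-written header like the one an img2pdf output would start with.
sample = b"%PDF-1.3\n%\xe2\xe3\xcf\xd3\n"
print(pdf_header_version(sample))  # (1, 3)
```

With these definitions, reader_can_open((1, 3), (1, 7)) is True while reader_can_open((1, 7), (1, 3)) is False, which is the asymmetry the comment describes.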

But now that you have explained your problem further, it seems doubtful that the pdf version is the problem in the first place. As far as I understand it, the web service rejects your pdf without giving you a clear reason, and you then jump to the conclusion that the pdf version must be at fault. But there is no indication of that. I can think of multiple other problems that could make the server reject a pdf. The pdf format is a very complicated beast and it's easy to mess up when trying to parse it.

I see two things that you can do:

  • Convert your JPEG to pdf using the convert tool from imagemagick and try to upload that. Does that also get rejected? If so, then the problem is not img2pdf but somewhere on the server instead. In contrast to img2pdf, imagemagick is a mature piece of software used by millions of people. It is highly unlikely that it produces invalid pdf files.

  • Send your PDF to the support address of the webservice in question and tell them that their system rejected the pdf and ask their technical staff to look into it. Maybe they can fix the problem.


By shirish on 2017-04-25T18:38:17.566Z


@josch thank you, this is news to me. I didn't know that imagemagick is also able to make pdf files. Could you share an example so I know which knobs to try to get a pdf out of imagemagick?

Asking the web service is also a good suggestion, and I will do that too. Thank you for taking the time to explain this to an ignorant user. Please close the issue after you have shared the example. Sorry for wasting your time.


By josch on 2017-04-27T08:07:46.811Z


With imagemagick you do it like this:

convert input.jpg output.pdf
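
If the conversion needs to be scripted, the same command can be wrapped in Python. This is a minimal sketch, not something from the thread: build_convert_command and jpeg_to_pdf are hypothetical helper names, and the lookup assumes ImageMagick is installed as either "magick" (the entry point in ImageMagick 7) or "convert" (older releases).

```python
import shutil
import subprocess


def build_convert_command(src, dst, tool="convert"):
    """Build the argument list for `convert input.jpg output.pdf`."""
    return [tool, src, dst]


def jpeg_to_pdf(src, dst):
    """Run ImageMagick on src and write the resulting PDF to dst."""
    # Prefer `magick` if present, otherwise fall back to `convert`.
    tool = shutil.which("magick") or shutil.which("convert")
    if tool is None:
        raise FileNotFoundError("ImageMagick does not appear to be installed")
    # check=True raises CalledProcessError if the conversion fails.
    subprocess.run(build_convert_command(src, dst, tool), check=True)
```

Calling jpeg_to_pdf("input.jpg", "output.pdf") then runs the equivalent of the command above.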

By josch on 2017-04-27T08:07:47.013Z


Status changed to closed

josch closed this issue 2021-04-25 19:57:53 +00:00
Reference: josch/img2pdf#33