PDF/A support #72

Closed
opened 2021-04-25 19:58:40 +00:00 by josch · 0 comments
Owner

By kexec on 2020-04-20T03:29:48.683Z

Please, would it be possible to add support of output in PDF/A? Thank you very much.


By josch on 2020-04-20T05:51:21.121Z


In what way is the output of img2pdf currently not conforming to PDF/A?


By kexec on 2020-05-06T16:59:35.872Z


Well, a PDF/A document should claim that it's "PDF/A-compliant" in its metadata. Also, when a PDF/A file is opened in Adobe Acrobat, it shows the notice "The file you have opened complies with the PDF/A standard and has been opened read-only to prevent modification."


By josch on 2020-05-06T18:01:13.929Z


I would be grateful for any documentation on how that is done. All I can find online are either proprietary tools or online services that want my data first. I'm not going to pay with my private money to download the ISO 19005-1:2005 document that explains how PDF/A works. You also only mention Adobe Acrobat, which is another proprietary program that I'm not going to install or pay for. I'm not a company making a profit out of this.


By josch on 2020-05-26T09:41:45.159Z


Until this is fixed, you can probably use the following command to turn any pdf into one that is pdf/a compliant:

gs -sDEVICE=pdfwrite -dBATCH -dNOPAUSE -dSAFER -sColorConversionStrategy=UseDeviceIndependentColor -dEmbedAllFonts=true -dPrinted=true -dPDFA -sProcessColorModel=DeviceRGB -dPDFACompatibilityPolicy=1 -dDetectDuplicateImages -r150 -dFastWebView=true -sOutputFile=output.pdf input.pdf

And the following Java code uses preflight from Apache PDFBox, which claims to be able to check PDF/A compliance:

/**
 * Source:
 * https://pdfbox.apache.org/1.8/cookbook/pdfavalidation.html
 *
 * Compilation:
 * $ javac -cp preflight-app.2.0.2.jar:. Validator.java
 *
 * Usage:
 * $ java -cp preflight-app.2.0.2.jar:. Validator input.pdf
 */

import org.apache.pdfbox.preflight.ValidationResult;
import org.apache.pdfbox.preflight.parser.PreflightParser;
import org.apache.pdfbox.preflight.PreflightDocument;
import org.apache.pdfbox.preflight.exception.SyntaxValidationException;
import org.apache.pdfbox.preflight.ValidationResult.ValidationError;

public class Validator {
    public static void main( String[] args ) throws Exception {
        ValidationResult result = null;
        PreflightParser parser = new PreflightParser(args[0]);

        try {
            parser.parse();
            PreflightDocument document = parser.getPreflightDocument();
            document.validate();
            result = document.getResult();
            document.close();
        } catch (SyntaxValidationException e) {
            result = e.getResult();
        }

        if( result.isValid() )
            System.out.println("The file " + args[0] + " is a valid PDF/A-1b file");
        else {
            System.out.println("The file " + args[0] + " is not valid, error(s) :");
            for( ValidationError error : result.getErrorsList() )
                System.out.println(error.getErrorCode() + " : " + error.getDetails());
        }
    }
}

This issue might fix itself when we add the new qpdf backend from !6
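
If more than one file needs converting in the meantime, the gs call above can be wrapped in a small loop. This is only a minimal sketch: the flags are copied unchanged from the command above, gs is assumed to be on the PATH, and the class name `GsPdfa` and the `-pdfa.pdf` output suffix are made up for illustration. It does not check that the result is actually compliant, so running a validator afterwards is still a good idea.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of img2pdf or Ghostscript.
class GsPdfa {
    // Builds the argument list; flags are copied unchanged from the gs command above.
    static List<String> command(String input, String output) {
        List<String> cmd = new ArrayList<>(Arrays.asList(
            "gs", "-sDEVICE=pdfwrite", "-dBATCH", "-dNOPAUSE", "-dSAFER",
            "-sColorConversionStrategy=UseDeviceIndependentColor",
            "-dEmbedAllFonts=true", "-dPrinted=true", "-dPDFA",
            "-sProcessColorModel=DeviceRGB", "-dPDFACompatibilityPolicy=1",
            "-dDetectDuplicateImages", "-r150", "-dFastWebView=true"));
        cmd.add("-sOutputFile=" + output);
        cmd.add(input);
        return cmd;
    }

    public static void main(String[] args) throws Exception {
        for (String input : args) {
            // Assumed naming scheme: scan.pdf becomes scan-pdfa.pdf
            String output = input.replaceAll("(?i)\\.pdf$", "") + "-pdfa.pdf";
            int status = new ProcessBuilder(command(input, output))
                .inheritIO()   // show gs output on the console
                .start()
                .waitFor();
            if (status != 0) {
                System.err.println("gs failed for " + input + " (exit " + status + ")");
            }
        }
    }
}
```

Usage would be something like `java GsPdfa a.pdf b.pdf`.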


By josch on 2020-06-01T00:12:47.858Z


Commit 93f65a49c9 now introduces support for PDF/A-1b. All validators I could find agree that the PDFs produced with the --pdfa option are compliant. If you find a validator that claims otherwise, please tell me. Implementing this convinced me even more how useless it is to output PDF/A-1b compliant files for img2pdf. It just unnecessarily increases the file size. That's why PDF/A-1b compliant output is disabled by default and you need to pass the --pdfa flag to produce such files.
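
For a quick look at the metadata claim discussed earlier in this thread: as far as I know, a PDF/A file states its level in its XMP metadata through the `pdfaid:part` and `pdfaid:conformance` properties, and PDF/A-1 requires that metadata stream to be stored unfiltered, so the text can usually be found by scanning the raw bytes. The sketch below does just that. It assumes the conventional `pdfaid` prefix, the class name `PdfaClaim` is made up, and it only reports what the file claims about itself. It is not a validator and says nothing about whether the claim is true.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: reports the PDF/A level a file claims in its XMP metadata.
class PdfaClaim {
    // XMP allows both the element form (<pdfaid:part>1</pdfaid:part>)
    // and the attribute form (pdfaid:part="1"); both are matched here.
    private static final Pattern PART =
        Pattern.compile("pdfaid:part(?:>|=\")\\s*(\\d)");
    private static final Pattern CONFORMANCE =
        Pattern.compile("pdfaid:conformance(?:>|=\")\\s*([A-Za-z])");

    // Returns e.g. "PDF/A-1b", or null if no pdfaid:part property is found.
    static String claimedLevel(String content) {
        Matcher part = PART.matcher(content);
        if (!part.find()) {
            return null;
        }
        Matcher conformance = CONFORMANCE.matcher(content);
        String level = conformance.find()
            ? conformance.group(1).toLowerCase(Locale.ROOT)
            : "";
        return "PDF/A-" + part.group(1) + level;
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        // ISO-8859-1 maps every byte to one char, so binary data survives the scan.
        String content = new String(bytes, StandardCharsets.ISO_8859_1);
        String claim = claimedLevel(content);
        System.out.println(claim == null
            ? "no PDF/A claim found in " + args[0]
            : args[0] + " claims " + claim);
    }
}
```

Run it as `java PdfaClaim output.pdf` on a file produced with and without --pdfa to compare.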


By josch on 2020-06-01T00:12:47.945Z


Status changed to closed

josch closed this issue 2021-04-25 19:58:41 +00:00
Reference: josch/img2pdf#72