Limited Number of Input Files on Windows due to cmd.exe limitations #95


Closed

opened 2021-04-25 19:59:07 +00:00 by josch · 0 comments

josch commented

2021-04-25 19:59:07 +00:00

Owner

By Falko Fleßner on 2021-04-19T12:45:43.085Z

When img2pdf.py is called via Windows cmd.exe, the length of the command line, including its parameters, is limited to about 8k characters (8191). Because the length of the input file names in our automated process is not known in advance, this was a blocker for our operations.

We have developed a patch that adds the parameter "--file". The file passed with it must contain one filename per line, which allows arbitrarily long documents to be created.

The patch may not hold up to img2pdf's standards in regard of documentation, style or comments, but it should integrate reasonably well with the other code. Feel free to use it.

img2pdf.diff

By josch on 2021-04-20T17:47:17.097Z

This is not a bad idea. But if we have one filename per line, then filenames must not contain the newline character. We also run into the problem that different platforms use different ASCII characters to denote a newline. To process arbitrary filenames, it would make more sense to separate them with a zero byte. What do you think?
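A minimal sketch of the ambiguity described here, not taken from the patch: the sample filenames are made up for illustration, and one of them deliberately contains a newline. Splitting on newlines breaks that name into two entries, while splitting on a zero byte recovers the original list.

```python
# Hypothetical filenames for illustration; the second one contains a newline.
names = ["scan 1.png", "odd\nname.png", "page3.jpg"]

# One filename per line: the embedded newline is indistinguishable
# from a separator once the list is written out.
newline_joined = "\n".join(names)
from_lines = newline_joined.split("\n")

# Zero-byte separation: NUL cannot occur inside a filename,
# so the split is unambiguous.
nul_joined = "\0".join(names)
from_nul = nul_joined.split("\0")

print(len(from_lines), from_lines)
print(len(from_nul), from_nul)
```

The newline-separated variant yields four entries instead of three, which is why the zero byte is the safer separator for arbitrary filenames.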

By Falko Fleßner on 2021-04-21T10:13:07.667Z

For our use case, NUL-terminated strings would be feasible and would also integrate quite well with the "find -print0 ..." command in other use cases.
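As a rough sketch of how such a list file could be read, assuming it has the same layout that "find -print0" produces (every name followed by a zero byte): the helper name `read_nul_separated` and the file name `files.lst` are hypothetical and are not part of img2pdf or of the patch discussed here.

```python
import os
import tempfile


def read_nul_separated(path):
    """Return the filenames stored in `path`, separated by zero bytes.

    Hypothetical helper for illustration only.
    """
    # Read as bytes so no newline translation or text decoding
    # can alter the stored names.
    with open(path, "rb") as f:
        data = f.read()
    # Each name is NUL-terminated, so the final split chunk is empty;
    # the `if chunk` filter drops it.
    return [os.fsdecode(chunk) for chunk in data.split(b"\0") if chunk]


# Usage example with a temporary list file.
with tempfile.TemporaryDirectory() as tmp:
    list_path = os.path.join(tmp, "files.lst")
    with open(list_path, "wb") as f:
        f.write(b"a.png\0b c.jpg\0odd\nname.png\0")
    filenames = read_nul_separated(list_path)

print(filenames)
```

Reading in binary mode sidesteps the platform-specific newline handling mentioned earlier in the thread; `os.fsdecode` then converts each name using the filesystem encoding.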

(Had to look up if CR/LF & friends are legal filename characters. Turns out: they are /o\)

I ran into bugs with my patch and found code that I would like to see cleaner. I can provide you with a new patch that splits on NULL-Bytes and solves my other bugs & issues, too.
