img2pdf python3.6 MemoryError #64

Closed
opened 3 years ago by josch · 0 comments
josch commented 3 years ago
Owner

By LoveEatCandy on 2019-08-26T05:54:07.557Z

How can I turn a lot of images into one PDF file?

def convert_img_list_to_pdf(image_list, pdf_path):
    import img2pdf
    with open(pdf_path, "wb") as f:
        img2pdf.convert(image_list, outputstream=f)


By josch on 2019-08-26T05:59:14.205Z


You have to unpack the list first. Try img2pdf.convert(*image_list, outputstream=f)
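The distinction matters because `img2pdf.convert` takes the images as separate positional arguments rather than as a single list. A stdlib-only illustration of what the `*` unpacking does (`fake_convert` is a hypothetical stand-in for `img2pdf.convert`, used so the sketch runs without the library):

```python
def fake_convert(*images, outputstream=None):
    # stand-in for img2pdf.convert: each image must arrive as its
    # own positional argument, collected here into the tuple `images`
    return [type(i).__name__ for i in images]

image_list = ["a.jpg", "b.jpg", "c.jpg"]

# without unpacking, the whole list arrives as ONE argument
print(fake_convert(image_list))    # ['list'] -- convert sees a single list
# with unpacking, each path arrives as its own argument
print(fake_convert(*image_list))   # ['str', 'str', 'str']
```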


By LoveEatCandy on 2019-08-26T06:16:50.810Z


@josch Sorry, it didn't work.
I think the problem is: all of the image data must be read into memory before it is written to the PDF file, and my image list is too large.
Is there any way to write the image data into the PDF file one by one?


By josch on 2019-08-26T06:23:13.178Z


Please articulate your problem properly.

In the subject you write something about a memory error, but in your issue report you only ask how to turn a lot of images into one PDF file. Where is the memory error?

No, img2pdf does not read all image data into memory before writing it. Please show me the error message you get, give me a way to reproduce the problem, and list exactly how you already attempted to diagnose it.

I don't have a crystal ball, so just saying "it's broken" doesn't let me know what exactly is broken and how.

If your problem really is that the resulting PDF does not fit into memory, then you could convert each page by itself and concatenate the pages later. But I'm afraid if that is your problem, then you will fail yet again, because to create a PDF you have to be able to hold the whole PDF in memory at once. This is because of the xref table at the end, which references every object in the PDF.
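For background on that last point: the xref table maps every object to its absolute byte offset in the file, and the `startxref` keyword in the trailer records where the table itself begins, so those offsets can only be written once the writer knows the whole file layout. A toy sketch that parses the `startxref` offset out of a hand-built trailer fragment (not a real PDF, just the final bytes every PDF ends with):

```python
# minimal trailer bytes, mimicking the end of a PDF file
trailer = b"...objects and xref table...\nstartxref\n1234\n%%EOF\n"

def startxref_offset(pdf_bytes):
    # find the last "startxref" keyword and parse the decimal
    # byte offset on the following line
    idx = pdf_bytes.rindex(b"startxref")
    line = pdf_bytes[idx:].split(b"\n")[1]
    return int(line)

print(startxref_offset(trailer))  # 1234
```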


By LoveEatCandy on 2019-08-26T06:59:05.703Z


@josch Sorry, my English is poor.

img2pdf.convert(*images_path_list, outputstream=f)

Traceback (most recent call last):
  File "/home/www-data/web2py_prev/gluon/scheduler.py", line 507, in executor
    result = dumps(_function(*args, **vars))
  File "/home/www-data/web2py_prev/applications/sai_admin/modules/system_scheduler.py", line 31, in do_task_on_message
    module_service.on_message(service_id, new_context)
  File "/home/www-data/web2py_prev/applications/sai_admin/modules/system_export.py", line 280, in on_message
    self.update_export_award_pdf(table_id, module_name, ids)
  File "/home/www-data/web2py_prev/applications/sai_admin/modules/system_export.py", line 218, in update_export_award_pdf
    module_service.pack_award_to_pdf(ids, pdf_file_path)
  File "/home/www-data/web2py_prev/applications/sai_admin/modules/match/player_award.py", line 338, in pack_award_to_pdf
    convert_img_list_to_pdf(award_pic_iter, pdf_path)
  File "/home/www-data/web2py_prev/applications/sai_admin/modules/api/file_api.py", line 154, in convert_img_list_to_pdf
    img2pdf.convert(*image_list, outputstream=f)
  File "/usr/local/lib/python3.6/dist-packages/img2pdf.py", line 1829, in convert
    ) in read_images(rawdata, kwargs["colorspace"], kwargs["first_frame_only"]):
  File "/usr/local/lib/python3.6/dist-packages/img2pdf.py", line 1161, in read_images
    im = BytesIO(rawdata)
TypeError: a bytes-like object is required, not 'str'


for img in images:
    # img is allowed to be a path, a binary string representing image data
    # or a file-like object (really anything that implements read())
    try:
        rawdata = img.read()
    except AttributeError:
        if not isinstance(img, (str, bytes)):
            raise TypeError("Neither implements read() nor is str or bytes")
        # the thing doesn't have a read() function, so try if we can treat
        # it as a file name
        try:
            with open(img, "rb") as f:
                rawdata = f.read()
        except Exception:
            # whatever the exception is (string could contain NUL
            # characters or the path could just not exist) it's not a file
            # name so we now try treating it as raw image content
            rawdata = img

When reading too many images, "rawdata = f.read()" raises a MemoryError; execution then falls through to "rawdata = img", and since img is an image path (type str), I got the error above.
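A sketch of the fix josch goes on to describe: catch MemoryError separately so an oversized file is reported as such instead of the path being misinterpreted as raw image data. `read_raw` below is a hypothetical rewrite of the loop body above, not the actual patch, and raising ValueError for an unreadable path is a design choice of this sketch:

```python
def read_raw(img):
    # img may be a file-like object, raw bytes, or a file path (str)
    try:
        return img.read()
    except AttributeError:
        pass
    if isinstance(img, bytes):
        return img
    if not isinstance(img, str):
        raise TypeError("Neither implements read() nor is str or bytes")
    try:
        with open(img, "rb") as f:
            return f.read()
    except MemoryError:
        # do NOT fall through to treating the path string as image
        # data -- report the real problem instead
        raise MemoryError("image file %s is too large to read" % img)
    except OSError:
        # not a usable file name; in Python 3 a str can never be raw
        # image content, so fail loudly rather than fall through
        raise ValueError("%r is neither a readable file nor image data" % img)
```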


By josch on 2019-08-26T07:12:25.667Z


You are right, that is a bug. If f.read() raises a memory error, then the data is treated as raw image content which is wrong in your case. Instead, the memory error should be caught and handled appropriately.

How large is the data you are trying to process?


By LoveEatCandy on 2019-08-26T07:29:25.352Z


@josch About 1 GB. I can upgrade the memory temporarily, but the image count is increasing. So I want to know: can I create the PDF file like this (writing the image data one by one)?

def convert_img_list_to_pdf(image_path_list, pdf_path):
    import img2pdf
    with open(pdf_path, "wb") as f:
        for single_image_path in image_path_list:
            f.write(img2pdf.convert(single_image_path))
The other way for me is to create multiple pdf files.


By josch on 2019-08-26T07:35:03.817Z


No, if you do it like that, then you are just concatenating multiple PDF files together, and PDF files cannot simply be concatenated.

If you are processing 1 GB of data, then the resulting PDF will also be around 1 GB. Do you have a PDF reader that is able to handle documents of that size? How is this practical?


By LoveEatCandy on 2019-08-26T07:43:56.567Z


It's a good problem I should consider.
Looks like the best way is to create 5 PDF files, each 200 MB, and zip them up for download.
Thanks.
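The batching plan above can be sketched without img2pdf itself; `chunked` and `convert_in_batches` are hypothetical helpers, and the actual `img2pdf.convert` call is left as a comment so the sketch stays runnable without the library installed:

```python
def chunked(items, size):
    # yield consecutive slices of at most `size` items
    for start in range(0, len(items), size):
        yield items[start:start + size]

def convert_in_batches(image_path_list, pdf_basename, batch_size):
    written = []
    for n, batch in enumerate(chunked(image_path_list, batch_size), start=1):
        pdf_path = "%s-%03d.pdf" % (pdf_basename, n)
        # with open(pdf_path, "wb") as f:
        #     img2pdf.convert(*batch, outputstream=f)
        written.append((pdf_path, len(batch)))
    return written

print(convert_in_batches(["%d.jpg" % i for i in range(7)], "awards", 3))
# [('awards-001.pdf', 3), ('awards-002.pdf', 3), ('awards-003.pdf', 1)]
```

Each output PDF then only has to fit in memory on its own, which sidesteps the xref-table constraint discussed above.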


By LoveEatCandy on 2019-08-26T07:43:56.922Z


Status changed to closed

josch closed this issue 3 years ago
Reference: josch/img2pdf#64