Files from host end up in output tarball #26
Using the latest mmdebstrap with mode unshare as a non-root user and a tarball as the output, I observed two files in the final tarball:
These files are copies of files from the host and therefore make the mmdebstrap command not reproducible when run on different hosts.
To observe run:
On two different hosts. The checksums of the tarballs will be different. Using diffoscope to diff the tarballs, the only differences between them are the two files listed above.
Can mmdebstrap remove the files copied in?
Your observation is absolutely correct. The reason why /etc/resolv.conf and /etc/hostname are copied in is that debootstrap does the same, and thus I think that's what users expect. You are correct that this results in different tarballs when run on different systems. The scenarios I see are these:
1. either you run mmdebstrap casually as a normal user on your own computer to create throwaway chroots or build chroots or the like -- in that case it's okay that the output is not the same as on another computer because you never compare your own chroot with those created on another host
2. or you run mmdebstrap as part of a bigger project, like a script that builds a bootable system image, and those are supposed to be bit-by-bit reproducible no matter who runs the script. In that case, the script that runs mmdebstrap can easily add --customize-hook 'rm "$1"/etc/resolv.conf' to the mmdebstrap invocation.
Do you see another scenario that I haven't thought of?
I do not see another case, I filed this bug about scenario 2.
I don't think there needs to be a code fix, I think updating the manpage to cover this scenario would be perfect.
Currently, I see two confusing statements on the manpage:
Perhaps an adequate resolution would be to update the manpage to state that reproducibility applies when running the command on the same host, not across hosts.
Also documenting that these files are copied in and would affect reproducibility would be great.
Ah okay, I understand. Yes, I'll add some more docs to explain this. Thanks for pointing this out!
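For scenario 2, the workaround suggested above can be extended to cover both files named in this thread. The following is only a minimal sketch: the function name remove_host_files is hypothetical, and the suite and output names in the commented invocation are placeholders. It assumes, as in the hook shown earlier, that the chroot directory arrives as the first argument.

```shell
#!/bin/sh
# Sketch only: remove_host_files is a hypothetical helper name.
# $1 is the chroot root directory, the same "$1" used in the
# --customize-hook example earlier in this thread.
remove_host_files() {
    root=${1:-}
    # Refuse to run without a target so we never touch the host's /etc.
    [ -n "$root" ] || { echo "usage: remove_host_files ROOT" >&2; return 1; }
    # The two files this thread identifies as copied from the host.
    rm -f "$root/etc/resolv.conf" "$root/etc/hostname"
}

# Equivalent inline form (not executed here; "unstable" and "chroot.tar"
# are placeholder arguments):
#   mmdebstrap --mode=unshare \
#       --customize-hook='rm -f "$1"/etc/resolv.conf "$1"/etc/hostname' \
#       unstable chroot.tar
```

Whether removing /etc/hostname is acceptable depends on what the resulting chroot is used for, so treat the file list as something to adjust per project.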
@josch could you re-think that, maybe implement it under a flag (like --clean-host-files)? My use case for mmdebstrap is installing packages in "Distroless" Docker images, which are based on Debian (ATM Bullseye).
Currently, I need to do something like this...
Hi @markkrj, thanks for your input! Just to confirm, you are not calling mmdebstrap manually in a real terminal but you are calling it as part of a Dockerfile, correct?
Hi @markkrj, my argument is: if you need the functionality of a hypothetical --clean-host-files option in a situation where you call mmdebstrap from a script, then you don't need that option because you can just use hooks to do whatever you need to accomplish. Agreed?
Yes.
@josch Yes, we can work around it in a script, but then we end up cluttering the script/Dockerfile with otherwise unneeded things. I'll understand if you keep this as won't fix, I just wanted to show another use case for removing host files from the chroot. Maybe you could remove host files just for the "extract" variant, as I don't see a reason for extract to have them either.
Hi @markkrj, thank you for your update. Yes, adding hooks to your Dockerfile is additional cost on your end. The problem is that by adding a --clean-host-files option you are shifting the cost from your end to every mmdebstrap user. Every additional option we add must be documented, which grows the already very large manual page and makes it more confusing. It also means that we have to define what "host files" actually means. What about people who want to delete some but not others? Now we have an option that can take multiple arguments, making it even more complex. And what if somebody wants to delete these files only in certain circumstances? It probably makes sense to only delete them if SOURCE_DATE_EPOCH is set because otherwise the output wouldn't be reproducible anyway. But what if somebody comes along with a use case that would make it useful even without SOURCE_DATE_EPOCH? One needs to be very careful when adding new command line options because those options are an interface, and you essentially can never remove or change them after you have introduced them because then you will break other people's code. Options with a use case as small as --clean-host-files remind me of this:
Or of man pages like this one: https://manpages.debian.org/unstable/parallel/parallel.1.en.html
Creating a chroot has a lot of moving parts and everybody has some very specific requirements for their chroot creation tool. The advantage for mmdebstrap is, that most people with very specific requirements are running it from their own set of scripts and this means that instead of herding a large collection of very specific command line options, we can just provide them a hook mechanism that allows them to do whatever their specific use-case requires.
So in essence, adding new command line options is not free because every new option adds a cost for every user of mmdebstrap including the long-term maintenance of the option. I'll only add new options as a short-hand for something that can be done with hooks if it is clear that the option will be used by people running mmdebstrap from the CLI and thus there is a requirement for something short to avoid a lot of typing.
This is not the case for you. What I can offer you though is for mmdebstrap to ship a hook script that specifically deletes all files from the chroot that were copied in from the host, essentially containing a single line:
Then you could run mmdebstrap with:
But oh, this line is even longer than just the manual rm, and from the code you posted above I see that this is not quite what you need, which precisely proves my point. It is very hard to come up with a mini-option like --clean-host-files that does exactly what the user wants. From the code you posted, it looks to me that even if there were a --clean-host-files option, it wouldn't do exactly what you want and you wouldn't use it in the end anyway.
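To illustrate how the policy questions raised above (which files, and under which circumstances) can live in a user's own script instead of in an mmdebstrap option, here is a rough sketch. The function name clean_host_files, its optional file-list arguments, and the choice to do nothing unless SOURCE_DATE_EPOCH is set are all assumptions made for this example, not anything mmdebstrap ships. It also assumes the hook sees the caller's environment.

```shell
#!/bin/sh
# Sketch only: clean_host_files is a hypothetical helper, not part of mmdebstrap.
# Usage: clean_host_files ROOT [FILE...]
#   ROOT  chroot directory (what a hook receives as "$1")
#   FILE  paths relative to ROOT; defaults to the two files from this thread
clean_host_files() {
    root=${1:-}
    [ -n "$root" ] || { echo "usage: clean_host_files ROOT [FILE...]" >&2; return 1; }
    shift
    # Policy chosen for this example: without SOURCE_DATE_EPOCH the output
    # is not reproducible anyway, so leave the chroot untouched.
    if [ -z "${SOURCE_DATE_EPOCH:-}" ]; then
        return 0
    fi
    # No explicit list given: fall back to the files discussed in this thread.
    if [ "$#" -eq 0 ]; then
        set -- etc/resolv.conf etc/hostname
    fi
    for f in "$@"; do
        rm -f "$root/$f"
    done
}
```

One possible way to wire this up would be to end an executable script with clean_host_files "$@" and pass that script's path to --customize-hook; check that wiring against the manual page before relying on it. Each project can change the default list or drop the SOURCE_DATE_EPOCH check without mmdebstrap having to define what "host files" means.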