Binary packages express binary dependencies on other binary packages. Source packages express build dependencies on binary packages. Both dependency relationships are expressed in conjunctive normal form. A conjunctive normal form is a conjunction of disjunctions. A conjunction is an AND-relationship while a disjunction is an OR-relationship. Here an example:
Depends: blub, foo, bar | baz
The dependencies blub, foo and bar | baz form a conjunction because to fulfill this dependency relationship they must all be satisfied. The clauses are connected by multiple ANDs. The clause bar | baz is a disjunction of bar and baz because to fulfill this clause it is enough if only one of them is satisfied. The packages bar and baz are connected by an OR.
The other type of dependency is the builds-from relationship which is a dependency from a binary package to the source package that builds it. This dependency exists because a binary package can only exist if its source package can be built.
build architecture the architecture the package is built on
host architecture the architecture the package is built for
target architecture the architecture the bootstrap is done for. During cross compilation the host architecture is the target architecture.
Dependency Closure and Installation Sets
A dependency closure is the set of binary packages closed under their dependency relationship. This is, if all dependency relationships from a starting binary or source package were followed (including all disjunctions) recursively, then the resulting set of visited binary packages would form the dependency closure of the starting binary or source package. The dependency closure of a binary or source package therefore contains all binary packages that this binary or source package directly or indirectly could possibly require to satisfy its runtime or build dependencies respectively.
Since disjunctive dependencies are also followed, a dependency closure often contains more binary packages that are actually necessary to fulfill a given binary or source package's dependencies. It is also possible for two packages in a dependency closure to conflict with each other. An installation set is a subset of a dependency closure. An installation set must fulfill the following properties:
Every package in the installation set has its dependencies satisfied
No two packages in the installation set conflict with each other
Usually there exist different choices of installation sets and choosing one is the task of a solver and its given metrics. The set of packages which satisfies the dependencies for installing or building more than one binary or source package is called a co-installation set. The same conditions as for normal installation sets hold. Some sets of binary or source packages might not have a co-installation set as they themselves or their respective dependencies conflict with each other.
The set of strong dependencies of a binary or source package is the intersection of all its possible installation sets. In other words: a binary package is a strong dependency of another binary (or source) package, if the latter cannot be installed (or compiled) without the former. The strong dependency set therefore creates a lower bound and at the same time a strict minimum requirement for installation or build dependency satisfaction of any binary or source package.
Repositories and Metadata Repositories
A repository is a set of binary packages and source packages including their metadata information and the actual binaries and source code they contain, respectively. Package managers have access to repositories to facilitate the retrieval and installation of new or upgraded packages on the system of the end user. Repositories are usually either accessed over HTTP or FTP or can be stored on a local filesystem.
The dependency analysis of botch is carried out on the metadata stored inside a repository. We call a repository that only consists of metadata like package names, versions and their relationships but without the binary content or source code a metadata repository. The metadata information for binary and source packages is stored in Packages and Sources files respectively. Those two files conform to the control file format. To start a bootstrap analysis with botch the only needed input are a pair of those Packages and Sources files for the desired distribution and target architecture.
Most repositories offered by binary distributions are self-contained. A self-contained repository is a set of binary packages B and a set of source packages S for which the following holds:
All binary packages B (except Architecture:all binary packages) must be compilable from the source packages in S.
All source packages in S must be compilable from the binary packages in B.
The exception for Architecture:all packages is made because in a bootstrap situation, those binary packages do not need to be compiled anymore. It is therefore not necessary to make source packages that build Architecture:all binary packages part of the problem.
Essential:yes binary packages are an implicit dependency of every binary package and the build-essential package is an implicit build dependency of all source packages. Therefore all Essential:yes binary packages and the build-essential binary package plus all binary dependencies of both are always part of a self-contained repository. Same holds for the source packages that build those binary packages except for those binary packages that are Architecture:all for reasons explained earlier.
Throughout the bootstrap process we associate the boolean property of availability to packages. Availability means that the package including all its data files exists for a given architecture. This property is particularly interesting for binary packages as source packages are always available for all architectures. Additionally, all Architecture:all binary packages are always available for any architecture as they are architecture independent. To make an architecture dependent binary package available it must be compiled natively or cross compiled. It is the goal of the bootstrap process to make all architecture dependent binary packages of the target architecture available.
This property is not to be confused with the availability of only the metadata of a given binary package for an architecture. Meta data information and full metadata repositories can always be generated for all binary packages for all architectures. The set of available binary packages refers to an actual repository of binary packages with their binary and data content available in addition to their metadata.
Having full metadata information available in form of metadata repositories before the actual binary packages associated with it are available is also a necessary property to analyze the dependency relationships between binary and source packages for the target architecture. For example, calculating an installation set for a given binary or source package depends on this metadata information.
The goal of the bootstrap process is to make more and more binary packages available through cross or native compilation until all architecture dependent binary packages are available for the target architecture and all source packages can be compiled on the target architecture without running into dependency cycles. The bootstrapping process starts with no binary packages of the target architecture being available. Some source packages have to be cross compiled so that enough binary packages for a minimal build system become available. The rest of the binary packages is then made available through native compilation.
Installability means that a package can be installed and more precisely, that there exists an installation set of this binary package for which all its members are available for a given architecture. Installability is therefore stronger than availability and installability of a package also implies its availability.
We sometimes use the term installability for source packages. For a source package to be installable it means that all binary packages of one of its installation sets are available. Installability for source packages is therefore equal to its compilability.
In a healthy distribution all binary packages are available and installable and all source packages are compilable. During bootstrapping only a subset of all binary packages are available and therefore even fewer are installable.
Build Dependency Cycles
During the bootstrapping of a distribution, only a small subset of all binary packages is initially made available through cross compilation. To make more binary packages available their respective source packages have to be compiled. But those source packages might build depend on binary packages that are not yet available themselves and whose source packages can't be built for the same reason. Build dependency cycles are created.
Build dependency cycles do not only exist during native compilation but also during cross compilation of the minimal build system. In that scenario, build dependency cycles are created because not all of the cross build dependencies of source packages can be satisfied by packages of the build architecture but require not-yet-built binary packages of the host architecture. But since in most source packages at least some cross build dependencies can be satisfied by packages of the build architecture, there exist much fewer build dependency cycles during cross compilation than during native compilation. It has been shown that in practice, a minimal build system can be cross compiled by only modifying a dozen source packages.
Strongly Connected Component
While dependency cycles are an intuitve way to talk about the problem, a dependency graph does usually never contain individual cycles. Instead, a subset of vertices in the graph are all involved in cyclic dependencies whith each other. This subset of vertices is called a strongly connected component or SCC. All vertices of a strongly connected component are in at least one dependency cycle with every other vertex of the SCC. In Debian, there exists one big strongly connected component of several hundred vertices which contains all of Debian's core packages as well as browsers, window managers, compilers and most of qt and gtk. The other strongly connected components in the dependency graph are no bigger than 10 vertices and mostly involve compilers that need themselves to be built. The main strongly connected component in Debian contains more than several million dependency cycles.
Build Graph and Source Graph
Botch handles two different graph types which we called build graph and source graph. A build graph consists of two types of vertices:
InstSet vertices represent one binary package and one of its installation sets
SrcPkg vertices represent one source package
InstSet vertices are connected to SrcPkg vertices to show a "builds-from" relationship for all the binary packages that are port of the installation set.
SrcPkg vertices are connected to InstSet vertices to show a "build-depends" relationship for all the build dependencies of the source package.
Vertices of the same type cannot connect to each other.
A build graph can be converted into a source graph. A source graph only contains SrcPkg vertices. InstSet vertices are removed through path contraction over all pairs of SrcPkg vertices that are connected by a single InstSet vertex. Therefore, a source graph keeps the ordering relationship of the build graph. A source graph cannot be turned back into a build graph. A source graph can be annotated with strong dependency information. This information is not useful in the build graph.