It is common for the same module source package to be referenced multiple
times in the same configuration, either because there are literally
multiple instances of the same module source or because a single package
(or repository) contains multiple modules in sub-directories and many
of them are referenced.
To optimize this, here we introduce a simple caching behavior where the
module installer will detect if it's asked to install multiple times from
the same source and produce the second and subsequent directories by
copying the first, rather than by downloading again over the network.
This optimization is applied once all of the go-getter detection has
completed and sub-directory portions have been trimmed, so it is also
able to normalize differently-specified source addresses that all
ultimately detect to the same resolved address. When installing, we
always extract the entire specified package (or repository) and then
reference the specified sub-directory, so we can safely re-use existing
directories when the base package is the same, even if the sub-directory
is different.
However, as a result we do not yet address the fact that the same package
will be stored multiple times _on disk_, which may still be problematic
when referencing large repositories multiple times in
disk-storage-constrained environments. We could address this in a
subsequent change by investigating the use of symlinks where possible.
Since the Registry installer is implemented just as an extra resolution
step in front of go-getter, this optimization applies to registry
modules too. This does not apply to local relative references, which will
continue to just resolve into the already-prepared directory of their
parent module.
The cache of previously installed paths lives only for the duration of
one call to InstallModules, so we will never re-use directories that
were created by previous runs of "terraform init" and there is no risk
that older versions will pollute the cache when attempting an upgrade
from a source address that doesn't explicitly specify a version.
No additional tests are added here because the existing module installer
tests (when TF_ACC=1) already cover the case of installing multiple
modules from the same source.
Here we introduce a new idea of a "configuration snapshot", which is an
in-memory copy of the source code of each of the files that make up
the configuration. The primary intended purpose for this is as an
intermediate step before writing the configuration files into a plan file,
and then reading them out when that plan file is later applied.
During earlier configs package development we expected to use an afero vfs
implementation to read directly from the zip file, but that doesn't work
in practice because we need to preserve module paths from the source file
system that might include parent directory traversals (../) while
retaining the original path for use in error messages.
The result, for now, is a bit of an abstraction inversion: we implement
a specialized afero vfs implementation that makes the sparse filesystem
representation from a snapshot appear like a normal filesystem just well
enough that the config loader and parser can work with it.
In future we may wish to rework the internals here so that the main
abstraction is at a similar level to the snapshot and then that API is
mapped to the native filesystem in the normal case, removing afero. For
now though, this approach avoids the need for a significant redesign
of the parser/loader internals, at the expense of some trickiness in the
case where we're reading from a snapshot.
This commit does not yet include the reading and writing of snapshots into
plan files. That will follow in a subsequent commit.
addrs.Module is itself internally just []string, but this better
communicates our intent here and makes this integrate better with other
code which is using this type for this purposes.
These utility functions are intended to allow concisely loading a
configuration from a fixture directory in a test, bailing out early if
there are any unexpected errors.
By adding this method you now only have to pass a `*disco.Disco` object around in order to do discovery and use any configured credentials for the discovered hosts.
Of course you can also still pass around both a `*disco.Disco` and a `auth.CredentialsSource` object if there is a need or a reason for that!
Previously we were just loading the module and asserting no diagnostics,
but that is not really good enough since if we install modules incorrectly
it's possible that we are still able to load an empty configuration
successfully.
Now we'll do some basic inspecetion of the module tree that results from
loading what we installed, to ensure that all of the expected modules
are present at the right locations in the tree.
This will provide the functionality of "terraform init -from-module=...",
which uses the contents of a given module to populate the working
directory.
This mechanism is intended for installing e.g. examples from Terraform
Registry or elsewhere. It's not fully-general since it can't reasonably
install a module from a subdir that refers up to a parent directory, but
that isn't an issue for all reasonable uses of this option.
Originally the hope was to use the afero filesystem abstraction for all
loader operations, but since we install modules using go-getter we cannot
(without a lot of refactoring) support vfs for installation.
The vfs use-case is for reading configuration from plan zip files anyway,
and so we have no real reason to support installation into a vfs. For now
at least we will just add the possibility that a loader might not be
install-capable. At the moment we have no non-install-capable loaders, but
we'll add one later once we get to loading configuration from plan files.
Unlike the old installer in config/module, this uses new-style
installation directories that include the static module path so that paths
we show in diagnostics will be more meaningful to the user.
As before, we retrieve the entire "package" associated with the given
source string, rather than any given subdirectory directly, because the
retrieved module may contain ../ references into parent directories which
must be resolvable after extraction.
Enough of the InstallModules method to install local modules (those with
relative paths). "Install" is actually a bit of an exaggeration for these
since we actually just record them in our manifest after verifying that
the source directory exists.
This is a change of behavior relative to the old module installer since
we no longer create a symlink to the module directory inside the
.terraform/modules directory. Instead, we record the module's true
location in our manifest so that the loader will find it later.
The use of a symlink here predated the manifest file. Now that we have a
manifest file the symlinks are redundant. Using the "natural" location of
the module leads to more helpful error messages, since we'll refer to
the module path as the user expects it, rather than to an internal alias.
Previously the behavior for loading and installing modules was included in
the same package as the representation of the module tree (in the
config/module package).
In our new world, the model of a module tree (now called a "Config") is
included in "configs" along with the Module and File structs. This new
package replaces the loading and installation functionality previously
in config/module with new equivalents that work with the model objects
in "configs".
As of this commit, only the loading functionality is implemented. The
installation functionality will follow in subsequent commits.