We have various mechanisms that aim to ensure that the installed provider
plugins are consistent with the lock file and that the lock file is
consistent with the provider requirements, and we do have existing unit
tests for them, but all of those cases mock our fake out at least part of
the process and in the past that's caused us to miss usability
regressions, where we still catch the error but do so at the wrong layer
and thus generate error message lacking useful additional context.
Here we'll add some new end-to-end tests to supplement the existing unit
tests, making sure things work as expected when we assemble the system
together as we would in a release. These tests cover a number of different
ways in which the plugin selections can grow inconsistent.
These new tests all run only when we're in a context where we're allowed
to access the network, because they exercise the real plugin installer
codepath. We could technically build this to use a local filesystem mirror
or other such override to avoid that, but the point here is to make sure
we see the expected behavior in the main case, and so it's worth the
small additional cost of downloading the null provider from the real
registry.
In the original incarnation of Meta.providerFactories we were returning
into a Meta.contextOpts whose signature didn't allow it to return an
error directly, and so we had compromised by making the provider factory
functions themselves return errors once called.
Subsequent work made Meta.contextOpts need to return an error anyway, but
at the time we neglected to update our handling of the providerFactories
result, having it still defer the error handling until we finally
instantiate a provider.
Although that did ultimately get the expected result anyway, the error
ended up being reported from deep in the guts of a Terraform Core graph
walk, in whichever concurrently-visited graph node happened to try to
instantiate the plugin first. This meant that the exact phrasing of the
error message would vary between runs and the reporting codepath didn't
have enough context to given an actionable suggestion on how to proceed.
In this commit we make Meta.contextOpts pass through directly any error
that Meta.providerFactories produces, and then make Meta.providerFactories
produce a special error type so that Meta.Backend can ultimately return
a user-friendly diagnostic message containing a specific suggestion to
run "terraform init", along with a short explanation of what a provider
plugin is.
The reliance here on an implied contract between two functions that are
not directly connected in the callstack is non-ideal, and so hopefully
we'll revisit this further in future work on the overall architecture of
the CLI layer. To try to make this robust in the meantime though, I wrote
it to use the errors.As function to potentially unwrap a wrapped version
of our special error type, in case one of the intervening layers is
changed at some point to wrap the downstream error before returning it.
The codepath for AllAttributesNull was not correct for any nested object
types with collections, and should create single null values for the
correct NestingMode rather than a single object with null attributes.
Since there is no reason to descend into nested object types to create
nullv alues, we can drop the AllAttributesNull function altogether and
create null values as needed during ProposedNew.
The corresponding AllBlockAttributesNull was only called internally in 1
location, and simply delegated to schema.EmptyValue. We can reduce the
package surface area by dropping that function too and calling
EmptyValue directly.
In historical versions of Terraform the responsibility to check this was
inside the terraform.NewContext function, along with various other
assorted concerns that made that function particularly complicated.
More recently, we reduced the responsibility of the "terraform" package
only to instantiating particular named plugins, assuming that its caller
is responsible for selecting appropriate versions of any providers that
_are_ external. However, until this commit we were just assuming that
"terraform init" had correctly selected appropriate plugins and recorded
them in the lock file, and so nothing was dealing with the problem of
ensuring that there haven't been any changes to the lock file or config
since the most recent "terraform init" which would cause us to need to
re-evaluate those decisions.
Part of the game here is to slightly extend the role of the dependency
locks object to also carry information about a subset of provider
addresses whose lock entries we're intentionally disregarding as part of
the various little edge-case features we have for overridding providers:
dev_overrides, "unmanaged providers", and the testing overrides in our
own unit tests. This is an in-memory-only annotation, never included in
the serialized plan files on disk.
I had originally intended to create a new package to encapsulate all of
this plugin-selection logic, including both the version constraint
checking here and also the handling of the provider factory functions, but
as an interim step I've just made version constraint consistency checks
the responsibility of the backend/local package, which means that we'll
always catch problems as part of preparing for local operations, while
not imposing these additional checks on commands that _don't_ run local
operations, such as "terraform apply" when in remote operations mode.
We recently removed the legacy way we used to track the SHA256 hashes of
individual provider executables as part of a plans.Plan, because these
days we want to track the checksums of entire provider packages rather
than just the executable.
In order to achieve that new goal, we can save a copy of the dependency
lock information inside the plan file. This follows our existing precedent
of using exactly the same serialization formats we'd normally use for
this information, and thus we can reuse the existing models and
serializers and be confident we won't lose any detail in the round-trip.
As of this commit there's not yet anything actually making use of this
mechanism. In a subsequent commit we'll teach the main callers that write
and read plan files to include and expect (respectively) dependency
information, verifying that the available providers still match by the
time we're applying the plan.
Previously the planfile.Create function had accumulated probably already
too many positional arguments, and I'm intending to add another one in
a subsequent commit and so this is preparation to make the callsites more
readable (subjectively) and make it clearer how we can extend this
function's arguments to include further components in a plan file.
There's no difference in observable functionality here. This is just
passing the same set of arguments in a slightly different way.
Historically the responsibility for making sure that all of the available
providers are of suitable versions and match the appropriate checksums has
been split rather inexplicably over multiple different layers, with some
of the checks happening as late as creating a terraform.Context.
We're gradually iterating towards making that all be handled in one place,
but in this step we're just cleaning up some old remnants from the
main "terraform" package, which is now no longer responsible for any
version or checksum verification and instead just assumes it's been
provided with suitable factory functions by its caller.
We do still have a pre-check here to make sure that we at least have a
factory function for each plugin the configuration seems to depend on,
because if we don't do that up front then it ends up getting caught
instead deep inside the Terraform runtime, often inside a concurrent
graph walk and thus it's not deterministic which codepath will happen to
catch it on a particular run.
As of this commit, this actually does leave some holes in our checks: the
command package is using the dependency lock file to make sure we have
exactly the provider packages we expect (exact versions and checksums),
which is the most crucial part, but we don't yet have any spot where
we make sure that the lock file is consistent with the current
configuration, and we are no longer preserving the provider checksums as
part of a saved plan.
Both of those will come in subsequent commits. While it's unusual to have
a series of commits that briefly subtracts functionality and then adds
back in equivalent functionality later, the lock file checking is the only
part that's crucial for security reasons, with everything else mainly just
being to give better feedback when folks seem to be using Terraform
incorrectly. The other bits are therefore mostly cosmetic and okay to be
absent briefly as we work towards a better design that is clearer about
where that responsibility belongs.
Only depends_on ancestors for transitive dependencies when we're not
pointed directly at a resource. We can't be much more precise here,
since in order to maintain our guarantee that data sources will wait for
explicit dependencies, if those dependencies happen to be a module,
output, or variable, we have to find some upstream managed resource in
order to check for a planned change.
We must ensure that the terraform required_version is checked as early
as possible, so that new configuration constructs don't cause init to
fail without indicating the version is incompatible.
The loadConfig call before the earlyconfig parsing seems to be unneeded,
and we can delay that to de-tangle it from installing the modules which
may have their own constraints.
TODO: it seems that loadConfig should be able to handle returning the
version constraints in the same manner as loadSingleModule.
Our current implementation of destroy planning includes secretly running a
normal plan first, in order to get its effect of refreshing the state.
Previously our warning about colliding moves would betray that
implementation detail because we'd return it from both of our planning
operations here and thus show the message twice. That would also have
happened in theory for any other warnings emitted by both plan operations,
but it's the move collision warning that made it immediately visible.
We'll now only return warnings from the initial plan if we're also
returning errors from that plan, and thus the warnings of both plans can
never mix together into the same diags and thus we'll avoid duplicating
any warnings.
This does mean that we'd lose any warnings which might hypothetically
emerge only from the hidden normal plan and not from the subsequent
destroy plan, but we'll accept that as an okay tradeoff here because those
warnings are likely to not be super relevant to the destroy case anyway,
or else we'd emit them from the destroy-plan walk too.
For now, only check the JSON views package, since this was the instance
that most recently tripped us up. There are a few dozen failures
elsewhere in Terraform which would need to be addressed before expanding
this to other packages.
The extra feedback information for why resource instance deletion is
planned is now included in the streaming JSON UI output.
We also add an explicit case for no-op actions to switch statements in
this package to ensure exhaustiveness, for future linting.
The previous conservative guarantee that we would not make backwards
incompatible changes to the state file format until at least Terraform
1.1 can now be extended. Terraform 0.14 through 1.1 will be able to
interoperably use state files, so we can update the remote backend
version compatibility check accordingly.
Because our validation rules depend on some dynamic results produced by
actually running the plan, we deal with moves in a "backwards" order where
we try to apply them first -- ignoring anything strange we might find --
and then validate the original statements only after planning.
An unfortunate consequence of that approach is that when the move
statements are invalid it's likely that move execution will not fully
complete, and so the generated plan is likely to be incorrect and might
well include errors resulting from the unresolved moves.
To mitigate that, here we let any move validation errors supersede all
other diagnostics that the plan phase might've generated, in the hope that
it'll help the user focus on fixing the incorrect move statements without
creating confusing by reporting errors that only appeared as a quick of
how Terraform worked around the invalid move statements earlier.
In most cases Terraform will be able to automatically fully resolve all
of the pending move statements before creating a plan, but there are some
edge cases where we can end up wanting to move one object to a location
where another object is already declared.
One relatively-obvious example is if someone uses "terraform state mv" in
order to create a set of resource instance bindings that could never have
arising in normal Terraform use.
A less obvious example arises from the interactions between moves at
different levels of granularity. If we are both moving a module to a new
address and moving a resource into an instance of the new module at the
same time, the old module might well have already had a resource of the
same name and so the resource move will be unresolvable.
In these situations Terraform will move the objects as far as possible,
but because it's never valid for a move "from" address to still be
declared in the configuration Terraform will inevitably always plan to
destroy the objects that didn't find a final home. To give some additional
explanation for that result, here we'll add a warning which describes
what happened.
This is not a particularly actionable warning because we don't really
have enough information to guess what the user intended, but we do at
least prompt that they might be able to use the "terraform state" family
of subcommands to repair the ambiguous situation before planning, if they
want a different result than what Terraform proposed.