We previously deferred this to Validate, but all of our operations require
a valid set of variables and so checking this up front makes it more
obvious when a call into Terraform Core from the CLI layer isn't
populating variables correctly, instead of having it fail downstream
somewhere.
It's the caller's responsibility to ensure that it has collected values
for all of the variables in some way before calling NewContext, because
in the main case driven by the CLI there are many different places that
variable values can be collected from and so handling the main user-facing
validation in the CLI allows us to return better error messages that take
into account which way a variable is (or is not) being set.
In order to allow lazy evaluation of resource indexes, we can't index
resources immediately via GetResourceInstance. Change the evaluation to
always return whole Resources via GetResource, and index individual
instances during expression evaluation.
This will allow us to always check for invalid index errors rather than
returning an unknown value and ignoring it during apply.
The documentation for the -target option warns that it's intended for
exceptional circumstances only and not for routine use, but that's not a
very prominent location for that warning and so some users miss it.
Here we make the warning more prominent by including it directly in the
Terraform output when -target is in use. We first warn during planning
that the plan might be incomplete, and then warn again after apply
concludes and direct the user to run "terraform plan" to make sure that
there are no further changes outstanding. The latter message is intended
to reinforce that -target should only be a one-off operation and that you
should always run without it soon after to ensure that the workspace is
left in a consistent, converged state.
When an operation fails, providers may return a null new value rather than
returning a partial state. In that case, we'd prefer to keep the old value
so that we stand the best chance of being able to retry on a subsequent
run.
Previously we were making an exception for the delete action, allowing
the result of that to be null even when an error is returned. In practice
that was a bad idea because it would cause Terraform to lose track of the
object even though it might not actually have been deleted.
Now we'll retain the old object even in the delete case. Providers can
still return partial new objects if they were able to partially complete
a delete operation, in which case we'll discard what we had before, but
if the result is null with errors then we'll assume the delete failed
entirely and so just keep the old state as-is, giving us the opportunity
to refresh it on the next run to see if anything actually happened after
all.
(This also includes a new resource in the test provider which isn't used
by the patch but was useful for some manual UX testing here, so I thought
I'd include it in case it's similarly useful in future, given how simple
its implementation is.)
A provider may react to a create or update failing by returning error
diagnostics and a partially-updated or nil new value, in which case we
do not expect our AssertObjectCompatible consistency check to succeed: the
provider is just assumed to be doing the best it can to preserve whatever
partial outcome it was able to achieve.
However, if errors are accompanied with a nil new value after an update,
we'll assume that the provider is telling us it wasn't able to get far
enough to make any change at all, and so we'll retain the prior value in
state. This ensures that a provider can't cause an object to be forgotten
from the state just because an update failed.
Prior to Terraform 0.12 there were certain behaviors we expected from
providers that were actually just details of the SDK and not part of the
enforced contract.
For 0.12 we're now codifying some of these behaviors explicitly via safety
checks in core, thus ensuring that all future providers will behave in a
consistent way that users can rely on.
Unfortunately, due to the hand-written nature of the mock provider
implementations we use in tests, they have been getting away with some
unusual behaviors that don't match our usual expectations, and our safety
checks now detect those as incorrect behaviors.
To address this, we make the minimal changes to each test to ensure that
its mock provider behaves in a consistent way, which requires that values
set in config be represented correctly in the plan and ultimately saved
in the new state, without any changes along the way. In particular, the
common testDiffFn implementation has historically used a number of special
hidden attributes to trigger special behaviors, and our new rules require
that these special settings propagate properly through the plan and into
the state.
Previously we would allow providers to change anything about the planned
object value during apply, possibly returning an entirely-unrelated object
of the same type. In practice this led to some subtle bugs where a single
planned attribute value would change during apply and cause a downstream
failure due to a dependent resource now seeing input other than what
_it_ expected during plan.
Now we'll produce an explicit error message for this case which places the
blame with the correct party: the upstream resource that changed. Without
this, unexpected changes would often lead to the downstream resource
implementation being blamed in error message even though it was just
reacting to the change from upstream.
As with most errors during apply, we'll still save the updated value in
the state but we'll halt the walk to ensure that the unexpected value
cannot propagate further and cause the result to potentially diverge
greatly from the changeset shown in the plan.
Compared to Terraform 0.11, we expect to see this error in many of the
same cases we saw the "diffs didn't match during apply" error in earlier
versions, since it is likely that many errors of that sort were the result
of unexpected upstream changes being incorrectly blamed on the downstream
resource that then used the result.
Because Terraform Core has traditionally not checked that the final apply
result is consistent with what was planned, some of our apply tests were
producing inconsistent results.
Here we fix all of that so that they produce something compatible with
what they planned. This doesn't actually achieve anything in isolation,
but we're about to start enforcing this consistency in a subsequent
commit.
In an ideal world, providers are supposed to respond to errors during
apply by returning a partial new state alongside the error diagnostics.
In practice though, our SDK leaves the new value set to nil for certain
errors, which was causing Terraform to "forget" the object altogether by
assuming that the provider intended to say "null".
We now adjust that assumption to apply only in the delete case. In all
other cases (including updates) we retain the prior state if the new
state is given as nil. Although we could potentially fix this in the SDK
itself, I expect this is a likely bug in other future SDKs for other
languages too, so this new assumption is a safer one to make to be
resilient to data loss when providers don't behave perfectly.
Providers that return both nil new value and no errors are considered
buggy, but unfortunately that applies to the mocks in many of our tests,
so for pragmatic reasons we can't generate an error for that case as we do
for other "should never happen" situations. Instead, we'll just retain the
prior value in the state so the user can retry.
When we're being asked to destroy everything, we ideally want to end up
with a totally empty state. Normally we will conservatively keep around
the "husks" of resources (what's left after all of the instances have been
destroyed) unless they are configured without count or for_each, but in
this special case we'll prune those out.
The implication of this is that in "weird" expression contexts that happen
before the next "terraform plan", such as evaluation in
"terraform console" or expressions in data resources and provider blocks
that get evaluated during the refresh walk, we will see these results
as unknown rather than as empty lists of objects. We accept that weirdness
for now because in a future release we are likely to remove "refresh" as
a separate walk anyway, doing all of that work during the plan walk where
we can ensure that these values are properly re-populated before trying
to use them.
This test was occasionally failing due to a missing graph edge causing it
to be non-deterministic.
The graph edge was missing because our standard schema doesn't quite match
the config fixture and so the reference checker was not finding the "blah"
argument on aws_instance.a.
This change also includes an 100ms pause for the b node just to make this
potential race more likely to hit the "wrong" ordering when the graph is
not complete.
Since we started using experimental Go Modules our editor tooling hasn't
been fully functional, apparently including format-on-save support. This
is a catchup to get everything back straight again.
One of the assumptions this test was checking no longer holds: we don't
retain outputs for non-root modules in persistent state, because we can
always re-populate these on a future run by evaluating the configuration.
The old-fashioned formatting of "Provider" on the shimmed state here was
causing the state upgrade code to treat it like a module-relative address,
rather than an absolute address as we now expect.
These tests show that we're still not fully pruning modules from the state
in all cases, due to us not being able to fully prune out modules that
contain resources with count set after a destroy, but this is no worse
than before so we'll accept it for now and address this separately later.
A module heading without "<no state>" but also without any instances
listed is the rendering for a module containing a resource that has no
instances, since our old string rendering of state doesn't represent
resources themselves.
We previously had mechanisms to clean up only individual instance states,
leaving behind empty resource husks in the state after they were all
destroyed.
This takes care of it in the "orphan" case. It does not yet do it in the
"terraform destroy" or "terraform plan -destroy" cases because we don't
have anywhere to record in the plan that we're actually destroying and so
the resource configurations should be ignored and _everything_ should be
cleaned. We'll let the state be not-quite-empty in that case for now,
since it doesn't really hurt; cleaning up orphans is the main case because
the state will live on afterwards and so leftover cruft will accumulate
over the course of many changes.
Since this one has a situation where there are two deposed objects for
the same instance at once, we can't rely on comparing state strings: they
are not deterministic when multiple deposed objects are present.
Instead, we do more surgical comparisons directly on the state model
objects, which is not quite as robust but still gets us the main stuff we
care about here, to be followed up by another checkStateString further
down for the final state.
Tainted objects now also remember which provider they belong to (via the
resource state they are attached to) and so the stringified state output
here is slightly different.
This test was incorrectly amended on the first pass to create a
configuration snapshot from the step zero configuration, rather than the
step one configuration that the save plan is built from.
Along with that, it needed various other minor updates to match with
details that have shifted:
- "id" and "type" attributes must be explicitly declared in schema
- template_file.parent has count = 1, which now causes it to get an index
and be a list where before it did not.
It doesn't make sense to ignore_changes when the prior value is null,
since we have to create something before we can ignore changes to it.
This change is verified by TestContext2Apply_ignoreChangesWildcard.
This test was relying on the feature of the old provisioner API that gave
provisioners full access to the instance state of what they were
provisioning.
We no longer do this, and so instead the ApplyFn must distingish the
instances using a value from the provisioner's own configuration.
This can happen if unknown values in the plan actually end up being
identical to the prior values once resolved. In that case, we'll just make
no change at all.
This is verified by TestContext2Apply_ignoreChangesWithDep.
The prior behavior being asserted by this test was incorrect, since the
configuration calls for there to be two instances of the resource at the
end.
We also now assert on the generated plan since it's important to verify
that we are indeed planning to replace the zeroth instance but not the
first instance (which doesn't yet exist).
The testDiffFn doesn't include "compute" in the diff it produces and so
it no longer appears in the shimmed output.
This is just a quirk of this weird mock implementation; real providers
always copy all of the values from thec config into the diff before adding
in any other changes.
Now that we populate a resource-level state pre-emptively for a resource,
before we apply the instances of that resource, it is no longer true that
this produces only one resource state, but the test below (comparing the
string version of the state) already captures the expected behavior and
so this additional check was redundant anyway.