Now that variable evaluation checks for a nil expression the graph
transformer does not need to generate a synthetic expression for
variable defaults. This means that all default handling is now located
in one place, and we are not surprised by a configuration expression
showing up which doesn't actually exist in the configuration.
Rename nodeModuleVariable.evalModuleCallArgument to evalModuleVariable.
This method is no longer handling only the module call argument, it is
also dealing with the variable declaration defaults and validation
statements.
Add an additional tests for validation with a non-nullable variable.
In earlier Terraform versions we had an extra validation step prior to
the graph walk which tried to partially validate root module input
variable values (just checking their type constraints) and then return
error messages which specified as accurately as possible where the value
had originally come from.
We're now handling that sort of validation exclusively during the graph
walk so that we can share the main logic between both root module and
child module variable values, but previously that shared code wasn't
able to generate such specific information about where the values had
originated, because it was adapted from code originally written to only
deal with child module variables.
Here then we restore a similar level of detail as before, when we're
processing root module variables. For child module variables, we use
synthetic InputValue objects which state that the value was declared
in the configuration, thus causing us to produce a similar sort of error
message as we would've before which includes a source range covering
the argument expression in the calling module block.
Previously we had three different layers all thinking they were
responsible for substituting a default value for an unset root module
variable:
- the local backend, via logic in backend.ParseVariableValues
- the context.Plan function (and other similar functions) trying to
preprocess the input variables using
terraform.mergeDefaultInputVariableValues .
- the newer prepareFinalInputVariableValue, which aims to centralize all
of the variable preparation logic so it can be common to both root and
child module variables.
The second of these was also trying to handle type constraint checking,
which is also the responsibility of the central function and not something
we need to handle so early.
Only the last of these consistently handles both root and child module
variables, and so is the one we ought to keep. The others are now
redundant and are causing prepareFinalInputVariableValue to get a slightly
corrupted view of the caller's chosen variable values.
To rectify that, here we remove the two redundant layers altogether and
have unset root variables pass through as cty.NilVal all the way to the
central prepareFinalInputVariableValue function, which will then handle
them in a suitable way which properly respects the "nullable" setting.
This commit includes some test changes in the terraform package to make
those tests no longer rely on the mergeDefaultInputVariableValues logic
we've removed, and to instead explicitly set cty.NilVal for all unset
variables to comply with our intended contract for PlanOpts.SetVariables,
and similar. (This is so that we can more easily catch bugs in callers
where they _don't_ correctly handle input variables; it allows us to
distinguish between the caller explicitly marking a variable as unset vs.
not describing it at all, where the latter is a bug in the caller.)
Previously we had a significant discrepancy between these two situations:
we wrote the raw root module variables directly into the EvalContext and
then applied type conversions only at expression evaluation time, while
for child modules we converted and validated the values while visiting
the variable graph node and wrote only the _final_ value into the
EvalContext.
This confusion seems to have been the root cause for #29899, where
validation rules for root module variables were being applied at the wrong
point in the process, prior to type conversion.
To fix that bug and also make similar mistakes less likely in the future,
I've made the root module variable handling more like the child module
variable handling in the following ways:
- The "raw value" (exactly as given by the user) lives only in the graph
node representing the variable, which mirrors how the _expression_
for a child module variable lives in its graph node. This means that
the flow for the two is the same except that there's no expression
evaluation step for root module variables, because they arrive as
constant values from the caller.
- The set of variable values in the EvalContext is always only "final"
values, after type conversion is complete. That in turn means we no
longer need to do "just in time" conversion in
evaluationStateData.GetInputVariable, and can just return the value
exactly as stored, which is consistent with how we handle all other
references between objects.
This diff is noisier than I'd like because of how much it takes to wire
a new argument (the raw variable values) through to the plan graph builder,
but those changes are pretty mechanical and the interesting logic lives
inside the plan graph builder itself, in NodeRootVariable, and
the shared helper functions in eval_variable.go.
While here I also took the opportunity to fix a historical API wart in
EvalContext, where SetModuleCallArguments was built to take a set of
variable values all at once but our current caller always calls with only
one at a time. That is now just SetModuleCallArgument singular, to match
with the new SetRootModuleArgument to deal with root module variables.
This test seems to be a holdover from the many-moons-ago switch from one
graph for all operations to separate graphs for plan and apply. It is
effectively just a copy of a subset of the content of the Context.Validate
function and is a maintainability hazard because it tends to lag behind
updates to that function unless changes there happen to make it fail.
This test doesn't cover anything that the other validate context tests
don't exercise as an implementation detail of calling Context.Validate,
so I've just removed it with no replacement.
Our original messaging here was largely just lifted from the equivalent
message for unknown values in "count", and it didn't really include any
specific advice on how to update a configuration to make for_each valid,
instead focusing only on the workaround of using the -target planning
option.
It's tough to pack in a fully-actionable suggestion here since unknown
values in for_each keys tends to be a gnarly architectural problem rather
than a local quirk -- when data flows between modules it can sometimes be
unclear whether it'll end up being used in a context which allows unknown
values.
I did my best to summarize the advice we've been giving in community forum
though, in the hope that more people will be able to address this for
themselves without asking for help, until we're one day able to smooth
this out better with a mechanism such as "partial apply".
The specific output order is meaningless, but it should always be the same after
two Encode() calls with identical (ignoring in-memory order) dependency sets.
This uses the decoupled build and run strategy to run the e2etests so that
we can arrange to run the tests against the real release packages produced
elsewhere in this workflow, rather than ones generated just in time by
the test harness.
The modifications to make-archive.sh here make it more consistent with its
originally-intended purpose of producing a harness for testing "real"
release executables. Our earlier compromise of making it include its own
terraform executable came from a desire to use that script as part of
manual cross-platform testing when we weren't yet set up to support
automation of those tests as we're doing here. That does mean, however,
that the terraform-e2etest package content must be combined with content
from a terraform release package in order to produce a valid contest for
running the tests.
We use a single job to cross-compile the test harness for all of the
supported platforms, because that build is relatively fast and so not
worth the overhead of matrix build, but then use a matrix build to
actually run the tests so that we can run them in a worker matching the
target platform.
We currently have access only to amd64 (x64) runners in GitHub Actions
and so for the moment this process is limited only to the subset of our
supported platforms which use that architecture.
When creating a Set of BasicEdges, the Hashcode function is used to determine
map keys for the underlying set data structure.
The string hex representation of the two vertices' pointers is unsafe to use
as a map key, since these addresses may change between the time they are added
to the set and the time the set is operated on.
Instead we modify the Hashcode function to maintain the references to the
underlying vertices so they cannot be garbage collected during the lifetime
of the Set.
TransitiveReduction does not rely on having a single root, and only
must be free of cycles.
DepthFirstWalk and ReverseDepthFirstWalk do not do a topological sort,
so if order matters TransitiveReduction must be run first.
These two functions were left during a refactor to ensure the old
behavior of a sorted walk was still accessible in some manner. The
package has since been removed from any public API, and the sorted
versions are no longer called, so we can remove them.
Create a separate `validateMoveStatementGraph` function so that
`ValidateMoves` and `ApplyMoves` both check the same conditions. Since
we're not using the builtin `graph.Validate` method, because we may have
multiple roots and want better cycle diagnostics, we need to add checks
for self references too. While multiple roots are an error enforced by
`Validate` for the concurrent walk, they are OK when using
`TransitiveReduction` and `ReverseDepthFirstWalk`, so we can skip that
check.
Apply moves must first use `TransitiveReduction` to reduce the graph,
otherwise nodes may be skipped if they are passed over by a transitive
edge.
Changing only the index on a nested module will cause all nested moves
to create cycles, since their full addresses will match both the From
and To addresses. When building the dependency graph, check if the
parent is only changing the index of the containing module, and prevent
the backwards edge for the move.
Add a method for checking if the From and To addresses in a move
statement are only changing the indexes of modules relative to the
statement module.
This is needed because move statement nested within the module will be
able to match against both the From and To addresses, causing cycles in
the order of move operations.
There was an unintended regression in go-getter v1.5.9's GitGetter which
caused us to temporarily fork that particular getter into Terraform to
expedite a fix. However, upstream v1.5.10 now includes a
functionally-equivalent fix and so we can heal that fork by upgrading.
We'd also neglected to update the Module Sources docs when upgrading to
go-getter v1.5.9 originally and so we were missing documentation about the
new "depth" argument to enable shadow cloning, which I've added
retroactively here along with documenting its restriction of only
supporting named refs.
This new go-getter release also introduces a new credentials-passing
method for the Google Cloud Storage getter, and so we must incorporate
that into the Terraform-level documentation about module sources.
Changing only the index on a nested module will cause all nested moves
to create cycles, since their full addresses will match both the From
and To addresses. When building the dependency graph, check if the
parent is only changing the index of the containing module, and prevent
the backwards edge for the move.
Add a method for checking if the From and To addresses in a move
statement are only changing the indexes of modules relative to the
statement module.
This is needed because move statement nested within the module will be
able to match against both the From and To addresses, causing cycles in
the order of move operations.
When applying module `moved` statements by iterating through modules in
state, we previously required an exact match from the `moved`
statement's `from` field and the module address. This permitted moving
resources directly inside a module, but did not recur into module calls
within those moved modules.
This commit moves that exact match requirement so that it only applies
to `moved` statements targeting resources. In turn this allows nested
modules to be moved.
Resource dependencies are by nature an unordered collection, but they're
persisted to state as a JSON array (in random order). This makes a mess for
`terraform apply -refresh-only`, which sees the new random order as a change
that requires the user to approve a state update.
(As an additional problem on top of that, the user interface for refresh-only
runs doesn't expect to see that as a type of change, so it says "no changes!
would you like to update to reflect these detected changes?")
This commit changes `ResourceInstanceObject.Encode()` to sort the in-memory
slice of dependencies (lexically, by address) before passing it on to be
compared and persisted. This appears to fix the observed UI issues with a
minimum of logic changes.
As the cloud e2e tests evolved some common patters became apparent. This
standardizes and consolidates the patterns into a common test runner
that takes the table tests and runs them in parallel. Some tests also
needed to be converted to utilize table tests.
Previously we would only ever add new lock entries or update existing
ones. However, it's possible that over time a module may _cease_ using
a particular provider, at which point we ought to remove it from the lock
file so that operations won't fail when seeing that the provider cache
directory is inconsistent with the lock file.
Now the provider installer (EnsureProviderVersions) will remove any lock
file entries that relate to providers not included in the given
requirements, which therefore makes the resulting lock file properly match
the set of packages the installer wrote into the cache.
This does potentially mean that someone could inadvertently defeat the
lock by removing a provider dependency, running "terraform init", then
undoing that removal, and finally running "terraform init" again. However,
that seems relatively unlikely compared to the likelihood of removing
a provider and keeping it removed, and in the event it _did_ happen the
changes to the lock entry for that provider would be visible in the diff
of the provider lock file as usual, and so could be noticed in code
review just as for any other change to dependencies.
When showing a saved plan, we do not need to check the state lineage
against current state, because the plan cannot be applied. This is
relevant when plan and apply specify a `-state` argument to choose a
non-default state file. In this case, the stored prior state in the plan
will not match the default state file, so a lineage check will always
error.
instances.Set is only used after all instances have been processes, so
it should therefor only handle known instances and not panic when given
an address that traverses an unexpanded module.
Running tests in parallel can help speed up overall test execution. Go
blocks parent tests while child tests run, so it does not fully fan out
as you might expect. It is noticably faster, though. Running 4 or more
concurrent processes knocks over a minute off the total execution time.
Revert the evaluation change from #29862.
While returning a dynamic value for all expanded resources during
validation is not optimal, trying to work around this using unknown maps
and lists is causing other undesirable behaviors during evaluation.
Earlier versions of this code allowed "ref" to take any value that would
be accepted by "git checkout" as a valid target of a symbolic ref. We
inadvertently accepted a breaking change to upstream go-getter that broke
that as part of introducing a shallow clone optimization, because shallow
clone requires selecting a single branch.
To restore the previous capabilities while retaining the "depth" argument,
here we accept a compromise where "ref" has the stronger requirement of
being a valid named ref in the remote repository if and only if "depth"
is set to a value greater than zero. If depth isn't set or is less than
one, we will do the old behavior of just cloning all of the refs in the
remote repository in full and then switching to refer to the selected
branch, tag, or naked commit ID as a separate step.
This includes a heuristic to generate an additional error message hint if
we get an error from "git clone" and it looks like the user might've been
trying to use "depth" and "ref=COMMIT" together. We can't recognize that
error accurately because it's only reported as human-oriented git command
output, but this heuristic should hopefully minimize situations where we
show it inappropriately.
For now this is a change in the Terraform repository directly, so that we
can expedite the fix to an already-reported regression. After this is
released I tend to also submit a similar set of changes to upstream
go-getter, at which point we can revert Terraform to using the upstream
getter.GitGetter instead of our own local fork.
This is a pragmatic temporary solution to allow us to more quickly resolve
an upstream regression in go-getter locally within Terraform, so that the
work to upstream it for other callers can happen asynchronously and with
less time pressure.
This commit doesn't yet include any changes to address the bug, and
instead aims to be functionally equivalent to getter.GitGetter. A
subsequent commit will then address the regression, so that the diff of
that commit will be easier to apply later to the upstream to get the same
effect there.
A regression introduced in d72a413ef8
The comment explains, but TLDR: The remote backend actually *depended*
on being able to write it's backend state even though an 'error'
occurred (no workspaces).
This is an explicit technical debt note that our plan renderer isn't able
to give a fully-specific hint in this particular case of deletion reason.
This reason code means that at least one of the module instance keys in
the resource's module path doesn't match an instance declared in the
configuration, but the plan data structure doesn't retain enough
information to know which is the first step in the path which refers to
a missing instance, and so we just always return the whole thing.
This would be confusing if we return module.foo[0].module.bar not being
in the configuration as a result of module.foo not using "count"; it would
be better to say "module.foo[0] is not in the configuration" instead.
It would be most ideal to handle all of the different situations that
ResourceInstanceDeleteBecauseWrongRepetition's rendering does, so that we
can go further and explain exactly _why_ that module instance isn't
declared anymore.
We can do neither of those things today because only the Terraform Core
"expander" component knows that information, and we've discarded that
by the time we get to rendering a plan. To fix this one day would require
preserving in the plan information about which module instances are
declared, as a separate sidecar data structure from which resource
instances we're taking actions on, and then using that to identify which
step in addr.Module here first selects an invalid instance.
Previously we were treating it as a programming error to ask for the
instances of a resource inside an instance of a module that is declared
but whose declaration doesn't include the given instance key.
However, that's actually a valid situation which can arise if, for
example, the user has changed the repetition/expansion mode for an
existing module call and so now all of the resource instances addresses it
previously contained are "orphaned".
To represent that, we'll instead say that an invalid instance key of a
declared module behaves as if it contains no resource instances at all,
regardless of the configurations of any resources nested inside. This
then gives the result needed to successfully detect all of the former
resource instances as "orphaned" and plan to destroy them.
However, this then introduces a new case for
NodePlannableResourceInstanceOrphan.deleteActionReason to deal with: the
resource configuration still exists (because configuration isn't aware of
individual module/resource instances) but the module instance does not.
This actually allows us to resolve, at least partially, a previous missing
piece of explaining to the user why the resource instances are planned
for deletion in that case, finally allowing us to be explicit to the user
that it's because of the module instance being removed, which
internally we call plans.ResourceInstanceDeleteBecauseNoModule.
Co-authored-by: Alisdair McDiarmid <alisdair@users.noreply.github.com>
As explained in the changes: The 'enhanced' backend terminology, which
only truly pertains to the 'remote' backend with a single API (Terraform
Cloud/Enterprise's), has been found to be a confusing vestige which need
only be explained in the context of the 'remote' backend.
These changes reorient the explanation(s) of backends to pertain more
directly to their primary purpose, which is storage of state snapshots
(and not implementing operations).
That Terraform operations are still _implemented_ by the literal
`Backend` and `Enhanced` interfaces is inconsequential a user of
Terraform, an internal detail.
Some function errors include values derived from arguments. This commit
is the result of a manual audit of these errors, which resulted in:
- Adding a helper function to redact sensitive values;
- Applying that helper function where errors include values derived from
possibly-sensitive arguments;
- Cleaning up other errors which need not include those values, or were
otherwise incorrect.
When migrating from an explicit local backend to Terraform Cloud, we ask
if you want to migrate the state. If there is no state to migrate we
should not ask if they want to migrate the emptiness.
When going from a local backend to Terraform Cloud, if you skip the
`terraform init` and run `terraform apply` this will give the user more
clear instructions.
When terraform detects that a user has no workspaces that map to their current configuration, it will prompt the user to create a new workspace and enter a value name. If the user ignores the prompt and exits it, the legacy backend (terraform.tfstate) will be left in a awkward state:
1. This saved backend config will show a diff for the JSON attributes "serial", "tags" and "hash"
2. "Terraform workspace list" will show an empty list
3. "Terraform apply" will run successfully using the previous workspace, from the previous config, not the one from the current saved backend config
4. The cloud config is not reflective of the current working directory
Solution: If the user exits the prompt, the saved backend config should not be updated because they did not select a new workspace. They are back at the beginning where they are force to re run the init cmd again before proceeding with new changes.
Previously we ended up losing all of the error message detail produced by
the registry address parser, because we treated any registry address
failure as cause to parse the address as a go-getter-style remote address
instead.
That led to terrible feedback in the situation where the user _was_
trying to write a module address but it was invalid in some way.
Although we can't really tighten this up in the default case due to our
compatibility promises, it's never been valid to use the "version"
argument with anything other than a registry address and so as a
compromise here we'll use the presence of "version" as a heuristic for
user intent to parse the source address as a registry address, and thus
we can return a registry-address-specific error message in that case and
thus give more direct feedback about what was wrong.
This unfortunately won't help someone trying to install from the registry
_without_ a version constraint, but I didn't want to let perfect be the
enemy of the good here, particularly since we recommend using version
constraints with registry modules anyway; indeed, that's one of the main
benefits of using a registry rather than a remote source directly.
Object values returned from providers have their attributes marked as
sensitive based on the provider schema. This was not fully implemented
for nested attribute types, which is corrected in this commit.
Resource instances removed from the configuration would previously use
the implied provider address. This is correct for default providers, but
incorrect for those from other namespaces or hosts. The fix here is to
use the stored provider config if it is present.
We cannot programmatically migrate workspaces to Terraform Cloud without
prompts, so `-input=false` should not be allowed in those cases.
There are 4 scenarios where we need input from a user to complete
migrating workspaces to Terraform Cloud.
1.) Migrate from a single local workspace to Terraform Cloud
* Terraform config for a local backend. Implicit local (no backend
specified) is fine.
* `terraform init` and `terraform apply`
* Change the Terraform config to use the cloud block
* `terraform init -input=false`
* You should now see an error message
2.) Migrate from a remote backend with a prefix to Terraform Cloud with
tags
* Create a workspace in Terraform Cloud manually. The name should
include a prefix, like "app-one"
* Have the terraform config use `backend "remote"` with a prefix set to
"app-"
* `terraform init` and `terraform apply`
* Update the Terraform config to use a cloud block with `tags
= ["app"]`. There should not be a prefix defined in the config now.
* `terraform init -input=false`
* You should now see an error message
3.) Migrate from multiple local workspaces to a single Terraform Cloud
workspace
* Create one or many local workspaces
* `terraform init` and `terraform apply` in each
* Change the Terraform config to use the cloud block
* `terraform init -input=false`
* You should now see an error message
4.) Migrate to Terraform Cloud and ask for a workspace name
* Create several local workspaces
* `terraform init` and `terraform apply` in each
* Change the Terraform config to use the cloud block with tags
* `terraform init -input=false`
* You should now see an error message
* Create a function for logic that assigns value to initReason var after changing backend configuration
Create func determineInitReason() for logic block that assigns value to initReason var after changing backend/cloud configuration block or migrating to a different type of backend configuration. Also clarify 'cloud' configuration block message to say 'Terraform Cloud configuration block has changed' instead of 'Terraform Cloud configuration has changed'.
Some of the wording here needed adjusting with the change that backends
largely reflect state snapshot storage (removing 'enhanced'
designation), and that a 'backend' is not necessarily always present.
This fixes an issue where a user could not disable initialization of the
'cloud' configuration block (As is possible with -backend=false), as
well as add some syntactic sugar around -backend by adding a mutually
exclusive -cloud alias.
Unchanged elements in nested attributes backed by sets were previously
misrendered as empty objects. This commit removes the additional
brackets and adds a count of unchanged elements.
The specialized Terraform Cloud migration process asks right up top
whether the user wants to migrate state, because there are various other
questions contingent on that answer.
Therefore we ought to just honor their earlier answer when we get to the
point of actually doing the state migration, rather than prompting again.
This is tricky because we're otherwise just reusing a codepath that's
common to both modes. Hopefully we can find a better way to do this in
a later commit, but for the moment our main motivation is minimizing risk
to the very next release.
There are a few command line options for "terraform init" which are only
relevant when working with traditional backends, with the Cloud
integration previously just mostly ignoring them, or sometimes misbehaving
slightly due to them creating an unreasonable situation.
Now we'll catch these and return explicit errors, in order to be clear
that these options are not needed nor supported in Cloud mode.
This just gives a little extra information to work with when trying to
understand why a test failed. It doesn't change what any of the tests are
actually trying to test.
This aims to encapsulate the somewhat-weird logic we currently use to
distinguish between the various "terraform init" situations involving
Terraform Cloud mode, in the hope of making codepaths that branch based
on this slightly easier to read.
This isn't yet used, but uses of it will follow in subsequent commits.
This pull request focuses on removing the prompt to rename the default
workspace when it is empty. Functionality already exists to not migrate
an empty workspace. This commit adds some clarifying language in the
comment where we do the evaluation to know whether to ask for a new name
or not. I also added an end to end test, which I should have added to
begin with.
Given: You have multiple explicit local workspaces, and the `default`
workspace is empty.
When: You migrate the workspaces to Terraform Cloud.
Then: Terraform should _not_ ask for a workspace to migrate the
`default` workspace to in Terraform Cloud.