If the user wished to interrupt a running operation, only the first
interrupt was communicated to the operation, by canceling the provided
context. A second interrupt would start the shutdown process, but would
not communicate this to the running operation. This order of events
could cause partial writes of state.
What would happen is that once the command returned, the plugin system
would stop the provider processes. Once the provider processes died, all
pending Eval operations would return with an error, quickly causing the
operation to complete. Since the backend code didn't know that the
process was about to shut down, it would continue by attempting to write
out the last known state. Under the right conditions, the process would
exit partway through writing the state file.
Add Stop and Cancel CancelFuncs to the RunningOperation, to allow it to
easily differentiate between the two signals. The backend will then be
able to detect a shutdown and abort more gracefully.
In order to ensure that the backend is not in the process of writing the
state out, the command will always attempt to wait for the process to
complete after cancellation.
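A minimal sketch of the shape this takes, assuming context-based
signaling (field names here are illustrative, not the exact API):

    package backend

    import "context"

    // RunningOperation distinguishes a graceful stop from a hard
    // cancel so the backend can react to each appropriately.
    type RunningOperation struct {
        // Context is canceled when the operation must abort.
        Context context.Context

        // Stop requests a graceful shutdown (first interrupt):
        // finish in-flight work and write out state.
        Stop context.CancelFunc

        // Cancel aborts the operation immediately (second
        // interrupt).
        Cancel context.CancelFunc

        // Done is closed once the operation has fully returned,
        // letting the command wait for state writing to complete.
        Done <-chan struct{}
    }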
Validation is the best time to return detailed diagnostics
to the user, since we're much more likely to have source
location information, etc. than we are in later operations.
This change doesn't actually add any detail to the messages
yet, but it changes the interface so that we can gradually
introduce more detailed diagnostics over time.
While here, there are some minor adjustments to some of the
messages to improve their consistency with the terminology we
use elsewhere.
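A hedged sketch of the direction (names are illustrative; the eventual
diagnostics model differs in detail):

    package config

    // SourceRange records where in the configuration a problem was
    // found.
    type SourceRange struct {
        Filename string
        Line     int
        Column   int
    }

    // Diagnostic is a structured message that can carry severity,
    // detail, and an optional source location.
    type Diagnostic struct {
        Severity string       // "error" or "warning"
        Summary  string
        Detail   string
        Subject  *SourceRange // nil when no location is known
    }

    // Validate returns structured diagnostics rather than the old
    // ([]string, []error) pair, so detail can be added gradually.
    func Validate(config map[string]interface{}) []Diagnostic {
        var diags []Diagnostic
        // ... validation rules append to diags here ...
        return diags
    }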
Previously we forced all remote state backends to be wrapped in a
BackupState wrapper that generates a local "terraform.tfstate.backup"
file before updating the remote state.
This backup mechanism was motivated by allowing users to recover a
previous state if user error caused an undesirable change such as loss
of the record of one or more resources. However, it also has the downside
of flushing a possibly-sensitive state to local disk in a location where
users may not realize its purpose and accidentally check it into version
control. Those using remote state would generally prefer that state never
be flushed to local disk at all.
The use-case of recovering older states can be dealt with for remote
backends by selecting a backend that has preservation of older versions
as a first-class feature, such as S3 versioning or Terraform Enterprise's
first-class historical state versioning mechanism.
One case still remains where state can be flushed to local disk: if
a write to the remote backend fails during "terraform apply" then we will
still create the "errored.tfstate" file to allow the user to recover. This
seems like a reasonable compromise because this is done only in an
_exceptional_ case, and the console output makes it very clear that this
file has been created.
Fixes #15339.
In #15884 we adjusted the plan output to give an explicit command to run
to apply a plan, whereas before this command was just alluded to in the
prose.
Since releasing that, we've received feedback that it's confusing to
include such instructions when Terraform is running in a workflow
automation tool, because such tools usually abstract away exactly what
commands are run and require users to take different actions to
proceed through the workflow.
To accommodate such environments while retaining helpful messages for
normal CLI usage, here we introduce a new environment variable
TF_IN_AUTOMATION which, when set to a non-empty value, is a hint to
Terraform that it isn't being run in an interactive command shell and
it should thus tone down the "next steps" messaging.
The documentation for this setting is included as part of the "...in
automation" guide since it's not generally useful in other cases. We also
intentionally disclaim comprehensive support for this since we want to
avoid creating an extreme number of "if running in automation..."
codepaths that would increase the testing matrix and hurt maintainability.
The focus is specifically on the output of the three commands we give in
the automation guide, which at present means the following two situations:
* "terraform init" does not include the final paragraphs that suggest
running "terraform plan" and tell you in what situations you might need
to re-run "terraform init".
* "terraform plan" does not include the final paragraphs that either
warn about not specifying "-out=..." or instruct to run
"terraform apply" with the generated plan file.
The previous diff presentation was rather "wordy", and not very friendly
to those who can't see color, whether because they are color-blind or
because their terminal doesn't support color.
This new presentation uses the actual symbols used in the plan output
and tries to be more concise. It also uses some framing characters to
try to separate the different stages of "terraform plan" to make it
easier to visually navigate.
The apply command also adopts this new plan presentation, in preparation
for "terraform apply" (with interactive plan confirmation) becoming the
primary, safe workflow in the next major release.
Finally, we standardize on the terminology "perform" and "actions" rather
than "execute" and "changes" to reflect the fact that reading is now an
action and that it isn't actually a _change_.
Previously the rendered plan output was constructed directly from the
core plan and then annotated with counts derived from the count hook.
In various places we applied small adjustments to deal with the fact that
the user-facing diff model is not identical to the internal diff model,
including the special handling of data source reads and destroys. Since
this logic was mixed into the rendering code, it behaved
inconsistently with the tally of adds, updates, and deletes.
This change reworks the plan formatter so that formatting happens in two stages:
- First, we produce a specialized Plan object that is tailored for use
in the UI. This applies all the relevant logic to transform the
physical model into the user model.
- Second, we do a straightforward visual rendering of the display-oriented
plan object.
For the moment this is slightly overkill since there's only one rendering
path, but it does give us the benefit of letting the counts be derived
from the same data as the full detailed diff, ensuring that they'll stay
consistent.
Later we may choose to have other UIs for plans, such as a
machine-readable output intended to drive a web UI. In that case, we'd
want the web UI to consume a serialization of the _display-oriented_ plan
so that it doesn't need to re-implement all of these UI special cases.
This introduces to core a new diff action type for "refresh". Currently
this is used _only_ in the UI layer, to represent data source reads.
Later it would be good to use this type for the core diff as well, to
improve consistency, but that is left for another day to keep this change
focused on the UI.
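A sketch of the display-oriented model (type and field names are
illustrative): the summary counts are derived from the same resource
list as the detailed diff, and "refresh" is a first-class action.

    package plan

    // Action is the user-facing plan action, including Refresh for
    // data source reads.
    type Action int

    const (
        ActionCreate Action = iota
        ActionUpdate
        ActionDelete
        ActionRefresh
    )

    // ResourceChange is one entry in the display-oriented plan.
    type ResourceChange struct {
        Address string
        Action  Action
    }

    // DisplayPlan is the UI model produced from the core plan.
    type DisplayPlan struct {
        Resources []ResourceChange
    }

    // ActionCounts tallies actions from the same data that drives
    // the detailed diff, so the two can never disagree.
    func (p *DisplayPlan) ActionCounts() map[Action]int {
        counts := make(map[Action]int)
        for _, rc := range p.Resources {
            counts[rc.Action]++
        }
        return counts
    }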
Forward-port the plan state check from the 0.9 series.
0.10 has improved the serial handling for the state, so this adds
relevant comments and some more test coverage for the case of an
incrementing serial during apply.
A common reason to want to use `terraform plan` is to have a chance to
review and confirm a plan before running it. If in fact that is the
only reason you are running plan, this new `terraform apply -auto-approve=false`
flag provides an easier alternative to
    P=$(mktemp -t plan)
    terraform refresh
    terraform plan -refresh=false -out=$P
    terraform apply $P
    rm $P
The flag defaults to true for now, but in a future version of Terraform it will
default to false.
Rather than overloading InstanceDiff with a "Stub" attribute that is
going to be largely meaningless, we are just going to skip
pre/post-diff hooks altogether. The rationale is that we will
eventually not need to "stub" a diff for scaled-out, stateless nodes on
refresh at all, so diff behaviour won't be necessary at that point, and
we should not assume that hooks will run at this stage anyway.
Also as part of this, removed the CountHook test that is now failing
because CountHook is out of scope of the new behaviour.
We are changing the behaviour of the "stub" diff operation to just have
the pre/post-diff hooks skipped on eval, meaning that the test against
CountHook would ultimately be meaningless and fail; hence we need a
different test here that checks the behaviour at a more general level.
During plan and apply, because the provider constraints need to be built
from a plan, they are not checked until the terraform.Context is
created. Since the context is always requested by the backend during the
Operation, the backend needs to be responsible for generating contextual
error messages for the user.
Instead of formatting the ResolveProviders errors during NewContext,
return a special error type, ResourceProviderError, to signal that
init will be required. The backend can then extract and format the
errors.
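A sketch of the signaling pattern (the type name matches the commit;
its structure and the formatting helper here are illustrative):

    package terraform

    import "fmt"

    // ResourceProviderError signals that provider resolution failed
    // and that "terraform init" will be required.
    type ResourceProviderError struct {
        Errors []error
    }

    func (e *ResourceProviderError) Error() string {
        return fmt.Sprintf("%d provider resolution error(s)", len(e.Errors))
    }

    // formatInitError shows how a backend might unpack the error and
    // produce a contextual message for the user.
    func formatInitError(err error) string {
        rpErr, ok := err.(*ResourceProviderError)
        if !ok {
            return err.Error()
        }
        msg := "Provider requirements cannot be satisfied; run \"terraform init\":\n"
        for _, e := range rpErr.Errors {
            msg += "  - " + e.Error() + "\n"
        }
        return msg
    }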
Changed the language of this field to indicate that this diff is not a
"real" diff, in that it should not be acted on, as opposed to a "quiet"
mode, which would simply mean to act silently.
This fixes a bug with the new refresh graph behaviour where a resource
was being counted twice in the UI as part of being scaled out:
* We are no longer transforming refresh nodes without state to
plannable resources (the transformer will be removed shortly)
* A Quiet flag has been added to EvalDiff and InstanceDiff - this
allows for flagging a diff that should not be treated as a real
diff for purposes of planning (see the sketch below)
* When there is no state for a refresh node now, a new path is taken
that is similar to plan, but flags Quiet, and does nothing with the
diff afterwards.
Tests pending - light testing has confirmed this should fix the double
count issue, but we should have some tests that actually cover the bug.
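A minimal sketch of the Quiet behaviour, over heavily simplified types:

    package terraform

    // InstanceDiff carries a Quiet flag: the diff exists for
    // evaluation purposes but must not be treated as a real,
    // plannable diff.
    type InstanceDiff struct {
        Quiet bool
        // ... attribute diffs elided ...
    }

    // Hook is the subset of the hook interface relevant here.
    type Hook interface {
        PreDiff(addr string) error
        PostDiff(addr string, d *InstanceDiff) error
    }

    // evalDiff skips pre/post-diff hooks for quiet diffs, so UI
    // hooks such as CountHook never see them.
    func evalDiff(addr string, d *InstanceDiff, hooks []Hook) error {
        if !d.Quiet {
            for _, h := range hooks {
                if err := h.PreDiff(addr); err != nil {
                    return err
                }
            }
        }
        // ... compute the diff here ...
        if !d.Quiet {
            for _, h := range hooks {
                if err := h.PostDiff(addr, d); err != nil {
                    return err
                }
            }
        }
        return nil
    }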
We're shifting terminology from "environment" to "workspace". This takes
care of some of the main internal API surface that was using the old
terminology, though is not intended to be entirely comprehensive and is
mainly just to minimize the amount of confusion for maintainers as we
continue moving towards eliminating the old terminology.
This allows you to run multiple concurrent terraform operations against
different environments from the same source directory.
Fixes #14447.
Also removes some dead code which appears to do the same thing as the function I
modified.
Rather than providing an already-resolved map of plugins to core, we now
provide a "provider resolver" which knows how to resolve a set of provider
dependencies, to be determined later, and produce that map.
This requires the context to be instantiated in a different way, so this
very noisy diff is a mostly-mechanical update of all of the existing
places where contexts get created for testing, using some adapted versions
of the pre-existing utilities for passing in mock providers.
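The resolver's shape is roughly the following (a sketch; the real
interface takes a richer requirements type):

    package terraform

    // ResourceProviderFactory creates a provider instance.
    type ResourceProviderFactory func() (interface{}, error)

    // ResourceProviderResolver resolves a set of provider
    // requirements, determined later from config or plan, into the
    // factory map that core previously received pre-resolved.
    type ResourceProviderResolver interface {
        ResolveProviders(requirements map[string]string) (map[string]ResourceProviderFactory, []error)
    }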
This reverts commit b73d037761.
This commit seems to have introduced a race condition where we can
concurrently keep updating state after we've checked if we need to
increase the serial, and thus end up writing partial changes
to the state backend.
In the case of Terraform Enterprise, this fails altogether because
of the state hash consistency check it performs.
When the backend operation is cancelled, immediately call PersistState.
There is a high likelihood that the user is going to terminate the process
early if the provider doesn't return in a timely manner, so persist as
much state as possible.
Have StateHook periodically call PersistState to flush any cached state
to permanent storage. This uses a minimum 10-second interval between
calls to PersistState.
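A sketch of the throttling, assuming a simplified persister interface:

    package local

    import (
        "sync"
        "time"
    )

    // StatePersister is the subset of the state API used here.
    type StatePersister interface {
        WriteState(state interface{}) error
        PersistState() error
    }

    // StateHook flushes cached state to permanent storage at most
    // once per interval, so a long apply survives early termination.
    type StateHook struct {
        mu          sync.Mutex
        State       StatePersister
        interval    time.Duration // e.g. 10 * time.Second
        lastPersist time.Time
    }

    func (h *StateHook) PostStateUpdate(state interface{}) error {
        h.mu.Lock()
        defer h.mu.Unlock()

        if err := h.State.WriteState(state); err != nil {
            return err
        }
        if time.Since(h.lastPersist) >= h.interval {
            if err := h.State.PersistState(); err != nil {
                return err
            }
            h.lastPersist = time.Now()
        }
        return nil
    }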
In the old remote state system we had the idea of a local backup, which
is actually still present for the legacy backends but no longer applies
for the new-style backends like the s3 backend.
It's problematic when an apply runs for long enough that someone's
time-limited AWS STS credentials expire and then Terraform fails and can't
persist state to S3.
To reduce the risk of lost state, here we add some extra fallback code
for the local apply operation in particular. If either state writing
or state persisting fails, then we attempt to write the state to a special
backup file errored.tfstate, and produce an error message that guides the
user on how to retry uploading this state.
In the unlikely event that we can't write to local disk either (e.g.
permissions problems) we take a last-ditch attempt to dump the JSON onto
stdout and advise the user to manually copy it into a file for import.
If even that doesn't work for some reason, we assume a critical Terraform
bug (JSON-serialization problem with states?) and bail out with an
apologetic error message.
This is implemented for the apply command in particular because this is
the one command where new objects are created in real APIs that we don't
want to lose track of. For other operations it's less bad to just generate
a simple error message and have the user retry.
This fixes #14298.
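The fallback chain can be sketched like this (simplified; the message
text and helper names are illustrative):

    package local

    import (
        "encoding/json"
        "fmt"
        "log"
        "os"
    )

    // persistWithFallback tries progressively less convenient homes
    // for the state so that it is never silently lost.
    func persistWithFallback(state interface{}, persist func() error) {
        if err := persist(); err == nil {
            return
        }

        data, jsonErr := json.MarshalIndent(state, "", "  ")
        if jsonErr != nil {
            // Serialization itself failed: likely a Terraform bug,
            // so bail out with an apologetic error message.
            log.Fatal("failed to serialize state; this is likely a bug, please report it")
        }

        // First fallback: write errored.tfstate and tell the user
        // how to retry the upload.
        if writeErr := os.WriteFile("errored.tfstate", data, 0644); writeErr == nil {
            fmt.Println("State written to errored.tfstate; retry persisting it once the problem is fixed.")
            return
        }

        // Second fallback: dump the JSON to stdout for manual
        // recovery.
        fmt.Println("Failed to write errored.tfstate; copy the state below into a file:")
        os.Stdout.Write(data)
        fmt.Println()
    }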
The backend apply operation doesn't need to output the same text as the
CLI itself. Instead, notify the user that we are in the process of
stopping the operation.
stringer has changed the boilerplate it generates in a recent version.
We'd previously updated to the new format but accidentally rolled back
to the old while merging a long-running feature branch.
This restores us back to the new format again.
The documentation for Refresh indicates that it will always return a
valid state, but that wasn't true in the case of a graph builder error.
While this same concept wasn't documented for Apply, it was still
assumed in the terraform apply code.
Since the helper testing framework relies on the absence of a state to
determine if it can call Destroy, the Context can't start returning a
state in all cases. Document this, and use the State method to fetch
the correct state value after Apply.
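In test code the pattern is roughly this (a sketch over an illustrative
Context interface):

    package helper

    // Context is the subset of terraform.Context used here.
    type Context interface {
        Apply() (interface{}, error)
        State() interface{}
    }

    // applyAndFetchState fetches the correct state via the State
    // method even when Apply returns an error.
    func applyAndFetchState(tfCtx Context) (interface{}, error) {
        state, err := tfCtx.Apply()
        if err != nil {
            // Apply may not return a valid state on failure.
            state = tfCtx.State()
        }
        return state, err
    }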
Add a nil check to the WriteState function, so that writing a nil state
is a noop.
Make sure to init before sorting the state, to make sure we're not
attempting to sort nil values. This isn't technically needed with the
current code, but it's just safer in general.
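The guard itself is small; a sketch over simplified stand-in types:

    package state

    // State is a stand-in for the real state type.
    type State struct{ /* modules etc. elided */ }

    func (s *State) init() { /* ensure nested values are non-nil */ }
    func (s *State) sort() { /* stable ordering for serialization */ }

    // LocalState is a stand-in for a state writer.
    type LocalState struct{ /* path etc. elided */ }

    // WriteState treats a nil state as a noop, and inits before
    // sorting so we never attempt to sort nil values.
    func (l *LocalState) WriteState(state *State) error {
        if state == nil {
            return nil
        }
        state.init()
        state.sort()
        // ... serialize and write to disk ...
        return nil
    }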
Add fields required to create an appropriate context for all calls to
clistate.Lock.
Add missing checks for Meta.stateLock, where we would attempt to lock
even when locking should be skipped.
- Have the ui Lock helper use state.LockWithContext (see the sketch below).
- Rename the message package to clistate, since that's how it's imported
everywhere.
- Use a more idiomatic placement of the Context in the LockWithContext
args.
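A sketch of the call shape, with the Context placed first per the usual
Go convention (the Locker interface and retry delay are simplified):

    package state

    import (
        "context"
        "time"
    )

    // LockInfo describes the operation requesting the lock.
    type LockInfo struct{ Operation string }

    // Locker is the subset of the state locking API used here.
    type Locker interface {
        Lock(info *LockInfo) (string, error)
        Unlock(id string) error
    }

    // LockWithContext retries the lock until it succeeds or the
    // context is done.
    func LockWithContext(ctx context.Context, s Locker, info *LockInfo) (string, error) {
        for {
            id, err := s.Lock(info)
            if err == nil {
                return id, nil
            }
            select {
            case <-ctx.Done():
                return "", ctx.Err()
            case <-time.After(time.Second): // delay between attempts
            }
        }
    }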