We're shifting terminology from "environment" to "workspace". This takes
care of some of the main internal API surface that was using the old
terminology, though is not intended to be entirely comprehensive and is
mainly just to minimize the amount of confusion for maintainers as we
continue moving towards eliminating the old terminology.
This allows you to run multiple concurrent terraform operations against
different environments from the same source directory.
Fixes#14447.
Also removes some dead code which appears to do the same thing as the function I
modified.
Rather than providing an already-resolved map of plugins to core, we now
provide a "provider resolver" which knows how to resolve a set of provider
dependencies, to be determined later, and produce that map.
This requires the context to be instantiated in a different way, so this
very noisy diff is a mostly-mechanical update of all of the existing
places where contexts get created for testing, using some adapted versions
of the pre-existing utilities for passing in mock providers.
This reverts commit b73d037761.
This commit seems to have introduced a race condition where we can
concurrently keep updating state after we've checked if we need to
increase the serial, and thus end up writing partial changes
to the state backend.
In the case of Terraform Enterprise, this fails altogether because
of the state hash consistency check it does.
Since the DynamoDB table used by the S3 backend is no longer only used
for locks, rename it in the config to remove any confusion about it
being lock-specific.
Consul locks are based on liveness, and may be lost due timeouts,
network issued, etc. If the client determines the lock was lost, attempt
to reacquire the lock immediately.
The client was also not using the `lock` config option. Disable locks if
that is not set.
The S3 client can return (nil, nil) when the remote state doesn't exist.
The caused a nil pointer dereference when checking the payload.MD5
against the expected value.
This can happen if the remote state was manually removed, but the digest
entry was left in the DynamoDB table.
When the backend operation is cancelled, immediately call PersistState.
The is a high likelihood that the user is going to terminate the process
early if the provider doesn't return in a timely manner, so persist as
much state as possible.
Have StateHook periodically call PersistState to flush any cached state
to permanent storage. This uses a minimal 10 second interval between
calls to PersistState.
Updates to objects in S3 are only eventually consistent. If the
RemoteClient has a DynamoDB table available, use that to store a
checksum of the last written state, so the object can be verified by the
next client to call Get.
Terraform currently doesn't have any sort of user feedback around
RefreshState/Get, so we poll only for a short time before returning an
error.
In the old remote state system we had the idea of a local backup, which
is actually still present for the legacy backends but no longer applies
for the new-style backends like the s3 backend.
It's problematic when an apply runs for long enough that someone's
time-limited AWS STS credentials expire and then Terraform fails and can't
persist state to S3.
To reduce the risk of lost state, here we add some extra fallback code
for the local apply operation in particular. If either state writing
or state persisting fail then we attempt to write the state to a special
backup file errored.tfstate, and produce an error message that guides the
user on how to retry uploading this state.
In the unlikely event that we can't write to local disk either (e.g.
permissions problems) we take a last-ditch attempt to dump the JSON onto
stdout and advise the user to manually copy it into a file for import.
If even that doesn't work for some reason, we assume a critical Terraform
bug (JSON-serialization problem with states?) and bail out with an
apologetic error message.
This is implemented for the apply command in particular because this is
the one command where new objects are created in real APIs that we don't
want to lose track of. For other operations it's less bad to just generate
a simple error message and have the user retry.
This fixes#14298.
The backend apply operation doesn't need to output the same text as the
cli itself. Instead notify the user that we are in the process of
stopping the operation.
stringer has changed the boilerplate it generates in a recent version.
We'd previously updated to the new format but accientally rolled back
to the old while merging a long-running feature branch.
This restores us back to the new format again.
The documentation for Refresh indicates that it will always return a
valid state, but that wasn't true in the case of a graph builder error.
While this same concept wasn't documented for Apply, it was still
assumed in the terraform apply code.
Since the helper testing framework relies on the absence of a state to
determine if it can call Destroy, the Context can't can't start
returning a state in all cases. Document this, and use the State method
to fetch the correct state value after Apply.
Add a nil check to the WriteState function, so that writing a nil state
is a noop.
Make sure to init before sorting the state, to make sure we're not
attempting to sort nil values. This isn't technically needed with the
current code, but it's just safer in general.
Prevent extra keys in the s3 envPrefix path from showing up as
listed environments.
Better handle keys containing slashes
Add tests for unexpected keys in s3.
This matches the consul cli behavior, where locks are cleaned up after
use.
Return an error from re-locking the state. This isn't required by the
Locker interface, but it's an added sanity check for state operations.
What was incorrect here was returning an empty ID and error, which would
indicate that Lock/Unlock isn't supported.
Use the aws provider code to create the clients for the s3 backend, so
that all the behavior matches that of the provider.
Remove the fake creds from the test, as the aws provider will attempt to
validate them.
Add fields required to create an appropriate context for all calls to
clistate.Lock.
Add missing checks for Meta.stateLock, where we would attempt to lock,
even if locking should be skipped.
- Have the ui Lock helper use state.LockWithContext.
- Rename the message package to clistate, since that's how it's imported
everywhere.
- Use a more idiomatic placement of the Context in the LockWithContext
args.
golang/tools commit 23ca8a263 changed the format of the leading comment
to comply with some new standards discussed here:
https://golang.org/issue/13560
This is the result of running generate with the latest version of
stringer. Everyone working on Terraform will need to update stringer
after this is merged, to avoid reverting this:
go get -u golang.org/x/tools/cmd/stringer
The backend state tests weren't properly checking for persistence.
Update the test to persist states and fetch them again from the backend,
checking that lineage is preserved.
This adds named state (environment) support to the S3 backend.
A state NAME will prepend the configured s3 key with `env:/NAME/`.
The default state will remain rooted in the bucket for backwards
compatibility.
Locks in DynamoDB use the S3 key as the as the primary key value, so
locking will work as expected for multiple states.
Move the S3 State from a legacy remote state to an official backend.
This increases test coverage, uses a set schema for configuration, and
will allow new backend features to be implemented for the S3 state, e.g.
"environments".
Fixes#12871
We were forgetting to remove the legacy remote state from the actual
state value when migrating. This only causes an issue when saving a plan
since the plan contains the state itself and causes an error where both
a backend + legacy state exist.
If saved plans aren't used this causes no noticable issue.
Due to buggy upgrades already existing in the wild, I also added code to
clear the remote section if it exists in a standard unchanged backend