The S3 client can return (nil, nil) when the remote state doesn't exist.
The caused a nil pointer dereference when checking the payload.MD5
against the expected value.
This can happen if the remote state was manually removed, but the digest
entry was left in the DynamoDB table.
When the backend operation is cancelled, immediately call PersistState.
The is a high likelihood that the user is going to terminate the process
early if the provider doesn't return in a timely manner, so persist as
much state as possible.
Have StateHook periodically call PersistState to flush any cached state
to permanent storage. This uses a minimal 10 second interval between
calls to PersistState.
Updates to objects in S3 are only eventually consistent. If the
RemoteClient has a DynamoDB table available, use that to store a
checksum of the last written state, so the object can be verified by the
next client to call Get.
Terraform currently doesn't have any sort of user feedback around
RefreshState/Get, so we poll only for a short time before returning an
error.
In the old remote state system we had the idea of a local backup, which
is actually still present for the legacy backends but no longer applies
for the new-style backends like the s3 backend.
It's problematic when an apply runs for long enough that someone's
time-limited AWS STS credentials expire and then Terraform fails and can't
persist state to S3.
To reduce the risk of lost state, here we add some extra fallback code
for the local apply operation in particular. If either state writing
or state persisting fail then we attempt to write the state to a special
backup file errored.tfstate, and produce an error message that guides the
user on how to retry uploading this state.
In the unlikely event that we can't write to local disk either (e.g.
permissions problems) we take a last-ditch attempt to dump the JSON onto
stdout and advise the user to manually copy it into a file for import.
If even that doesn't work for some reason, we assume a critical Terraform
bug (JSON-serialization problem with states?) and bail out with an
apologetic error message.
This is implemented for the apply command in particular because this is
the one command where new objects are created in real APIs that we don't
want to lose track of. For other operations it's less bad to just generate
a simple error message and have the user retry.
This fixes#14298.
The backend apply operation doesn't need to output the same text as the
cli itself. Instead notify the user that we are in the process of
stopping the operation.
stringer has changed the boilerplate it generates in a recent version.
We'd previously updated to the new format but accientally rolled back
to the old while merging a long-running feature branch.
This restores us back to the new format again.
The documentation for Refresh indicates that it will always return a
valid state, but that wasn't true in the case of a graph builder error.
While this same concept wasn't documented for Apply, it was still
assumed in the terraform apply code.
Since the helper testing framework relies on the absence of a state to
determine if it can call Destroy, the Context can't can't start
returning a state in all cases. Document this, and use the State method
to fetch the correct state value after Apply.
Add a nil check to the WriteState function, so that writing a nil state
is a noop.
Make sure to init before sorting the state, to make sure we're not
attempting to sort nil values. This isn't technically needed with the
current code, but it's just safer in general.
Prevent extra keys in the s3 envPrefix path from showing up as
listed environments.
Better handle keys containing slashes
Add tests for unexpected keys in s3.
This matches the consul cli behavior, where locks are cleaned up after
use.
Return an error from re-locking the state. This isn't required by the
Locker interface, but it's an added sanity check for state operations.
What was incorrect here was returning an empty ID and error, which would
indicate that Lock/Unlock isn't supported.
Use the aws provider code to create the clients for the s3 backend, so
that all the behavior matches that of the provider.
Remove the fake creds from the test, as the aws provider will attempt to
validate them.
Add fields required to create an appropriate context for all calls to
clistate.Lock.
Add missing checks for Meta.stateLock, where we would attempt to lock,
even if locking should be skipped.
- Have the ui Lock helper use state.LockWithContext.
- Rename the message package to clistate, since that's how it's imported
everywhere.
- Use a more idiomatic placement of the Context in the LockWithContext
args.
golang/tools commit 23ca8a263 changed the format of the leading comment
to comply with some new standards discussed here:
https://golang.org/issue/13560
This is the result of running generate with the latest version of
stringer. Everyone working on Terraform will need to update stringer
after this is merged, to avoid reverting this:
go get -u golang.org/x/tools/cmd/stringer
The backend state tests weren't properly checking for persistence.
Update the test to persist states and fetch them again from the backend,
checking that lineage is preserved.
This adds named state (environment) support to the S3 backend.
A state NAME will prepend the configured s3 key with `env:/NAME/`.
The default state will remain rooted in the bucket for backwards
compatibility.
Locks in DynamoDB use the S3 key as the as the primary key value, so
locking will work as expected for multiple states.
Move the S3 State from a legacy remote state to an official backend.
This increases test coverage, uses a set schema for configuration, and
will allow new backend features to be implemented for the S3 state, e.g.
"environments".
Fixes#12871
We were forgetting to remove the legacy remote state from the actual
state value when migrating. This only causes an issue when saving a plan
since the plan contains the state itself and causes an error where both
a backend + legacy state exist.
If saved plans aren't used this causes no noticable issue.
Due to buggy upgrades already existing in the wild, I also added code to
clear the remote section if it exists in a standard unchanged backend
This allows a refresh on a non-existent or empty state file. We changed
this in 0.9.0 to error which seemed reasonable but it turns out this
complicates automation that runs refresh since it now needed to
determine if the state file was empty before running.
Its easier to just revert this into a warning with exit code zero.
The reason this changed is because in 0.8.x and earlier, the output
would be simply empty with exit code zero which seemed odd.
This adds a "lock" config (default true) to allow users to optionally
disable state locking with Consul. This is necessary if the token given
doesn't have session permission and is necessary for backwards
compatibility.