When a consul lock is lost, there is a possibility that the associated
session is still active. Most commonly, the long request to watch the
lock key may error out, while the session is continually refreshed at a
rate of TTL/2.
First have the lock monitor retry the lock internally for at least 10
seconds (5 attempts with the default 2 second wait time). In most cases
this will reconnect on the first try, keeping the lock channel open.
If the consul lock can't recover itself, then cancel the session as soon
as possible (terminating the PreiodicRenew will call Session.Destroy),
and start over. In the worse case, the consul agents were split, and the
session still exists on the leader so we may need to wait for the old
session TTL, plus the LockWait time to renew the lock.
We use a Context for the cancellation channels here, because that
removes the need to worry about double-closes and nil channels. It
requires an awkward adapter goroutine for now to convert the Done()
`<-chan` to a `chan` for PeriodicRenew, but makes the rest of the code
safer in the long run.
The s3.Backend was using it's own code for DeleteState, but the dynamo
entries are only handled through the RemoteClient. Have DeleteState use
a RemoteClient for delete.
Move the Swift State from a legacy remote state to an official backend.
Add `container` and `archive_container` configuration variables, and deprecate `path` and `archive_path` variables.
Future improvements: Add support for locking and environments.
Since the DynamoDB table used by the S3 backend is no longer only used
for locks, rename it in the config to remove any confusion about it
being lock-specific.
Consul locks are based on liveness, and may be lost due timeouts,
network issued, etc. If the client determines the lock was lost, attempt
to reacquire the lock immediately.
The client was also not using the `lock` config option. Disable locks if
that is not set.
The S3 client can return (nil, nil) when the remote state doesn't exist.
The caused a nil pointer dereference when checking the payload.MD5
against the expected value.
This can happen if the remote state was manually removed, but the digest
entry was left in the DynamoDB table.
Updates to objects in S3 are only eventually consistent. If the
RemoteClient has a DynamoDB table available, use that to store a
checksum of the last written state, so the object can be verified by the
next client to call Get.
Terraform currently doesn't have any sort of user feedback around
RefreshState/Get, so we poll only for a short time before returning an
error.
Prevent extra keys in the s3 envPrefix path from showing up as
listed environments.
Better handle keys containing slashes
Add tests for unexpected keys in s3.
This matches the consul cli behavior, where locks are cleaned up after
use.
Return an error from re-locking the state. This isn't required by the
Locker interface, but it's an added sanity check for state operations.
What was incorrect here was returning an empty ID and error, which would
indicate that Lock/Unlock isn't supported.
Use the aws provider code to create the clients for the s3 backend, so
that all the behavior matches that of the provider.
Remove the fake creds from the test, as the aws provider will attempt to
validate them.
This adds named state (environment) support to the S3 backend.
A state NAME will prepend the configured s3 key with `env:/NAME/`.
The default state will remain rooted in the bucket for backwards
compatibility.
Locks in DynamoDB use the S3 key as the as the primary key value, so
locking will work as expected for multiple states.
Move the S3 State from a legacy remote state to an official backend.
This increases test coverage, uses a set schema for configuration, and
will allow new backend features to be implemented for the S3 state, e.g.
"environments".
This adds a "lock" config (default true) to allow users to optionally
disable state locking with Consul. This is necessary if the token given
doesn't have session permission and is necessary for backwards
compatibility.