While #16243 added the ability to retry getting a state from S3, Put can
return the same InternalError status. Use the same retry logic when
uploading state to S3.
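A minimal sketch of that retry loop for Put, assuming the aws-sdk-go error types; the helper name and parameters here are made up and the backend's actual code may differ:
```go
import (
	"bytes"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/awserr"
	"github.com/aws/aws-sdk-go/service/s3"
	"github.com/aws/aws-sdk-go/service/s3/s3iface"
)

// putWithRetry is an illustrative helper: retry PutObject while S3 reports
// the transient "InternalError" code, mirroring the existing Get retry.
func putWithRetry(svc s3iface.S3API, bucket, key string, data []byte, attempts int) error {
	var lastErr error
	for i := 0; i < attempts; i++ {
		_, err := svc.PutObject(&s3.PutObjectInput{
			Bucket: aws.String(bucket),
			Key:    aws.String(key),
			Body:   bytes.NewReader(data),
		})
		if err == nil {
			return nil
		}
		// Only the transient error class is worth retrying immediately.
		if awsErr, ok := err.(awserr.Error); !ok || awsErr.Code() != "InternalError" {
			return err
		}
		lastErr = err
	}
	return lastErr
}
```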
Add a way to inject network errors by setting an immediate deadline on
open consul connections. The consul client currently doesn't retry on
some errors, and will force us to lose our lock.
Once the consul api client is fixed, this test will fail.
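A rough sketch of the injection technique (the test's actual helper names differ): track every connection the dialer opens, then give each one an already-expired deadline.
```go
import (
	"context"
	"net"
	"sync"
	"time"
)

// deadlineDialer is an illustrative wrapper that records every connection
// it opens so a test can later break them all at once.
type deadlineDialer struct {
	mu    sync.Mutex
	conns []net.Conn
}

func (d *deadlineDialer) DialContext(ctx context.Context, network, addr string) (net.Conn, error) {
	conn, err := (&net.Dialer{}).DialContext(ctx, network, addr)
	if err != nil {
		return nil, err
	}
	d.mu.Lock()
	d.conns = append(d.conns, conn)
	d.mu.Unlock()
	return conn, nil
}

// injectNetworkError sets an immediate deadline on every open connection,
// forcing in-flight consul requests to fail with a network error.
func (d *deadlineDialer) injectNetworkError() {
	d.mu.Lock()
	defer d.mu.Unlock()
	for _, conn := range d.conns {
		conn.SetDeadline(time.Now())
	}
}
```
The consul client's Transport would use this wrapper's DialContext in place of the default dialer, so the test controls every connection the backend opens.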
The consul Client is analogous to an http.Client, and we really don't
need more than 1. Configure a single client and store it in the backend.
Replace the default Transport's Dialer to reduce the KeepAlive setting
from 30s to 17s. This avoids racing with the common network timeout
value of 30s, and is also coprime to other common intervals.
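A hedged sketch of that wiring, assuming the consul api's `Config.Transport` field; the constructor name is made up:
```go
import (
	"net"
	"net/http"
	"time"

	consulapi "github.com/hashicorp/consul/api"
)

// newBackendClient shows the shape of the change: one shared client per
// backend, with the Dialer's KeepAlive lowered from the default 30s to 17s.
func newBackendClient(address string) (*consulapi.Client, error) {
	config := consulapi.DefaultConfig()
	config.Address = address
	config.Transport = &http.Transport{
		Proxy: http.ProxyFromEnvironment,
		DialContext: (&net.Dialer{
			Timeout:   30 * time.Second,
			KeepAlive: 17 * time.Second, // below, and coprime to, the common 30s timeouts
		}).DialContext,
	}
	return consulapi.NewClient(config)
}
```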
Internal errors from S3 are usually transient, and can be immediately retried.
Make 2 attempts at retrieving the state object before returning an error.
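For illustration, a Get with the same two-attempt behavior, assuming the aws-sdk-go `awserr` type for the status check (names here are not the backend's own):
```go
import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/awserr"
	"github.com/aws/aws-sdk-go/service/s3"
	"github.com/aws/aws-sdk-go/service/s3/s3iface"
)

// getWithRetry is illustrative: make two attempts at GetObject, retrying
// only when S3 reports the transient "InternalError" code.
func getWithRetry(svc s3iface.S3API, bucket, key string) (*s3.GetObjectOutput, error) {
	var out *s3.GetObjectOutput
	var err error
	for attempt := 0; attempt < 2; attempt++ {
		out, err = svc.GetObject(&s3.GetObjectInput{
			Bucket: aws.String(bucket),
			Key:    aws.String(key),
		})
		if err == nil {
			return out, nil
		}
		if awsErr, ok := err.(awserr.Error); !ok || awsErr.Code() != "InternalError" {
			return nil, err
		}
	}
	return nil, err
}
```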
A TLS config was being assigned to a Transport in a nil http.Client. The
Transport is built in the consul config by default, but the http.Client
is not built until later in NewClient.
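A sketch of the corrected wiring, assuming the consul api's `SetupTLSConfig` helper and a `Transport` populated by `DefaultConfig`; the TLS settings go on the Transport because the client doesn't exist yet:
```go
import (
	consulapi "github.com/hashicorp/consul/api"
)

// newTLSClient is illustrative: config.HttpClient is still nil here, and
// consulapi.NewClient only builds it later from config.Transport, so the
// TLS config must be attached to the Transport rather than the client.
func newTLSClient(caFile, certFile, keyFile string) (*consulapi.Client, error) {
	config := consulapi.DefaultConfig()

	tlsClientCfg, err := consulapi.SetupTLSConfig(&consulapi.TLSConfig{
		CAFile:   caFile,
		CertFile: certFile,
		KeyFile:  keyFile,
	})
	if err != nil {
		return nil, err
	}
	config.Transport.TLSClientConfig = tlsClientCfg

	return consulapi.NewClient(config)
}
```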
Added locking support via blob leasing (requires that an empty state is
created before any lock can be acquired).
Added support for "environments" in much the same way as the S3 backend.
S3 accepts objects with a leading slash and strips them off. This works
fine except in our workspace hierarchy, which then can no longer find
suffixes matching the full key name.
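One way to keep the two views consistent, shown only as an illustration (the backend's actual fix may differ):
```go
import "strings"

// normalizeKey strips a leading slash so the key we use for suffix matching
// is the same key S3 actually stores and lists.
func normalizeKey(key string) string {
	return strings.TrimPrefix(key, "/")
}
```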
When remote backend implementations create a new named state, they may
need to acquire a lock and/or save an actual empty state to the backend.
Copy this behavior in the inmem backend for testing.
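A sketch of that shared behavior against the `state.State` interfaces of the time; the helper name is hypothetical:
```go
import (
	"fmt"

	"github.com/hashicorp/terraform/state"
	"github.com/hashicorp/terraform/terraform"
)

// writeEmptyNamedState mirrors what backends do when a new named state is
// created: take the lock, write an actual empty state, and persist it.
func writeEmptyNamedState(s state.State) error {
	lockInfo := state.NewLockInfo()
	lockInfo.Operation = "init"

	lockID, err := s.Lock(lockInfo)
	if err != nil {
		return fmt.Errorf("failed to lock new state: %s", err)
	}
	defer s.Unlock(lockID)

	// An actual empty state must exist so that listing and later reads
	// can find the new named state.
	if err := s.WriteState(terraform.NewState()); err != nil {
		return err
	}
	return s.PersistState()
}
```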
Updated the vendored consul, which no longer requires the channel adapter
to convert a `<-chan struct{}` into a `chan struct{}`.
Call testutil.NewTestServerConfigT with the new signature.
When a consul lock is lost, there is a possibility that the associated
session is still active. Most commonly, the long request to watch the
lock key may error out, while the session is continually refreshed at a
rate of TTL/2.
First have the lock monitor retry the lock internally for at least 10
seconds (5 attempts with the default 2 second wait time). In most cases
this will reconnect on the first try, keeping the lock channel open.
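In terms of the consul api's lock options, that looks roughly like the following; the key layout and TTL here are illustrative:
```go
import (
	"time"

	consulapi "github.com/hashicorp/consul/api"
)

// buildLock configures the internal monitor retries described above.
func buildLock(client *consulapi.Client, path string) (*consulapi.Lock, error) {
	return client.LockOpts(&consulapi.LockOptions{
		Key:              path + "/.lock",
		SessionTTL:       "15s",
		MonitorRetries:   5,               // retry the key watch internally...
		MonitorRetryTime: 2 * time.Second, // ...for roughly 10 seconds total
	})
}
```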
If the consul lock can't recover itself, then cancel the session as soon
as possible (terminating the PeriodicRenew will call Session.Destroy),
and start over. In the worst case, the consul agents were split and the
session still exists on the leader, so we may need to wait for the old
session TTL plus the LockWait time to renew the lock.
We use a Context for the cancellation channels here, because that
removes the need to worry about double-closes and nil channels. It
requires an awkward adapter goroutine for now to convert the Done()
`<-chan` to a `chan` for PeriodicRenew, but makes the rest of the code
safer in the long run.
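A sketch of the Context-plus-adapter arrangement; the session renewal call is the consul api's RenewPeriodic (what the text calls PeriodicRenew), and the variable names are illustrative:
```go
import (
	"context"

	consulapi "github.com/hashicorp/consul/api"
)

// renewSession starts the periodic session renewal and returns a cancel
// func; cancelling is idempotent, so there are no double-close or
// nil-channel concerns for callers.
func renewSession(client *consulapi.Client, sessionID, sessionTTL string) context.CancelFunc {
	ctx, cancel := context.WithCancel(context.Background())

	// Adapter goroutine: ctx.Done() is a `<-chan struct{}`, but the renewal
	// call took a plain `chan struct{}` at the time, so bridge the two.
	doneCh := make(chan struct{})
	go func() {
		<-ctx.Done()
		close(doneCh)
	}()

	go func() {
		// Closing doneCh stops the renewal and destroys the session,
		// releasing the lock on the consul side as soon as possible.
		client.Session().RenewPeriodic(sessionTTL, sessionID, nil, doneCh)
	}()

	return cancel
}
```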
The s3.Backend was using its own code for DeleteState, but the dynamo
entries are only handled through the RemoteClient. Have DeleteState use
a RemoteClient for delete.
Move the Swift State from a legacy remote state to an official backend.
Add `container` and `archive_container` configuration variables, and deprecate `path` and `archive_path` variables.
Future improvements: Add support for locking and environments.
Since the DynamoDB table used by the S3 backend is no longer only used
for locks, rename it in the config to remove any confusion about it
being lock-specific.