When locking was enabled with the Consul backend and the lock not properly
released, the `terraform force-unlock <lock_id>` command would do nothing as
its implementation would exit early in that case.
It now destroys the session that created the lock and clean both the lock and
the lock-info keys.
A regression test is added to TestConsul_destroyLock() to catch the issue if it
happends again.
Closes https://github.com/hashicorp/terraform/issues/22174
Most of the state package has been deprecated by the states package.
This PR replaces all the references to the old state package that
can be done simply - the low-hanging fruit.
* states: move state.Locker to statemgr
The state.Locker interface was a wrapper around a statemgr.Full, so
moving this was relatively straightforward.
* command: remove unnecessary use of state package for writing local terraform state files
* move state.LocalState into terraform package
state.LocalState is responsible for managing terraform.States, so it
made sense (to me) to move it into the terraform package.
* slight change of heart: move state.LocalState into clistate instead of
terraform
Simplify the use of clistate.Lock by creating a clistate.Locker
instance, which stores the context of locking a state, to allow unlock
to be called without knowledge of how the state was locked.
This alows the backend code to bring the needed UI methods to the point
where the state is locked, and still unlock the state from an outer
scope.
Add a way to inject network errors by setting an immediate deadline on
open consul connections. The consul client currently doesn't retry on
some errors, and will force us to lose our lock.
Once the consul api client is fixed, this test will fail.
Updated the vendored consul which no longer requires the channel adapter
to convert a `chan stuct{}` to a `<-chan struct{}`.
Call testutil.NewTestServerConfigT with the new signature.
When a consul lock is lost, there is a possibility that the associated
session is still active. Most commonly, the long request to watch the
lock key may error out, while the session is continually refreshed at a
rate of TTL/2.
First have the lock monitor retry the lock internally for at least 10
seconds (5 attempts with the default 2 second wait time). In most cases
this will reconnect on the first try, keeping the lock channel open.
If the consul lock can't recover itself, then cancel the session as soon
as possible (terminating the PreiodicRenew will call Session.Destroy),
and start over. In the worse case, the consul agents were split, and the
session still exists on the leader so we may need to wait for the old
session TTL, plus the LockWait time to renew the lock.
We use a Context for the cancellation channels here, because that
removes the need to worry about double-closes and nil channels. It
requires an awkward adapter goroutine for now to convert the Done()
`<-chan` to a `chan` for PeriodicRenew, but makes the rest of the code
safer in the long run.
Consul locks are based on liveness, and may be lost due timeouts,
network issued, etc. If the client determines the lock was lost, attempt
to reacquire the lock immediately.
The client was also not using the `lock` config option. Disable locks if
that is not set.
This matches the consul cli behavior, where locks are cleaned up after
use.
Return an error from re-locking the state. This isn't required by the
Locker interface, but it's an added sanity check for state operations.
What was incorrect here was returning an empty ID and error, which would
indicate that Lock/Unlock isn't supported.
Gove LockInfo a Marshal method for easy serialization, and a String
method for more readable output.
Have the state.Locker implementations use LockError when possible to
return LockInfo and an error.
Use consul locks to implement state locking. The lock path is state path
+ "/.lock" which matches the consul cli default for locks. Lockinfo is
stored at path + "/.lockinfo".