When a consul lock is lost, there is a possibility that the associated
session is still active. Most commonly, the long request to watch the
lock key may error out, while the session is continually refreshed at a
rate of TTL/2.
First have the lock monitor retry the lock internally for at least 10
seconds (5 attempts with the default 2 second wait time). In most cases
this will reconnect on the first try, keeping the lock channel open.
If the consul lock can't recover itself, then cancel the session as soon
as possible (terminating the PreiodicRenew will call Session.Destroy),
and start over. In the worse case, the consul agents were split, and the
session still exists on the leader so we may need to wait for the old
session TTL, plus the LockWait time to renew the lock.
We use a Context for the cancellation channels here, because that
removes the need to worry about double-closes and nil channels. It
requires an awkward adapter goroutine for now to convert the Done()
`<-chan` to a `chan` for PeriodicRenew, but makes the rest of the code
safer in the long run.
The Close methods on shadow.Values require pointer receivers because
they contain a sync.Mutex, but that value was being copied through
Value.Interface by the closeWalker. Because reflectwalk passes the
struct fields to the StructField method as they are defined in the
struct, and they may have been read as a value, we can't immediately
call Interface() to check the method set without possibly copying the
internal mutex values. Use the Implements method to first check if we
need to call Interface, and if it's not, then we can check if the value
is addressable.
Because of this use of reflection, we can't vet for the copying of these
locks. The minimal amount of code in the Close method left us only with
a race detected within the mutex itself, which leads to a stacktrace
pointing to the runtime rather than our code.
The improved err scanner loop in meta causes these to race. There's no
need to write back to the same commands struct, so just use a new
instance in each iteration.
Meta.process was relying on the system readdir to order the arguments,
but readdir doesn't guarantee any ordering. Read the directory contents
as a whole and sort them in place before adding the tfvars files.
This changed close to the release of beta1 to use underscores as the
separator and to use a lower-case "v" to avoid any issues on
case-insensitive filesystems.
Previously the APIs for state persistence and management had some problematic cases where we depended on hidden mutations of the state structure as side-effects of otherwise-innocent-looking operations, which was a frequent cause of accidental regressions due to faulty assumptions.
This new model attempts to isolate certain state mutations to just within the state managers, and makes the state managers work on separated snapshots of the state rather than on the "live" object to reduce the risk of race conditions.
Due to how the state filter machinery works, passing no arguments is valid
and matches _all_ resources.
It is very unlikely that someone wants to remove everything from state, so
this ends up being a very dangerous default for the "terraform state rm"
command, and surprising for someone who perhaps runs it looking for the
usage information.
So we'll be pragmatic here and reject the no-arguments case for this
command, accepting that it makes the unlikely case of intentionally
deleting all resources harder in order to make it less likely that it
will happen _unintentionally_.
If someone does really want to remove all resources from the state, they
can provide an explicit empty string argument, but this isn't documented
because it's a weird case that doesn't seem worth mentioning.
This fixes#15283.
The state returned from the testState helper shouldn't rely on any
mutations caused by WriteState. The Init function (which is analogous to
NewState) shoudl set any required fields.
This command serves as an alternative to the human-oriented list of workspaces for scripting use-cases where it's useful to know the _current_ workspace name.
In practice, States must all implement the full interface, so checking
for each method set only leaves gaps where tests could be skipped.
Change the helper to only accept a full state.State implementation.
Add some Lineage, Version, and TFVersion checks to TestState to avoid
regressions.
Compare the copy test against the immediate State returnedm rather than
our previous "current" state.
Check that the states round-trip and still marhsal identically via
MarshalEqual.
Previously we relied on a constellation of coincidences for everything to
work out correctly with state serials. In particular, callers needed to
be very careful about mutating states (or not) because many different bits
of code shared pointers to the same objects.
Here we move to a model where all of the state managers always use
distinct instances of state, copied when WriteState is called. This means
that they are truly a snapshot of the state as it was at that call, even
if the caller goes on mutating the state that was passed.
We also adjust the handling of serials so that the state managers ignore
any serials in incoming states and instead just treat each Persist as
the next version after what was most recently Refreshed.
(An exception exists for when nothing has been refreshed, e.g. because
we are writing a state to a location for the first time. In that case
we _do_ trust the caller, since the given state is either a new state
or it's a copy of something we're migrating from elsewhere with its
state and lineage intact.)
The intent here is to allow the rest of Terraform to not worry about
serials and state identity, and instead just treat the state as a mutable
structure. We'll just snapshot it occasionally, when WriteState is called,
and deal with serials _only_ at persist time.
This is intended as a more robust version of #15423, which was a quick
hotfix to an issue that resulted from our previous slopping handling
of state serials but arguably makes the problem worse by depending on
an additional coincidental behavior of the local backend's apply
implementation.
Normally "terraform init" will download and install the plugins necessary
to work with a particular configuration, but sometimes Terraform is
deployed in a network that, for one reason or another, cannot access the
official plugin repository for automatic download.
terraform-bundle provides an alternative method, allowing the
auto-download process to be run out-of-band on a separate machine that
_does_ have access to the repository. The result is a zip file that can
be extracted onto the target system to install both the desired
Terraform version and a selection of providers, thus avoiding the need
for on-the-fly plugin installation.
This is provided as a separate tool from Terraform because it is not
something that most users will need. In the rare case where this is
needed, we will for the moment assume that users are able to build this
tool themselves. We may later release it in a pre-built form, if it proves
to be generally useful.
It uses the same API from the plugin/discovery package is is used by the
auto-install behavior in "terraform init", so plugin versions are resolved
in the same way. However, it's expected that several different Terraform
configurations will run from the same bundle, so this tool allows the
bundle to include potentially many versions of the same provider and thus
allows each Terraform configuration to select from the available versions
in the bundle, avoiding the need to upgrade all configurations to new
provider versions in lockstep.
Previously we forced only installing for the current GOOS and GOARCH. Now
we allow this to be optionally overridden, which allows building tools
that can, for example, populate a directory with plugins to run on a Linux
server while working on a Mac.
Skips checksum validation if the `TF_SKIP_PROVIDER_VERIFY` environment variable is set. Undocumented variable, as the primary goal is to significantly improve the local provider development workflow.
We need to release the lock just before deleting the state, in case the backend
can't remove the resource while holding the lock. This is currently true for
Windows local files.
TODO: While there is little safety in locking while deleting the state, it
might be nice to be able to coordinate processes around state deletion, i.e. in
a CI environment. Adding Delete() as a required method of States would allow
the removal of the resource to be delegated from the Backend to the State
itself.