Some of our errors returned here were lacking context about what part of
the file was problematic, which led to some useless error reporting for
some real-world situations that this upgrade process doesn't seem to be
catching.
Here we add additional context to those error cases, as a step towards
tracking down exactly which upgrade cases are missing here so that we can
potentially fix them in a subsequent release.
When failing to write the state, the local backend writes the state to a local file called `errrored.tfstate`. Previously it would do so by creating a new state file which would use a new serial and lineage. By exorting the existing state file and directly assigning the new state, the serial and lineage are preserved.
In earlier refactoring we updated these commands to support the new
address and state types, but attempted to partially retain the old-style
"StateFilter" abstraction that originally lived in the Terraform package,
even though that was no longer being used for any other functionality.
Unfortunately the adaptation of the existing filtering to the new types
wasn't exact and so these commands ended up having a few bugs that were
not covered by the existing tests.
Since the old StateFilter behavior was the source of various misbehavior
anyway, here it's removed altogether and replaced with some simpler
functions in the state_meta.go file that are tailored to the use-cases of
these sub-commands.
As well as just generally behaving more consistently with the other
parts of Terraform that use the new resource address types, this commit
fixes the following bugs:
- A resource address of aws_instance.foo would previously match an
resource of that type and name in any module, which disagreed with the
expected interpretation elsewhere of meaning a single resource in the
root module.
- The "terraform state mv" command was not supporting moves from a single
resource address to an indexed address and vice-versa, because the old
logic didn't need to make that distinction while they are two separate
address types in the new logic. Now we allow resources that do not have
count/for_each to be treated as if they are instances for the purposes
of this command, which is a better match for likely user intent and for
the old behavior.
Finally, we also clean up a little some of the usage output from these
commands, which hasn't been updated for some time and so had both some
stale information and some inaccurate terminology.
This ensures that we test using the same source as we're using everywhere
else, and more tactically also ensures that when running in Travis-CI we
won't try to download all of the dependencies of Terraform during this
test.
In the long run we will look for a more global solution to this, rather
than adding this to all of our embedded "go" command calls directly, but
this is intended as a low-risk solution to get the build working again in
the mean time.
If an instance object in state has an earlier schema version number then
it is likely that the schema we're holding won't be able to decode the
raw data that is stored. Instead, we must ask the provider to upgrade it
for us first, which might also include translating it from flatmap form
if it was last updated with a Terraform version earlier than v0.12.
This ends up being a "seam" between our use of int64 for schema versions
in the providers package and uint64 everywhere else. We intend to
standardize on int64 everywhere eventually, but for now this remains
consistent with existing usage in each layer to keep the type conversion
noise contained here and avoid mass-updates to other Terraform components
at this time.
This also includes a minor change to the test helpers for the
backend/local package, which were inexplicably setting a SchemaVersion of
1 on the basic test state but setting the mock schema version to zero,
creating an invalid situation where the state would need to be downgraded.
This was a mistake while adapting this code from the old state.LocalState.
Since the lock is held on the output file (s.path) the metadata should
live adjacent to that rather than being built from the read path
(s.readPath) that is used only as the initial snapshot on first
instantiation.
This also includes more logging, continuing the trend of other recent
commits in these files. The local state behavior is sufficiently complex
that these trace logs are a great help in debugging issues such as this
one with the wrong files being used or actions being taken in the wrong
order.
The filesystem backend has the option of using a different file for its
initial read.
Previously we were incorrectly writing the contents of that file out into
the backup file, rather than the prior contents of the output file. Now
we will always read the output file in RefreshState in order to decide
what we will back up but then we will optionally additionally read the
input file and prefer its content as the "current" state snapshot.
This is verified by command.TestMetaBackend_planLocalStatePath and
TestMetaBackend_configureNew, which are both now passing.
This was failing because we now handle the settings for the local backend
a little differently as a result of decoding it with the HCL2 machinery.
Specifically, the backend.State* fields are now assumed to be what is
given in configuration, and any CLI overrides are maintained separately
in OverrideState* fields so that they can be imposed "just in time" in
StatePaths.
This is particularly important because OverrideStatePath (when set) is
used regardless of workspace name, while StatePath is a suitable value
only for the "default" workspace, with others needing to be constructed
from StateWorkspaceDir instead.
We previously hacked around the import/export functionality being missing
in the statemgr layer after refactoring, but now it's been reintroduced
to fix functionality elsewhere we should use the centralized Import and
Export functions to ensure consistent behavior.
In particular, this pushes the logic for checking lineage and serial
during push down into the state manager itself, which is better because
all other details about lineage and serial are managed within the state
managers.
In our recent refactoring of the state manager interfaces we made serial
and lineage management the responsibility of the state managers
themselves, not exposing them at all to most callers, and allowing for
simple state managers that don't implement them at all.
However, we do have some specific cases where we need to preserve these
properly when available, such as migration between backends, and the
"terraform state push" and "terraform state pull" commands.
These new functions and their associated optional interface allow the
logic here to be captured in one place and access via some simple
calls. Separating this from the main interface leaves things simple for
the normal uses of state managers.
Since these functions are mostly just thin wrappers around other
functionality, they are not yet well-tested directly, but will be
indirectly tested through the tests of their callers. A subsequent commit
will add more unit tests here.
After all of the refactoring we were no longer checking the Terraform
version field in a state file, causing this test to fail.
This restores that check, though with a slightly different error message.
Since protoc is not go-gettable, and most development tasks in Terraform
won't involve recompiling protoc files anyway, we'll use a separate
mechanism for these.
This way "go generate" only depends on things we can "go get" in the
"make tools" target.
In a later commit we should also in some way specify a particular version
of protoc to use so that we don't get "flapping" regenerations as
developers work with different versions, but the priority here is just to
make "make generate" minimally usable again to restore the dev workflow
documented in the README.
This also includes some updates that resulted from running "make generate"
and "make protobuf" after those Makefile changes were in place.
This was previously targeting the old state manager and state types, so it
needed some considerable rework to get it working again with the new state
types.
Since our new resource address syntax lacks the weird extra .deposed
special case we had before, we instead interpret addresses as
whole-instance addresses here and remove the deposed objects along with
the current one (if present), since this is more likely to match with
user expectations because we don't consider deposed objects to be
independently addressable in any other situation.
With that said, to be more explicit about what is going on we do now have
a -dry-run mode and maintain separate counts of current and deposed
instances so that we can expose that in the UI where relevant.
For historical reasons sometimes we have nil state in situations where
we'd still like to persist state snapshots to a store. To make life easier
for those callers, we'll substitute an empty state if we are given a nil
one, thus allowing us to still generate a valid serialization that will
load back in as an empty state.
When we're being asked to destroy everything, we ideally want to end up
with a totally empty state. Normally we will conservatively keep around
the "husks" of resources (what's left after all of the instances have been
destroyed) unless they are configured without count or for_each, but in
this special case we'll prune those out.
The implication of this is that in "weird" expression contexts that happen
before the next "terraform plan", such as evaluation in
"terraform console" or expressions in data resources and provider blocks
that get evaluated during the refresh walk, we will see these results
as unknown rather than as empty lists of objects. We accept that weirdness
for now because in a future release we are likely to remove "refresh" as
a separate walk anyway, doing all of that work during the plan walk where
we can ensure that these values are properly re-populated before trying
to use them.
We previously had mechanisms to clean up only individual instance states,
leaving behind empty resource husks in the state after they were all
destroyed.
This takes care of it in the "orphan" case. It does not yet do it in the
"terraform destroy" or "terraform plan -destroy" cases because we don't
have anywhere to record in the plan that we're actually destroying and so
the resource configurations should be ignored and _everything_ should be
cleaned. We'll let the state be not-quite-empty in that case for now,
since it doesn't really hurt; cleaning up orphans is the main case because
the state will live on afterwards and so leftover cruft will accumulate
over the course of many changes.
I misunderstood the logic here on the first pass of porting to the new
provider and state types: EvalUndeposeState is supposed to return the
deposed object back to being current again, so we can undo the deposing
in the case where the create leg fails.
If we don't do this, we end up leaving the instance with no current object
at all and with its prior object deposed, and then the later destroy
node deletes that deposed object, leaving the user with no object at all.
For safety we skip this restoration if there _is_ a new current object,
since a failed create can still produce a partial result which we need
to keep to avoid losing track of any remote objects that were successfully
created.
Previously our handling of create_before_destroy -- and of deposed objects
in particular -- was rather "implicit" and spread over various different
subsystems. We'd quietly just destroy every deposed object during a
destroy operation, without any user-visible plan to do so.
Here we make things more explicit by tracking each deposed object
individually by its pseudorandomly-allocated key. There are two different
mechanisms at play here, building on the same concepts:
- During a replace operation with create_before_destroy, we *pre-allocate*
a DeposedKey to use for the prior object in the "apply" node and then
pass that exact id to the destroy node, ensuring that we only destroy
the single object we planned to destroy. In the happy path here the
user never actually sees the allocated deposed key because we use it and
then immediately destroy it within the same operation. However, that
destroy may fail, which brings us to the second mechanism:
- If any deposed objects are already present in state during _plan_, we
insert a destroy change for them into the plan so that it's explicit to
the user that we are going to destroy these additional objects, and then
create an individual graph node for each one in DiffTransformer.
The main motivation here is to be more careful in how we handle these
destroys so that from a user's standpoint we never destroy something
without the user knowing about it ahead of time.
However, this new organization also hopefully makes the code itself a
little easier to follow because the connection between the create and
destroy steps of a Replace is reprseented in a single place (in
DiffTransformer) and deposed instances each have their own explicit graph
node rather than being secretly handled as part of the main instance-level
graph node.
Our previous mechanism for dealing with tainting relied on directly
mutating the InstanceState object to mark it as such. In our new state
models we consider the instance objects to be immutable by convention, and
so we frequently copy them. As a result, the taint flagging was no longer
making it all the way through the apply evaluation process.
Here we now implement tainting as a separate step in the evaluation
process, creating a copy of the object with a tainted status if there were
any errors during creation.
This introduces a new behavior where any provider-level errors during
creation will also cause an instance to be marked as tainted if any object
is returned at all. Create-time errors _normally_ result in no object at
all, but the provider might return an object if the failure occurred at
a subsequent step of a multi-step creation process and so left behind a
remote object that needs to be cleaned up on a future run.
In our old world we always used 1-based indices into a slice of deposed
objects. The new models instead use a map keyed by pseudorandom strings,
so that deposed objects will have a consistent identity across multiple
operations.
However, having that pseudo-random string in our test comparison output
is not helpful, since such strings can never match hard-coded expectation
strings. Therefore for the purposes of generating this test comparison
output we'll revert back to using 1-based indexes.
This should avoid problems for tests that only create one deposed object
per instance, but those which create more than one will need to do some
more work since the _ordering_ of these objects in the output is still
pseudorandom as a result of it coming from a map rather than a slice.