In order to support free organizations, we need a way to load the `remote` backend and then, depending on the used offering/plan, enable or disable remote operations.
In other words, we should be able to dynamically fall back to the `local` backend if needed, after first configuring the `remote` backend.
To make this works we need to change the way this was done previously when the env var `TF_FORCE_LOCAL_BACKEND` was set. The clear difference of course being that the env var would be available on startup, while the used offering/plan is only known after being able to connect to TFE.
The changes to how we handle setting the state path on the local backend
broke the heuristic we were using here for detecting migration from one
local backend to another with the same state path, which would by default
end up deleting the state altogether after migration.
We now use the StatePaths method to do this, which takes into account
both the default values and any settings that have been set.
Additionally this addresses a flaw in the old method which could
potentially have deleted all non-default workspace state files if the
"path" setting were changed without also changing the "workspace_dir"
setting. This new approach is conservative because it will preserve all
of the files if any one overlaps.
This was failing because we now handle the settings for the local backend
a little differently as a result of decoding it with the HCL2 machinery.
Specifically, the backend.State* fields are now assumed to be what is
given in configuration, and any CLI overrides are maintained separately
in OverrideState* fields so that they can be imposed "just in time" in
StatePaths.
This is particularly important because OverrideStatePath (when set) is
used regardless of workspace name, while StatePath is a suitable value
only for the "default" workspace, with others needing to be constructed
from StateWorkspaceDir instead.
Newer versions of the retryablehttp package use a context, so we need to
add that in our custom `CheckRetry` function.
In addition I removed the `return true, nil` to continue retrying in
case of an error, and instead directly call the `DefaultRetryPolicy`.
This is because the `DefaultRetryPolicy` will now also take the context
into consideration.
This new source type should be used for variables loaded from .tfvars files that were explicitly passed as command line arguments (e.g. -var-file=foo.tfvars)
This work was done against APIs that were already changed in the branch
before work began, and so it doesn't apply to the v0.12 development work.
To allow v0.12 to merge down to master, we'll revert this work out for now
and then re-introduce equivalent functionality in later commits that works
against the new APIs.
There are several steps here and a number of them can include reaching out
to remote servers or executing local processes, so it's helpful to have
some trace logs to better narrow down causes of errors and hangs during
this step.
In earlier refactoring we skipped implementing prior state safety checks,
propagating the target addresses from plan, and verifying that all of
the providers are exactly the same from the plan being created.
This change reinstates those checks, including a new error message for
the "stale plan" situation.
If we don't do this, we can't produce any output when applying a saved
plan file.
Here we also introduce a check to the local backend's ReportResult
function so that it won't panic if CLI init is skipped, although that
will no longer happen in the apply-from-file case due to the change
described in the previous paragraph.
We can't generate a valid plan file without a backend configuration to
write into it, but it's the responsibility of the caller (the command
package) to manage the backend configuration mechanism, so we require it
to tell us what to write here.
This feels a little strange because the backend in principle knows its
own config, but in practice the backend only knows the _processed_ version
of the config, not the raw configuration value that was used to configure
it.
converted the existing testPlanState() from terraform.State to
states.State to fix various plan tests.
reverted the "bandaid" in plans/planfile/tfplan.go - at this moment the
backend tests do not include backend configuration, and so the planfile
package can write the plan file but not read it back in. That will be
revisted in a separate track of work.
I have no confidence in the change to plans/planfile/tfplan.go. The
tests were passing an empty backend config, which planfile was able to
write to a file but not read from the same file. This change let me move
past that and it did not break any tests in the planfile package, but I
am concerned that it introduces undesired behavior.
The state manager refactoring in an earlier commit was reflected in the
implementations of these backends, but not in their tests. This gets us
back to a state where the backend tests will compile, and gets _most_ of
them passing again, with a few exceptions that will be addressed in a
subsequent commit.
incoming values
Addresses an odd state where the priorV of an object to be changed is
known but null.
While this situation should not happen, it seemed prudent to ensure that
core is resilient to providers sending incorrect values (which might
also occur with manually edited state).
Previously we used a single plan action "Replace" to represent both the
destroy-before-create and the create-before-destroy variants of replacing.
However, this forces the apply graph builder to jump through a lot of
hoops to figure out which nodes need it forced on and rebuild parts of
the graph to represent that.
If we instead decide between these two cases at plan time, the actual
determination of it is more straightforward because each resource is
represented by only one node in the plan graph, and then we can ensure
we put the right nodes in the graph during DiffTransformer and thus avoid
the logic for dealing with deposed instances being spread across various
different transformers and node types.
As a nice side-effect, this also allows us to show the difference between
destroy-then-create and create-then-destroy in the rendered diff in the
CLI, although this change doesn't fully implement that yet.
We're not yet showing outputs in the rendered diff, so it doesn't make
sense to count them for the purpose of deciding which change action
symbols to include in the legend.
This is a light adaptation of our earlier prototype of structural diff
rendering, as a starting point for what we'll actually ship. This is not
consistent with the latest mocks, so will need some additional work before
it is ready, but integrating this allows us to at least see the plan
contents while fixing up remaining issues elsewhere.
Due to how often the state and plan types are referenced throughout
Terraform, there isn't a great way to switch them out gradually. As a
consequence, this huge commit gets us from the old world to a _compilable_
new world, but still has a large number of known test failures due to
key functionality being stubbed out.
The stubs here are for anything that interacts with providers, since we
now need to do the follow-up work to similarly replace the old
terraform.ResourceProvider interface with its replacement in the new
"providers" package. That work, along with work to fix the remaining
failing tests, will follow in subsequent commits.
The aim here was to replace all references to terraform.State and its
downstream types with states.State, terraform.Plan with plans.Plan,
state.State with statemgr.State, and switch to the new implementations of
the state and plan file formats. However, due to the number of times those
types are used, this also ended up affecting numerous other parts of core
such as terraform.Hook, the backend.Backend interface, and most of the CLI
commands.
Just as with 5861dbf3fc49b19587a31816eb06f511ab861bb4 before, I apologize
in advance to the person who inevitably just found this huge commit while
spelunking through the commit history.
The "config" package is no longer used and will be removed as part
of the 0.12 release cleanup. Since configschema is part of the
"new world" of configuration modelling, it makes more sense for
it to live as a subdirectory of the newer "configs" package.
Due to how deeply the configuration types go into Terraform Core, there
isn't a great way to switch out to HCL2 gradually. As a consequence, this
huge commit gets us from the old state to a _compilable_ new state, but
does not yet attempt to fix any tests and has a number of known missing
parts and bugs. We will continue to iterate on this in forthcoming
commits, heading back towards passing tests and making Terraform
fully-functional again.
The three main goals here are:
- Use the configuration models from the "configs" package instead of the
older models in the "config" package, which is now deprecated and
preserved only to help us write our migration tool.
- Do expression inspection and evaluation using the functionality of the
new "lang" package, instead of the Interpolator type and related
functionality in the main "terraform" package.
- Represent addresses of various objects using types in the addrs package,
rather than hand-constructed strings. This is not critical to support
the above, but was a big help during the implementation of these other
points since it made it much more explicit what kind of address is
expected in each context.
Since our new packages are built to accommodate some future planned
features that are not yet implemented (e.g. the "for_each" argument on
resources, "count"/"for_each" on modules), and since there's still a fair
amount of functionality still using old-style APIs, there is a moderate
amount of shimming here to connect new assumptions with old, hopefully in
a way that makes it easier to find and eliminate these shims later.
I apologize in advance to the person who inevitably just found this huge
commit while spelunking through the commit history.
This is a rather-messy, complex change to get the "command" package
building again against the new backend API that was updated for
the new configuration loader.
A lot of this is mechanical rewriting to the new API, but
meta_config.go and meta_backend.go in particular saw some major
changes to interface with the new loader APIs and to deal with
the change in order of steps in the backend API.
The new config loader requires some steps to happen in a different
order, particularly in regard to knowing the schema in order to
decode the configuration.
Here we lean directly on the configschema package, rather than
on helper/schema.Backend as before, because it's generally
sufficient for our needs here and this prepares us for the
helper/schema package later moving out into its own repository
to seed a "plugin SDK".
If we get a diagnostic message that references a source range, and if the
source code for the referenced file is available, we'll show a snippet of
the source code with the source range highlighted.
At the moment we have no cache of source code, so in practice this
codepath can never be visited. Callers to format.Diagnostic will be
gradually updated in subsequent commits.
In some cases this is needed to keep the UX clean and to make sure any remote exit codes are passed through to the local process.
The most obvious example for this is when using the "remote" backend. This backend runs Terraform remotely and stream the output back to the local terminal.
When an error occurs during the remote execution, all the needed error information will already be in the streamed output. So if we then return an error ourselves, users will get the same errors twice.
By allowing the backend to specify the correct exit code, the UX remains the same while preserving the correct exit codes.
This is a bit of a hack to support the `-no-color` flag while we don’t have an option to set run variables.
That is also the reason why the orginal method is commented out instead of deleted. This will be reverted when the TFE starts supporting run variables.
If the policy passes, only show that instead of the full check output to prevent cluttering the output. So a passing policy will only show:
-----------------------------------------------
Organization policy check: passed
-----------------------------------------------
This commit adds:
- support for `-lock-timeout`
- custom error message when a 404 is received
- canceling a pending run when TF is Ctrl-C’ed
- discard a run when the apply is not approved
The pagination info of a list call that returns an empty list contains:
```go
CurrentPage: 1
TotalPages: 0
```
So checking if we have seen all pages using `CurrentPage == TotalPages` will not work and will result in an endless loop.
The tests are updated so they will fail (timeout after 1m) if this is handled incorreclty.
To prevent making unnecessary heavy calls to the backend, we should use a search query to limit the result.
But even if we use a search query, we should still use the pagination details to make sure we retrieved all items.
In TFE you can configure a workspace to use a custom working directory. When determining which directory that needs to be uploaded to TFE, this working directory should be taken into account to make sure we are uploading the correct root directory for the workspace.
Certain backends (currently only the `remote` backend) do not support using both the default and named workspaces at the same time.
To make the migration easier for users that currently use both types of workspaces, this commit adds logic to ask the user for a new workspace name during the migration process.
* cli: show workspace name in destroy confirmation
If the workspace name is not "default", include it in the confirmation
message for `terraform destroy`.
Fixes#15480
This was already added to triton-go and is now making its way to
the manta backend
```
% acctests backend/remote-state/manta
=== RUN TestBackend_impl
--- PASS: TestBackend_impl (0.00s)
=== RUN TestBackend
--- PASS: TestBackend (27.36s)
=== RUN TestBackendLocked
--- PASS: TestBackendLocked (16.24s)
=== RUN TestRemoteClient_impl
--- PASS: TestRemoteClient_impl (0.00s)
=== RUN TestRemoteClient
--- PASS: TestRemoteClient (3.40s)
=== RUN TestRemoteClientLocks
--- PASS: TestRemoteClientLocks (7.17s)
PASS
ok github.com/hashicorp/terraform/backend/remote-state/manta
```
Fixes: #17314
We now deal correctly with the creation of the state file - we were
not dealing well with a ResourceNotFound error
Now that this has been changed around, we try and create the statefile
and if there is an error, we look for an existing statefile - previously
this was not the order of operations
Simplify the use of clistate.Lock by creating a clistate.Locker
instance, which stores the context of locking a state, to allow unlock
to be called without knowledge of how the state was locked.
This alows the backend code to bring the needed UI methods to the point
where the state is locked, and still unlock the state from an outer
scope.
Provide a NoopLocker as well, so that callers can always call Unlock
without verifying the status of the lock.
Add the StateLocker field to the backend.Operation, so that the state
lock can be carried between the different function scopes of the backend
code. This will allow the backend context to lock the state before it's
read, while allowing the different operations to unlock the state when
they complete.
Fix the now failing state unlock test by reporting the correct ID.
The ID used by GCS is the generation number of the info object, which
isn't known until the info is already written out. While we can't get
the correct ID from the info data for the error rmessage, we can update
it with the generation number after it's read.
This adds a general test to verify that a remote state backend returns
the expected error type when it cannot lock a state. It then extracts
the ID reported in the error, and attempts to unlock the state using
that ID, which simulated the force-unlock scenario. This is a separate
test, since not all backends have persistent locks that can be unlocked
later.
We also split out the backend test to be called individually as needed.
Moves the nested select statements for backend operations into a single
function. The only difference in this part was that apply called
PersistState, which should be harmless regardless of the type of
operation being run.
If the user wishes to interrupt the running operation, only the first
interrupt was communicated to the operation by canceling the provided
context. A second interrupt would start the shutdown process, but not
communicate this to the running operation. This order of event could
cause partial writes of state.
What would happen is that once the command returns, the plugin system
would stop the provider processes. Once the provider processes dies, all
pending Eval operations would return return with an error, and quickly
cause the operation to complete. Since the backend code didn't know that
the process was shutting down imminently, it would continue by
attempting to write out the last known state. Under the right
conditions, the process would exit part way through the writing of the
state file.
Add Stop and Cancel CancelFuncs to the RunningOperation, to allow it to
easily differentiate between the two signals. The backend will then be
able to detect a shutdown and abort more gracefully.
In order to ensure that the backend is not in the process of writing the
state out, the command will always attempt to wait for the process to
complete after cancellation.
Since an early version of Terraform, the `destroy` command has always
had the `-force` flag to allow an auto approval of the interactive
prompt. 0.11 introduced `-auto-approve` as default to `false` when using
the `apply` command.
The `-auto-approve` flag was introduced to reduce ambiguity of it's
function, but the `-force` flag was never updated for a destroy.
People often use wrappers when automating commands in Terraform, and the
inconsistency between `apply` and `destroy` means that additional logic
must be added to the wrappers to do similar functions. Both commands are
more or less able to run with similar syntax, and also heavily share
their code.
This commit updates the command in `destroy` to use the `-auto-approve` flag
making working with the Terraform CLI a more consistent experience.
We leave in `-force` in `destroy` for the time-being and flag it as
deprecated to ensure a safe switchover period.
Internally, triton-go has changed how it handles errors. We can now get rid of
checking strings for errors, and we have introduced an errors library that
wraps some of the major errors we encounter and test for
Triton Manta allows an account other than the main triton account to be used via RBAC.
Here we expose the SDC_USER / TRITON_USER options to the backend so that a user can be specified.
This creates a unique bucket name for each test, so that the tests in
parallel don't collide, and buckets left over from interrupted tests
don't cause future failures.
Also make sure that buckets are removed, regardless of content.
The backend was creating bucket named in the configuration if it didn't
exist. We don't allow other backends to do this, because these are not
managed resources that terraform can control.
Previously there was a problem with double-locking when using the GCS backend with the terraform_remote_state data source.
Here we adjust the locking methodology to avoid that problem.
Validation is the best time to return detailed diagnostics
to the user since we're much more likely to have source
location information, etc than we are in later operations.
This change doesn't actually add any detail to the messages
yet, but it changes the interface so that we can gradually
introduce more detailed diagnostics over time.
While here there are some minor adjustments to some of the
messages to improve their consistency with terminology we
use elsewhere.
This PR changes manta from being a legacy remote state client to a new backend type. This also includes creating a simple lock within manta
This PR also unifies the way the triton client is configured (the schema) and also uses the same env vars to set the backend up
It is important to note that if the remote state path does not exist, then the backend will create that path. This means the user doesn't need to fall into a chicken and egg situation of creating the directory in advance before interacting with it
Reuse the running consul server for all tests.
Update the lostLockConnection package, since the api client should no
longer lose a lock immediately on network errors.
This is from a commit just after the v1.0.0 release, because it removes
the Porter service dependency for tests. The client api package was not
changed.
Previously we forced all remote state backends to be wrapped in a
BackupState wrapper that generates a local "terraform.tfstate.backup"
file before updating the remote state.
This backup mechanism was motivated by allowing users to recover a
previous state if user error caused an undesirable change such as loss
of the record of one or more resources. However, it also has the downside
of flushing a possibly-sensitive state to local disk in a location where
users may not realize its purpose and accidentally check it into version
control. Those using remote state would generally prefer that state never
be flushed to local disk at all.
The use-case of recovering older states can be dealt with for remote
backends by selecting a backend that has preservation of older versions
as a first-class feature, such as S3 versioning or Terraform Enterprise's
first-class historical state versioning mechanism.
There remains still one case where state can be flushed to local disk: if
a write to the remote backend fails during "terraform apply" then we will
still create the "errored.tfstate" file to allow the user to recover. This
seems like a reasonable compromise because this is done only in an
_exceptional_ case, and the console output makes it very clear that this
file has been created.
Fixes#15339.
Since bucket names must be *globally* unique. By including the project
ID in the bucket name we ensure that people don't step on each other's
feet when testing.
This calls backend.TestBackend() and remote.TestRemoteLocks() for
standardized acceptance tests. It removes custom listing tests since
those are performed by backend.TestBackend(), too.
Since each tests uses its own bucket, all tests can be run in parallel.
This resurrects the previously documented but unused "project" option.
This option is required to create buckets (so they are associated with the
right cloud project) but not to access the buckets later on (because their
names are globally unique).
The code is loosely based on state/remote/gcs_test.go. If the
GOOGLE_PROJECT environment variable is set, this test will
1) create a new bucket; error out if the bucket already exists.
2) create a new state
3) list states and ensure that the newly created state is listed
4) ensure that an object with the expected name exists
5) rum "state/remote".TestClient()
6) delete the state
The bucket is deleted when the test exits, though this may fail if the
bucket it not empty.
This config option was used by the legacy "gcs" client. If set, we're
using it for the default state -- all other states still use the
"state_dir" setting.
Calling context.Background() from outside the main() function is
discouraged. The configure functions are only called from
"…/helper/schema".Backend.Configure which provides the Background context,
i.e. a long-living context we can use for backend communication.
While #16243 added the ability to retry getting a state from S3, Put can
return the same InternalError status. Use the same retry logic when
uploading state to S3.
Add a way to inject network errors by setting an immediate deadline on
open consul connections. The consul client currently doesn't retry on
some errors, and will force us to lose our lock.
Once the consul api client is fixed, this test will fail.
The consul Client is analogous to an http.Client, and we really don't
need more than 1. Configure a single client and store it in the backend.
Replace the default Transport's Dialer to reduce the KeepAlive setting
from 30s to 17s. This avoids racing with the common network timeout
value of 30s, and is also coprime to other common intervals.
Internal errors from S3 are usually transient, and can be immediately retried.
Make 2 attempts at retreiving the state object before returning an error.
In #15884 we adjusted the plan output to give an explicit command to run
to apply a plan, whereas before this command was just alluded to in the
prose.
Since releasing that, we've got good feedback that it's confusing to
include such instructions when Terraform is running in a workflow
automation tool, because such tools usually abstract away exactly what
commands are run and require users to take different actions to
proceed through the workflow.
To accommodate such environments while retaining helpful messages for
normal CLI usage, here we introduce a new environment variable
TF_IN_AUTOMATION which, when set to a non-empty value, is a hint to
Terraform that it isn't being run in an interactive command shell and
it should thus tone down the "next steps" messaging.
The documentation for this setting is included as part of the "...in
automation" guide since it's not generally useful in other cases. We also
intentionally disclaim comprehensive support for this since we want to
avoid creating an extreme number of "if running in automation..."
codepaths that would increase the testing matrix and hurt maintainability.
The focus is specifically on the output of the three commands we give in
the automation guide, which at present means the following two situations:
* "terraform init" does not include the final paragraphs that suggest
running "terraform plan" and tell you in what situations you might need
to re-run "terraform init".
* "terraform plan" does not include the final paragraphs that either
warn about not specifying "-out=..." or instruct to run
"terraform apply" with the generated plan file.