The API surface area is much smaller when we use the remote backend for remote state only.
So in order to try and prevent any backwards incompatibilities when TF runs inside of TFE, we’ve split up the discovery services into `state.v2` (which can be used for remote state only configurations, so when running in TFE) and `tfe.v2.1` (which can be used for all remote configurations).
The AWS Go SDK automatically provides a default request retryer with exponential backoff that is invoked via setting `MaxRetries` or leaving it `nil` will default to 3. The terraform-aws-provider `config.Client()` sets `MaxRetries` to 0 unless explicitly configured above 0. Previously, we were not overriding this behavior by setting the configuration and therefore not invoking the default request retryer.
The default retryer already handles HTTP error codes above 500, including S3's InternalError response, so the extraneous handling can be removed. This will also start automatically retrying many additional cases, such as temporary networking issues or other retryable AWS service responses.
Changes:
* s3/backend: Add `max_retries` argument
* s3/backend: Enhance S3 NoSuchBucket error to include additional information
* Upgrading to 2.0.0 of github.com/hashicorp/go-azure-helpers
* Support for authenticating using Azure CLI
* backend/azurerm: support for authenticating using the Azure CLI
This change enables a few related use cases:
* AWS has partitions outside Commercial, GovCloud (US), and China, which are the only endpoints automatically handled by the AWS Go SDK. DynamoDB locking and credential verification can not currently be enabled in those regions.
* Allows usage of any DynamoDB-compatible API for state locking
* Allows usage of any IAM/STS-compatible API for credential verification
* backend/azurerm: removing the `arm_` prefix from keys
* removing the deprecated fields test because the deprecation makes it fail
* authentication: support for custom resource manager endpoints
* Adding debug prefixes to the log statements
* adding acceptance tests for msi auth
* including the resource group name in the tests
* backend/azurerm: support for authenticating using a SAS Token
* resolving merge conflicts
* moving the defer to prior to the error
* backend/azurerm: support for authenticating via msi
* adding acceptance tests for msi auth
* including the resource group name in the tests
* support for using the test client via msi
* vendor updates
- updating to v21.3.0 of github.com/Azure/azure-sdk-for-go
- updating to v10.15.4 of github.com/Azure/go-autorest
- vendoring github.com/hashicorp/go-azure-helpers @ 0.1.1
* backend/azurerm: refactoring to use the new auth package
- refactoring the backend to use a shared client via the new auth package
- adding tests covering both Service Principal and Access Key auth
- support for authenticating using a proxy
- rewriting the backend documentation to include examples of both authentication types
* switching to use the build-in logging function
* documenting it's also possible to retrieve the access key from an env var
The state manager refactoring in an earlier commit was reflected in the
implementations of these backends, but not in their tests. This gets us
back to a state where the backend tests will compile, and gets _most_ of
them passing again, with a few exceptions that will be addressed in a
subsequent commit.
Due to how often the state and plan types are referenced throughout
Terraform, there isn't a great way to switch them out gradually. As a
consequence, this huge commit gets us from the old world to a _compilable_
new world, but still has a large number of known test failures due to
key functionality being stubbed out.
The stubs here are for anything that interacts with providers, since we
now need to do the follow-up work to similarly replace the old
terraform.ResourceProvider interface with its replacement in the new
"providers" package. That work, along with work to fix the remaining
failing tests, will follow in subsequent commits.
The aim here was to replace all references to terraform.State and its
downstream types with states.State, terraform.Plan with plans.Plan,
state.State with statemgr.State, and switch to the new implementations of
the state and plan file formats. However, due to the number of times those
types are used, this also ended up affecting numerous other parts of core
such as terraform.Hook, the backend.Backend interface, and most of the CLI
commands.
Just as with 5861dbf3fc49b19587a31816eb06f511ab861bb4 before, I apologize
in advance to the person who inevitably just found this huge commit while
spelunking through the commit history.
The new config loader requires some steps to happen in a different
order, particularly in regard to knowing the schema in order to
decode the configuration.
Here we lean directly on the configschema package, rather than
on helper/schema.Backend as before, because it's generally
sufficient for our needs here and this prepares us for the
helper/schema package later moving out into its own repository
to seed a "plugin SDK".
This was already added to triton-go and is now making its way to
the manta backend
```
% acctests backend/remote-state/manta
=== RUN TestBackend_impl
--- PASS: TestBackend_impl (0.00s)
=== RUN TestBackend
--- PASS: TestBackend (27.36s)
=== RUN TestBackendLocked
--- PASS: TestBackendLocked (16.24s)
=== RUN TestRemoteClient_impl
--- PASS: TestRemoteClient_impl (0.00s)
=== RUN TestRemoteClient
--- PASS: TestRemoteClient (3.40s)
=== RUN TestRemoteClientLocks
--- PASS: TestRemoteClientLocks (7.17s)
PASS
ok github.com/hashicorp/terraform/backend/remote-state/manta
```
Fixes: #17314
We now deal correctly with the creation of the state file - we were
not dealing well with a ResourceNotFound error
Now that this has been changed around, we try and create the statefile
and if there is an error, we look for an existing statefile - previously
this was not the order of operations
Simplify the use of clistate.Lock by creating a clistate.Locker
instance, which stores the context of locking a state, to allow unlock
to be called without knowledge of how the state was locked.
This alows the backend code to bring the needed UI methods to the point
where the state is locked, and still unlock the state from an outer
scope.
Fix the now failing state unlock test by reporting the correct ID.
The ID used by GCS is the generation number of the info object, which
isn't known until the info is already written out. While we can't get
the correct ID from the info data for the error rmessage, we can update
it with the generation number after it's read.
Internally, triton-go has changed how it handles errors. We can now get rid of
checking strings for errors, and we have introduced an errors library that
wraps some of the major errors we encounter and test for
Triton Manta allows an account other than the main triton account to be used via RBAC.
Here we expose the SDC_USER / TRITON_USER options to the backend so that a user can be specified.
This creates a unique bucket name for each test, so that the tests in
parallel don't collide, and buckets left over from interrupted tests
don't cause future failures.
Also make sure that buckets are removed, regardless of content.
The backend was creating bucket named in the configuration if it didn't
exist. We don't allow other backends to do this, because these are not
managed resources that terraform can control.
Previously there was a problem with double-locking when using the GCS backend with the terraform_remote_state data source.
Here we adjust the locking methodology to avoid that problem.
This PR changes manta from being a legacy remote state client to a new backend type. This also includes creating a simple lock within manta
This PR also unifies the way the triton client is configured (the schema) and also uses the same env vars to set the backend up
It is important to note that if the remote state path does not exist, then the backend will create that path. This means the user doesn't need to fall into a chicken and egg situation of creating the directory in advance before interacting with it
Reuse the running consul server for all tests.
Update the lostLockConnection package, since the api client should no
longer lose a lock immediately on network errors.
This is from a commit just after the v1.0.0 release, because it removes
the Porter service dependency for tests. The client api package was not
changed.
Since bucket names must be *globally* unique. By including the project
ID in the bucket name we ensure that people don't step on each other's
feet when testing.
This calls backend.TestBackend() and remote.TestRemoteLocks() for
standardized acceptance tests. It removes custom listing tests since
those are performed by backend.TestBackend(), too.
Since each tests uses its own bucket, all tests can be run in parallel.
This resurrects the previously documented but unused "project" option.
This option is required to create buckets (so they are associated with the
right cloud project) but not to access the buckets later on (because their
names are globally unique).
The code is loosely based on state/remote/gcs_test.go. If the
GOOGLE_PROJECT environment variable is set, this test will
1) create a new bucket; error out if the bucket already exists.
2) create a new state
3) list states and ensure that the newly created state is listed
4) ensure that an object with the expected name exists
5) rum "state/remote".TestClient()
6) delete the state
The bucket is deleted when the test exits, though this may fail if the
bucket it not empty.
This config option was used by the legacy "gcs" client. If set, we're
using it for the default state -- all other states still use the
"state_dir" setting.
Calling context.Background() from outside the main() function is
discouraged. The configure functions are only called from
"…/helper/schema".Backend.Configure which provides the Background context,
i.e. a long-living context we can use for backend communication.
While #16243 added the ability to retry getting a state from S3, Put can
return the same InternalError status. Use the same retry logic when
uploading state to S3.
Add a way to inject network errors by setting an immediate deadline on
open consul connections. The consul client currently doesn't retry on
some errors, and will force us to lose our lock.
Once the consul api client is fixed, this test will fail.
The consul Client is analogous to an http.Client, and we really don't
need more than 1. Configure a single client and store it in the backend.
Replace the default Transport's Dialer to reduce the KeepAlive setting
from 30s to 17s. This avoids racing with the common network timeout
value of 30s, and is also coprime to other common intervals.
Internal errors from S3 are usually transient, and can be immediately retried.
Make 2 attempts at retreiving the state object before returning an error.
A TLS config was being assigned to a Transport in a nil http.Client. The
Transport is built in the consul config by default, but the http.Client
is not built until later in NewClient.
Added locking support via blob leasing (requires that an empty state is
created before any lock can be acquired.
Added support for "environments" in much the same way as the S3 backend.
S3 accepts objects with a leading slash and strips them off. This works
fine except in our workspace hierarchy, which then can no longer find
suffixes matching the full key name.
When remote backend imeplemtations create a new named state, they may
need to acquire a lock and/or save an actual empty state to the backend.
Copy this behavior in the inmem backend for testing.
Updated the vendored consul which no longer requires the channel adapter
to convert a `chan stuct{}` to a `<-chan struct{}`.
Call testutil.NewTestServerConfigT with the new signature.
When a consul lock is lost, there is a possibility that the associated
session is still active. Most commonly, the long request to watch the
lock key may error out, while the session is continually refreshed at a
rate of TTL/2.
First have the lock monitor retry the lock internally for at least 10
seconds (5 attempts with the default 2 second wait time). In most cases
this will reconnect on the first try, keeping the lock channel open.
If the consul lock can't recover itself, then cancel the session as soon
as possible (terminating the PreiodicRenew will call Session.Destroy),
and start over. In the worse case, the consul agents were split, and the
session still exists on the leader so we may need to wait for the old
session TTL, plus the LockWait time to renew the lock.
We use a Context for the cancellation channels here, because that
removes the need to worry about double-closes and nil channels. It
requires an awkward adapter goroutine for now to convert the Done()
`<-chan` to a `chan` for PeriodicRenew, but makes the rest of the code
safer in the long run.
The s3.Backend was using it's own code for DeleteState, but the dynamo
entries are only handled through the RemoteClient. Have DeleteState use
a RemoteClient for delete.
Move the Swift State from a legacy remote state to an official backend.
Add `container` and `archive_container` configuration variables, and deprecate `path` and `archive_path` variables.
Future improvements: Add support for locking and environments.
Since the DynamoDB table used by the S3 backend is no longer only used
for locks, rename it in the config to remove any confusion about it
being lock-specific.