* backend/remote-state/s3/backend_state.go: Prior to this commit, the terraform s3 backend did
not paginate calls to s3 when finding workspaces, which resulted in workspaces 'disappearing'
once they are switched away from, even though the state file still exists. This is due to the
ListBucket operation defaulting MaxItems to 1000, so terraform s3 backends that contained
more then 1000 workspaces did not function as expected. This rectifies this situation by
paginating calls to s3 when finding workspaces.
Signed-off-by: Collin J. Doering <collin@rekahsoft.ca>
This mirrors the change made for providers, so that default values can
be inserted into the config by the backend implementation. This is only
the interface and method name changes, it does not yet add any default
values.
The handling of slashes was broken around listing workspaces in
workspace_key_prefix. While it worked in most places by splitting an
extra time around the spurious slashes, it failed in the case that the
prefix ended with a slash of its own.
A test was temporarily added to verify that the backend works with the
unusual keys, but rather than risking silent breakage around prefixes
with trailing slashes, we also add validation to prevent users from
entering keys with trailing slashes at all.
The AWS Go SDK automatically provides a default request retryer with exponential backoff that is invoked via setting `MaxRetries` or leaving it `nil` will default to 3. The terraform-aws-provider `config.Client()` sets `MaxRetries` to 0 unless explicitly configured above 0. Previously, we were not overriding this behavior by setting the configuration and therefore not invoking the default request retryer.
The default retryer already handles HTTP error codes above 500, including S3's InternalError response, so the extraneous handling can be removed. This will also start automatically retrying many additional cases, such as temporary networking issues or other retryable AWS service responses.
Changes:
* s3/backend: Add `max_retries` argument
* s3/backend: Enhance S3 NoSuchBucket error to include additional information
This change enables a few related use cases:
* AWS has partitions outside Commercial, GovCloud (US), and China, which are the only endpoints automatically handled by the AWS Go SDK. DynamoDB locking and credential verification can not currently be enabled in those regions.
* Allows usage of any DynamoDB-compatible API for state locking
* Allows usage of any IAM/STS-compatible API for credential verification
The state manager refactoring in an earlier commit was reflected in the
implementations of these backends, but not in their tests. This gets us
back to a state where the backend tests will compile, and gets _most_ of
them passing again, with a few exceptions that will be addressed in a
subsequent commit.
Due to how often the state and plan types are referenced throughout
Terraform, there isn't a great way to switch them out gradually. As a
consequence, this huge commit gets us from the old world to a _compilable_
new world, but still has a large number of known test failures due to
key functionality being stubbed out.
The stubs here are for anything that interacts with providers, since we
now need to do the follow-up work to similarly replace the old
terraform.ResourceProvider interface with its replacement in the new
"providers" package. That work, along with work to fix the remaining
failing tests, will follow in subsequent commits.
The aim here was to replace all references to terraform.State and its
downstream types with states.State, terraform.Plan with plans.Plan,
state.State with statemgr.State, and switch to the new implementations of
the state and plan file formats. However, due to the number of times those
types are used, this also ended up affecting numerous other parts of core
such as terraform.Hook, the backend.Backend interface, and most of the CLI
commands.
Just as with 5861dbf3fc49b19587a31816eb06f511ab861bb4 before, I apologize
in advance to the person who inevitably just found this huge commit while
spelunking through the commit history.
The new config loader requires some steps to happen in a different
order, particularly in regard to knowing the schema in order to
decode the configuration.
Here we lean directly on the configschema package, rather than
on helper/schema.Backend as before, because it's generally
sufficient for our needs here and this prepares us for the
helper/schema package later moving out into its own repository
to seed a "plugin SDK".
While #16243 added the ability to retry getting a state from S3, Put can
return the same InternalError status. Use the same retry logic when
uploading state to S3.
Internal errors from S3 are usually transient, and can be immediately retried.
Make 2 attempts at retreiving the state object before returning an error.
S3 accepts objects with a leading slash and strips them off. This works
fine except in our workspace hierarchy, which then can no longer find
suffixes matching the full key name.
The s3.Backend was using it's own code for DeleteState, but the dynamo
entries are only handled through the RemoteClient. Have DeleteState use
a RemoteClient for delete.
Since the DynamoDB table used by the S3 backend is no longer only used
for locks, rename it in the config to remove any confusion about it
being lock-specific.
The S3 client can return (nil, nil) when the remote state doesn't exist.
The caused a nil pointer dereference when checking the payload.MD5
against the expected value.
This can happen if the remote state was manually removed, but the digest
entry was left in the DynamoDB table.