Change the `RetryFunc` from a plain `error` return type to a
specialized `RetryError` which must decide whether it is
retryable or not.
Add `RetryableError` / `NonRetryableError` factory functions that
callers are meant to use to build up these errors.
This makes it eminently clear whether or not a given error is
retryable from inside the client code.
Goal here is to _not_ change any behavior, simply reflect the
existing behavior with the new, clearer, API.
In #4700 while fixing a data race I made an incorrect assumption about
the return value of StateChangeConf, and ended up changing the behavior
in the timeout case to always return:
> timeout while waiting for state to become '[success]'
When it used to capture the "most recent error" from the function
itself.
It's much more useful to see that error bubbling up, so here we revert
to pulling it out of the function as we did before, and we protect
against the data race with a good old fashioned mutex.
Helpful when iterating on a drift test.
Eventually I think this assertion could be fanned out to something much
more targeted like:
ExpectAttributeDiff(resource, attr, oldval, newval)
But this is a step in the right direction.
* MaxItems defines a maximum amount of items that can exist within a
TypeSet or TypeList. Specific use cases would be if a TypeSet is being
used to wrap a complex structure, however more than one instance would
cause instability.
It was a mistake to switched fully to `==` when activating waiting for
capacity on updates in #3947. Users that didn't set `min_elb_capacity ==
desired_capacity` and instead treated it as an actual "minimum" would
see timeouts for every create, since their target numbers would never be
reached exactly.
Here, we fix that regression by restoring the minimum waiting behavior
during creates.
In order to preserve all the stated behavior, I had to split out
different criteria for create and update, criteria which are now
exhaustively unit tested.
The set of fields that affect capacity waiting behavior has become a bit
of a mess. Next major release I'd like to rework all of these into a
more consistently named block of config. For now, just getting the
behavior correct and documented.
(Also removes all the fixed names from the ASG tests as I was hitting
collision issues running them over here.)
Fixes#4792
Implementation notes:
* The hash implementation was not considering key value, causing "diffs
did not match" errors when a value was updated. Switching to default
HashResource implementation fixes this
* Using HashResource as a default exposed a bug in helper/schema that
needed to be fixed so the Set function is picked up properly during
Read
* Stop writing back values into the `key` attribute; it triggers extra
diffs when `default` is used. Computed values all just go into `var`.
* Includes a state migration to prevent unnecessary diffs based on
"key" attribute hashcodes changing.
In the tests:
* Switch from leaning on the public demo Consul instance to requiring a
CONSUL_HTTP_ADDR variable be set pointing to a `consul agent -dev`
instance to be used only for testing.
* Add a test that exposes the updating issues and covers the fixes
Fixes#774Fixes#1866Fixes#3023
Adds the `TF_SKIP_REMOTE_TESTS` env var to be used in cases where the
`http.Get()` smoke test passes but the network is not able to service
the needs of the tests.
Fixes#4421
The implementation was attempting to capture the error using a scoped
variable reference, but the error value is already exposed via the
return value of `WaitForState()`. Using that instead fixes the data
race.
Fixes the following data race:
```
==================
WARNING: DATA RACE
Read by goroutine 74:
github.com/hashicorp/terraform/helper/resource.Retry()
/Users/phinze/go/src/github.com/hashicorp/terraform/helper/resource/wait.go:35 +0x284
github.com/hashicorp/terraform/helper/resource.TestRetry_timeout()
/Users/phinze/go/src/github.com/hashicorp/terraform/helper/resource/wait_test.go:35 +0x60
testing.tRunner()
/private/var/folders/vd/7l9ys5k57l91x63sh28wl_kc0000gn/T/workdir/go/src/testing/testing.go:456 +0xdc
Previous write by goroutine 90:
github.com/hashicorp/terraform/helper/resource.Retry.func1()
/Users/phinze/go/src/github.com/hashicorp/terraform/helper/resource/wait.go:20 +0x87
github.com/hashicorp/terraform/helper/resource.(*StateChangeConf).WaitForState.func1()
/Users/phinze/go/src/github.com/hashicorp/terraform/helper/resource/state.go:83 +0x284
Goroutine 74 (running) created at:
testing.RunTests()
/private/var/folders/vd/7l9ys5k57l91x63sh28wl_kc0000gn/T/workdir/go/src/testing/testing.go:561 +0xaa3
testing.(*M).Run()
/private/var/folders/vd/7l9ys5k57l91x63sh28wl_kc0000gn/T/workdir/go/src/testing/testing.go:494 +0xe4
main.main()
github.com/hashicorp/terraform/helper/resource/_test/_testmain.go:84 +0x20f
Goroutine 90 (running) created at:
github.com/hashicorp/terraform/helper/resource.(*StateChangeConf).WaitForState()
/Users/phinze/go/src/github.com/hashicorp/terraform/helper/resource/state.go:127 +0x283
github.com/hashicorp/terraform/helper/resource.Retry()
/Users/phinze/go/src/github.com/hashicorp/terraform/helper/resource/wait.go:34 +0x276
github.com/hashicorp/terraform/helper/resource.TestRetry_timeout()
/Users/phinze/go/src/github.com/hashicorp/terraform/helper/resource/wait_test.go:35 +0x60
testing.tRunner()
/private/var/folders/vd/7l9ys5k57l91x63sh28wl_kc0000gn/T/workdir/go/src/testing/testing.go:456 +0xdc
==================
```
Generate bucket names and object names per test instead of once at the
top level. Should help avoid failures like this one:
https://travis-ci.org/hashicorp/terraform/jobs/100254008
All storage tests checked on this commit:
```
TF_ACC=1 go test -v ./builtin/providers/google -run TestAccGoogleStorage
=== RUN TestAccGoogleStorageBucketAcl_basic
--- PASS: TestAccGoogleStorageBucketAcl_basic (8.90s)
=== RUN TestAccGoogleStorageBucketAcl_upgrade
--- PASS: TestAccGoogleStorageBucketAcl_upgrade (14.18s)
=== RUN TestAccGoogleStorageBucketAcl_downgrade
--- PASS: TestAccGoogleStorageBucketAcl_downgrade (12.83s)
=== RUN TestAccGoogleStorageBucketAcl_predefined
--- PASS: TestAccGoogleStorageBucketAcl_predefined (4.51s)
=== RUN TestAccGoogleStorageObject_basic
--- PASS: TestAccGoogleStorageObject_basic (3.77s)
=== RUN TestAccGoogleStorageObjectAcl_basic
--- PASS: TestAccGoogleStorageObjectAcl_basic (4.85s)
=== RUN TestAccGoogleStorageObjectAcl_upgrade
--- PASS: TestAccGoogleStorageObjectAcl_upgrade (7.68s)
=== RUN TestAccGoogleStorageObjectAcl_downgrade
--- PASS: TestAccGoogleStorageObjectAcl_downgrade (7.37s)
=== RUN TestAccGoogleStorageObjectAcl_predefined
--- PASS: TestAccGoogleStorageObjectAcl_predefined (4.16s)
PASS
ok github.com/hashicorp/terraform/builtin/providers/google 68.275s
```
* Add SSH Keys to all droplets in tests, this prevents acctests from
spamming account owner email with root password details
* Add a new helper/acctest package to be a home for random string / int
implementations used in tests.
* Insert some random details into record tests to prevent collisions
* Normalize config style in tests to hclfmt conventions
In the acceptance testing framework, it is neccessary to provide a copy
of the state _before_ the destroy is applied to the check in order that
it can loop over resources to verify their destruction. This patch makes
a deep copy of the state prior to applying test steps which have the
Destroy option set and then passes that to the destroy check.
We weren't doing any log setup for acceptance tests, which made it
difficult to wrangle log output in CI.
This moves the log setup functions we use in `main` over into a helper
package so we can use them for acceptance tests as well.
This means that acceptance tests will by default be a _lot_ quieter,
only printing out actual test output. Setting `TF_LOG=trace` will
restore the full prior noise level.
Only minor behavior change is to make `ioutil.Discard` the default
return value rather than a `nil` that needs to be checked for.
Changing the Set internals makes a lot of sense as it saves doing
conversions in multiple places and gives a central place to alter
the key when a item is computed.
This will have no side effects other then that the ordering is now
based on strings instead on integers, so the order will be different.
This will however have no effect on existing configs as these will
use the individual codes/keys and not the ordering to determine if
there is a diff or not.
Lastly (but I think also most importantly) there is a fix in this PR
that makes diffing sets extremely more performand. Before a full diff
required reading the complete Set for every single parameter/attribute
you wanted to diff, while now it only gets that specific parameter.
We have a use case where we have a Set that has 18 parameters and the
set consist of about 600 items (don't ask 😉). So when doing a diff
it would take 100% CPU of all cores and stay that way for almost an
hour before being able to complete the diff.
Debugging this we learned that for retrieving every single parameter
it made over 52.000 calls to `func (c *ResourceConfig) get(..)`. In
this function a slice is created and used only for the duration of the
call, so the time needed to create all needed slices and on the other
hand the time the garbage collector needed to clean them up again caused
the system to cripple itself. Next to that there are also some expensive
reflect calls in this function which also claimed a fair amount of CPU
time.
After this fix the number of calls needed to get a single parameter
dropped from 52.000+ to only 2! 😃
I promised myself that next time I jumped in this file I'd fix this up.
Now we don't have to manually index the file with comments, we can just
add descriptive names to the test cases!
Because `aws_security_group_rule` resources are an abstraction on top of
Security Groups, they must interact with the AWS Security Group APIs in
a pattern that often results in lots of parallel requests interacting
with the same security group.
We've found that this pattern can trigger race conditions resulting in
inconsistent behavior, including:
* Rules that report as created but don't actually exist on AWS's side
* Rules that show up in AWS but don't register as being created
locally, resulting in follow up attempts to authorize the rule
failing w/ Duplicate errors
Here, we introduce a per-SG mutex that must be held by any security
group before it is allowed to interact with AWS APIs. This protects the
space between `DescribeSecurityGroup` and `Authorize*` / `Revoke*`
calls, ensuring that no other rules interact with the SG during that
span.
The included test exposes the race by applying a security group with
lots of rules, which based on the dependency graph can all be handled in
parallel. This fails most of the time without the new locking behavior.
I've omitted the mutex from `Read`, since it is only called during the
Refresh walk when no changes are being made, meaning a bunch of parallel
`DescribeSecurityGroup` API calls should be consistent in that case.
We've been moving away from config fields expecting file paths that
Terraform will load, instead prefering fields that expect file contents,
leaning on `file()` to do loading from a path.
This helps with consistency and also flexibility - since this makes it
easier to shift sensitive files into environment variables.
Here we add a little helper package to manage the transitional period
for these fields where we support both behaviors.
Also included is the first of several fields being shifted over - SSH
private keys in provisioner connection config.
We're moving to new field names so the behavior is more intuitive, so
instead of `key_file` it's `private_key` now.
Additional field shifts will be included in follow up PRs so they can be
reviewed and discussed individually.
Previous this would return the following sort of error:
expected type 'string', got unconvertible type '[]interface {}'
This is the raw error returned by the underlying mapstructure library.
This is not a helpful error message for anyone who doesn't know Go's
type system, and it exposes Terraform's internals to the UI.
Instead we'll catch these cases before we try to use mapstructure and
return a more straightforward message.
By checking the type before the IsComputed exception this also avoids
a crash caused when the assigned value is a computed list. Otherwise
the list of interpolations is allowed through here and then crashes later
during Diff when the value is not a primitive as expected.
A common issue with new resource implementations is not considering parts
of a complex structure that's used inside a set, which causes quirky
behavior.
The schema helper has enough information to provide a default reasonable
implementation of a set function that includes all non-computed attributes
in a deterministic way. Here we implement such a function and use it
when no explicit hashing function is provided.
In order to achieve this we encapsulate the construction of the zero
value for a schema in a new method schema.ZeroValue, which allows us to
put the fallback logic to the new default function in a single spot.
It is no longer valid to use &Set{F: schema.Set} and all uses of that
construct should be replaced with schema.ZeroValue().(*Set) .
It seems there are 4 locations left that use the `helper/multierror`
package, where the rest is TF settled on the `hashicorp/go-multierror`
package.
Functionally this doesn’t change anything, so I suggest to delete the
builtin version as it can only cause confusion (both packages have the
same name, but are still different types according to Go’s type system.
We need to set the value to an empty value so the state file does
indeed change the value. Otherwise the obsolote value is still
intact and doesn't get changed at all. This means `terraform show`
still shows the obsolote value when the particular value is not
existing anymore. This is due the AWS API which is returning a null
instead of an empty string.
This was just a missed exit from the resource.Apply function -
subsequent refreshes would add the SchemaVersion back into the state,
but having the state recorded once without the meta information can
cause problems with Atlas's remote state checksumming.
By prefixing them with `cmd /c` it will work with both `winner` and
`ssh` connection types.
This PR also reverts some bad stringer changes made in PR #2673
In `helper/schema` we already makes a distinction between `Default`
which is always applied and `InputDefault` which is displayed to the
user for an empty field.
But for variables we just have `Default` which is treated like
`InputDefault`. This changes it to _not_ prompt the user for a value
when the variable declaration includes a default.
Treating this as a UX bugfix and the "don't prompt for variables w/
defaults set" behavior as the originally expected behavior we were
failing to honor.
Added an already-passing test to verify and cover the `helper/schema`
behavior.
Perhaps down the road we can add a `input_default` attribute to
variables to allow similar behavior to `helper/schema` in variables, but
for now just sticking with the fix.
Fixes#2592
This is the initial pure "all tests passing without a diff" stage. The
plan is to change the internal representation of StringList to include a
suffix delimiter, which will allow us to recognize empty and
single-element lists.
The Exists function can run in a context where the contents of the
template have changed, but it uses the old set of variables from the
state. This means that when the set of variables changes, rendering will
fail in Exists. This was returning an error, but really it just needs to
be treated as a scenario where the template needs re-rendering.
fixes#2344 and possibly a few other template issues floating around
I couldn't see a simple path get this working for Maps, Sets,
and Lists, so lets land it as a primitive-only schema feature.
I think validation on primitives comprises 80% of the use cases anyways.
Guarantees that the `interface{}` arg to ValidateFunc is the proper
type, allowing implementations to be simpler.
Finish the docstring on `ValidateFunc` to call this out.
/cc @mitchellh
The runtime impl of ConfictsWith uses Resource.Get(), which makes it
work with any other attribute of the resource - the InternalValidate was
only checking against the local schemaMap though, preventing subResource
from using ConflictsWith properly.
It's a lot of wiring and it's a bit ugly, but it's not runtime code, so
I'm a bit less concerned about that aspect.
This should take care of the problem mentioned in #1909
This reworks the template lifecycle a bit such that we get nicer diff
behavior.
First, we tick ForceNew on for both filename and vars, so that the diff
indicates that the template will be "replaced" on change. This is mostly
cosmetic, but it also tracks conceptually with the fact that the
identifier we use is a hash of the contents, so any change essentially
makes a "new resource".
Second, we change the Exists implementation to only return `false` when
there has been a change in the rendered template. This lets descendent
resources see the computed value changing so that they'll properly
trigger in the plan.
Fixes#1898
Refs #1866 (but does not fix, there's another deeper issue there)
Other than the fact that "The the" doesn't really make any sense anywhere
that it's used in Terraform, they're a post-punk band from the UK.
Fixes "The The" so that they can get back to playing songs.
AccTests like TestAccComputeInstance_basic_deprecated_network were
failing early on "invalid config" when we are explictly testing behavior
that we know generates warnings.
This fixes some perpetual diffs I saw in Atlas AccTests where an empty
map (`map[string]interface{}{}`) was being `d.Set` for "metadata_full".
Because the MapFieldWriter was not distinguishing between empty and nil,
this trigger the "map delete" logic and no count was written to the
state. This caused subsequent plans to improperly report a diff.
Here we redefine the map delete functionality to explicitly trigger only
on `nil`, so we catch the `.#` field for empty maps.
- Users
- Groups
- Roles
- Inline policies for the above three
- Instance profiles
- Managed policies
- Access keys
This is most of the data types provided by IAM. There are a few things
missing, but the functionality here is probably sufficient for 95% of
the cases. Makes a dent in #28.
Do directory expansion on filenames.
Add basic acceptance tests. Code coverage is 72.5%.
Uncovered code is uninteresting and/or impossible error cases.
Note that this required adding a knob to
helper/resource.TestStep to allow transient
resources.
This is needed as preperation for adding WinRM support. There is still
one error in the tests which needs another look, but other than that it
seems like were now ready to start working on the WinRM part…
I forgot to add `Computed: true` when I made the "key_name" field
optional in #1751.
This made the behavior:
* Name generated in Create and set as ID
* Follow up plan (without refresh) was nice and empty
* During refresh, name gets cleared out on Read, causing a bad diff on
subsequent plans
We can automatically catch bugs like this if we add yet another
verification step to our resource acceptance tests -> a post
Refresh+Plan that we verify is empty.
I left the non-refresh Plan verification in, because it's important that
_both_ of these are empty after an Apply.