The context used for Stop is more appropriately tied to the lifetime of
the provisioner rather than a call to the ProvisionResource method. In
some cases Stop can be called before ProvisionResource, causing a panic
the provisioner. Rather than adding nil checks to the CancelFunc call
for Stop, create a base context to use for cancellation with both Stop
and Close methods.
Add a test to remote-exec to make sure the proper timeout is honored
during apply.
TODO: we need some test helpers for provisioners, so they can all be
verified.
The timeout for a provisioner is expected to only apply to the initial
connection. Keep the context for the communicator.Retry separate from
the global cancellation context.
Combine the ExitStatus and Err values from remote.Cmd into an error
returned by Wait, better matching the behavior of the os/exec package.
Non-zero exit codes are returned from Wait as a remote.ExitError.
Communicator related errors are returned directly.
Clean up all the error handling in the provisioners using a
communicator. Also remove the extra copyOutput synchronization that was
copied from package to package.
The timeout for the remote command was taken from the wrong config
field, and the connection timeout was being used which is 5 min. Any
remote command taking more than 5 min would be terminated by
disconnecting the communicator. Remove the timeout from the context, and
rely on the global timeout provided by terraform.
There was no way to get the error from the communicator previously, so
the broken connection was silently ignored and the provisioner returned
successfully. Now we can use the new cmd.Err() method to retrieve any
errors encountered during execution.
There no reason to retry around the execution of remote scripts. We've
already established a connection, so the only that could happen here is
to continually retry uploading or executing a script that can't succeed.
This also simplifies the streaming output from the command, which
doesn't need such explicit synchronization. Closing the output pipes is
sufficient to stop the copyOutput functions, and they don't close around
any values that are accessed again after the command executes.
The tests did pass, but that was because they only tested part of the changes. By using the `schema.TestResourceDataRaw` function the schema and config are better tested and so they pointed out a problem with the schema of the Chef provisioner.
The `Elem` fields did not have a `*schema.Schema` but a `schema.Schema` and in an `Elem` schema only the `Type` field may (and must) be set. Any other fields like `Optional` are not allowed here.
Next to fixing that problem I also did a little refactoring and cleaning up. Mainly making the `ProvisionerS` private (`provisioner`) and removing the deprecated fields.
1. Migrate `chef` provisioner to `schema.Provisioner`:
* `chef.Provisioner` structure was renamed to `ProvisionerS`and now it's decoded from `schema.ResourceData` instead of `terraform.ResourceConfig` using simple copy-paste-based solution;
* Added simple schema without any validation yet.
2. Support `ValidateFunc` validate function : implemented in `file` and `chef` provisioners.
Storing error values to atomic.Value may fail if they have different
dynamic types. Wrap error value in a consistent struct type to avoid
panics.
Make sure we return a nil error on success
The provisioner collected all inline commands into a single script which meant
only the exit code of the last command was actually checked for an error.
Fixes#10463
I'm really surprised this flew under the radar for years...
By having unique PRNGs, the SSH communicator could and would
generate identical ScriptPaths and two provisioners running in parallel
could overwrite each other and execute the same script. This would
happen because they're both seeded by the current time which could
potentially be identical if done in parallel...
Instead, we share the rand now so that the sequence is guaranteed
unique. As an extra measure of robustness, we also multiple by the PID
so that we're also protected against two processes at the same time.
The script cleanup step added in #5577 was positioned before the
`cmd.Wait()` call to ensure the command completes. This was causing
non-deterministic failures, especially for longer running scripts.
Fixes#5699Fixes#5737
* We now return an error when you set the script_path to
C:\Windows\Temp explaining this is currently not supported
* The fix in PR #1588 is converted to the updated setup in this PR
including the unit tests
Last thing to do is add a few tests for the WinRM communicator…
The reason why the shebang is removed from these tests, is because the
shebang is only needed for SSH/Linux connections. So in the new setup
the shebang line is added in the SSH communicator instead of in the
resource provisioner itself…
This is needed as preperation for adding WinRM support. There is still
one error in the tests which needs another look, but other than that it
seems like were now ready to start working on the WinRM part…