The context used for Stop is more appropriately tied to the lifetime of
the provisioner rather than a call to the ProvisionResource method. In
some cases Stop can be called before ProvisionResource, causing a panic
the provisioner. Rather than adding nil checks to the CancelFunc call
for Stop, create a base context to use for cancellation with both Stop
and Close methods.
The timeout for a provisioner is expected to only apply to the initial
connection. Keep the context for the communicator.Retry separate from
the global cancellation context.
Combine the ExitStatus and Err values from remote.Cmd into an error
returned by Wait, better matching the behavior of the os/exec package.
Non-zero exit codes are returned from Wait as a remote.ExitError.
Communicator related errors are returned directly.
Clean up all the error handling in the provisioners using a
communicator. Also remove the extra copyOutput synchronization that was
copied from package to package.
The timeout for the remote command was taken from the wrong config
field, and the connection timeout was being used which is 5 min. Any
remote command taking more than 5 min would be terminated by
disconnecting the communicator. Remove the timeout from the context, and
rely on the global timeout provided by terraform.
There was no way to get the error from the communicator previously, so
the broken connection was silently ignored and the provisioner returned
successfully. Now we can use the new cmd.Err() method to retrieve any
errors encountered during execution.
There no reason to retry around the execution of remote scripts. We've
already established a connection, so the only that could happen here is
to continually retry uploading or executing a script that can't succeed.
This also simplifies the streaming output from the command, which
doesn't need such explicit synchronization. Closing the output pipes is
sufficient to stop the copyOutput functions, and they don't close around
any values that are accessed again after the command executes.
Storing error values to atomic.Value may fail if they have different
dynamic types. Wrap error value in a consistent struct type to avoid
panics.
Make sure we return a nil error on success
The provisioner collected all inline commands into a single script which meant
only the exit code of the last command was actually checked for an error.
Fixes#10463
I'm really surprised this flew under the radar for years...
By having unique PRNGs, the SSH communicator could and would
generate identical ScriptPaths and two provisioners running in parallel
could overwrite each other and execute the same script. This would
happen because they're both seeded by the current time which could
potentially be identical if done in parallel...
Instead, we share the rand now so that the sequence is guaranteed
unique. As an extra measure of robustness, we also multiple by the PID
so that we're also protected against two processes at the same time.
The script cleanup step added in #5577 was positioned before the
`cmd.Wait()` call to ensure the command completes. This was causing
non-deterministic failures, especially for longer running scripts.
Fixes#5699Fixes#5737
* We now return an error when you set the script_path to
C:\Windows\Temp explaining this is currently not supported
* The fix in PR #1588 is converted to the updated setup in this PR
including the unit tests
Last thing to do is add a few tests for the WinRM communicator…
This is needed as preperation for adding WinRM support. There is still
one error in the tests which needs another look, but other than that it
seems like were now ready to start working on the WinRM part…