If we receive a handshake packet for a tunnel that has already been
completed, check to see if the new remote is preferred. If so, update to
the preferred remote and send a test packet to influence the other side
to do the same.
As suggested in https://pkg.go.dev/golang.org/x/crypto/curve25519#ScalarBaseMult,
use X25519 instead of ScalarBaseMult. When using Basepoint, it may employ
some precomputed values, enhancing performance.
Co-authored-by: Wade Simmons <wade@wades.im>
Co-authored-by: Wade Simmons <wadey@slack-corp.com>
`func (nc *NebulaCertificate) VerifyPrivateKey(key []byte) error` would
previously return an error even if passed the correct private key for a
CA certificate `nc`.
That function has been updated to support CA certificates, and
nebula-cert now calls it before signing a new certificate. Previously,
it would perform all constraint checks against the CA certificate
provided, take a SHA256 fingerprint of the provided certificate, insert
it into the new node certificate, and then finally sign it with the
mismatching private key provided.
Hi @nbrownus
Fixed a small bug that was introduced in
df7c7ee#diff-5d05d02296a1953fd5fbcb3f4ab486bc5f7c34b14c3bdedb068008ec8ff5beb4
having problems due to it
* Add more metrics
This change adds the following counter metrics:
Metrics to track packets dropped at the firewall:
firewall.dropped.local_ip
firewall.dropped.remote_ip
firewall.dropped.no_rule
Metrics to track handshakes attempts that have been initiated and ones
that have timed out (ones that have completed are tracked by the
existing "handshakes" histogram).
handshake_manager.initiated
handshake_manager.timed_out
Metrics to track when cached_packets are dropped because we run out of
buffer space, and how many are sent once the handshake completes.
hostinfo.cached_packets.dropped
hostinfo.cached_packets.sent
This change also notes how many cached packets we have when we log the
final "Handshake received" message for either stage1 for stage2.
* separate incoming/outgoing metrics
* remove "allowed" firewall metrics
We don't need this on the hotpath, they aren't worh it.
* don't need pointers here
This check was accidentally typo'd in #396 from `%` to `&`. Restore the
correct functionality here (we want to do the check every "PromoteEvery"
count packets).
This is how Prometheus recommends you do it, and how they do it
themselves in their client. This makes it easy to see which versions you
have deployed in your fleet, and query over it too.
Currently, if you use the remote allow list config, as soon as you attempt to create a tunnel to a node that has a blocked IP address, a mutex is locked and never unlocked. This happens even if the node has an allowed remote IP address in addition to the blocked remote IP address.
This pull request ensures that the lighthouse mutex is unlocked whenever we attempt to add a remote IP.
The change for #401 incorrectly called HostInfo.ForcePromoteBest in
stage2, when we really we want to pick the remote that we received the
response from.
There are some subtle race conditions with the previous handshake_ix implementation, mostly around collisions with localIndexId. This change refactors it so that we have a "commit" phase during the handshake where we grab the lock for the hostmap and ensure that we have a unique local index before storing it. We also now avoid using the pending hostmap at all for receiving stage1 packets, since we have everything we need to just store the completed handshake.
Co-authored-by: Nate Brown <nbrown.us@gmail.com>
Co-authored-by: Ryan Huber <rhuber@gmail.com>
Co-authored-by: forfuncsake <drussell@slack-corp.com>
We missed this race with #396 (and I think this is also the crash in
issue #226). We need to lock a little higher in the getOrHandshake
method, before we reset hostinfo.ConnectionInfo. Previously, two
routines could enter this section and confuse the handshake process.
This could result in the other side sending a recv_error that also has
a race with setting hostinfo.ConnectionInfo back to nil. So we make sure
to grab the lock in handleRecvError as well.
Neither of these code paths are in the hot path (handling packets
between two hosts over an active tunnel) so there should be no
performance concerns.
* Do not allow someone to run a nebula lighthouse with an ephemeral port
* derp - we discover the port so we have to check the config setting
* No context needed for this error
* gofmt yourself
* Revert "gofmt yourself"
This reverts commit c01423498e3792f7acd69d7e691dce1edad81bcb.
* Revert "No context needed for this error"
This reverts commit 6792af6846d1200c564a4ad601a637535dd56c5b.
* snip snap snip snap