23 Commits

Author SHA1 Message Date
Nate Brown 9ff73cb02f
Increase the timestamp resolution for handshakes (#453) 2021-05-03 14:10:00 -05:00
Wade Simmons 44cb697552
Add more metrics (#450)
* Add more metrics

This change adds the following counter metrics:

Metrics to track packets dropped at the firewall:

    firewall.dropped.local_ip
    firewall.dropped.remote_ip
    firewall.dropped.no_rule

Metrics to track handshake attempts that have been initiated and ones
that have timed out (ones that have completed are tracked by the
existing "handshakes" histogram).

    handshake_manager.initiated
    handshake_manager.timed_out

Metrics to track when cached_packets are dropped because we run out of
buffer space, and how many are sent once the handshake completes.

    hostinfo.cached_packets.dropped
    hostinfo.cached_packets.sent

This change also notes how many cached packets we have when we log the
final "Handshake received" message for either stage1 or stage2.

* separate incoming/outgoing metrics

* remove "allowed" firewall metrics

We don't need these on the hot path; they aren't worth it.

* don't need pointers here
2021-04-27 22:23:18 -04:00
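
The firewall drop counters described in the commit above can be pictured with a minimal sketch. It assumes a hand-rolled `Counter` built on `sync/atomic`; `Counter`, `DropReason`, `FirewallMetrics`, and `RecordDrop` are hypothetical names for illustration, not the project's metrics types. Only the three metric names come from the commit message.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Counter is a hypothetical stand-in for a metrics counter.
// It is NOT the metrics type used by the project.
type Counter struct{ n int64 }

// Inc adds delta to the counter atomically.
func (c *Counter) Inc(delta int64) { atomic.AddInt64(&c.n, delta) }

// Count returns the current value atomically.
func (c *Counter) Count() int64 { return atomic.LoadInt64(&c.n) }

// DropReason enumerates the three drop causes named in the commit message.
type DropReason int

const (
	DropLocalIP DropReason = iota
	DropRemoteIP
	DropNoRule
)

// FirewallMetrics holds one counter per drop reason (assumed layout).
type FirewallMetrics struct {
	DroppedLocalIP  Counter // firewall.dropped.local_ip
	DroppedRemoteIP Counter // firewall.dropped.remote_ip
	DroppedNoRule   Counter // firewall.dropped.no_rule
}

// RecordDrop increments the counter that matches the drop reason.
// Nothing is recorded for allowed packets, which keeps the hot path cheap.
func (m *FirewallMetrics) RecordDrop(r DropReason) {
	switch r {
	case DropLocalIP:
		m.DroppedLocalIP.Inc(1)
	case DropRemoteIP:
		m.DroppedRemoteIP.Inc(1)
	case DropNoRule:
		m.DroppedNoRule.Inc(1)
	}
}

func main() {
	m := &FirewallMetrics{}
	m.RecordDrop(DropNoRule)
	m.RecordDrop(DropNoRule)
	m.RecordDrop(DropLocalIP)
	fmt.Println("firewall.dropped.local_ip ", m.DroppedLocalIP.Count())
	fmt.Println("firewall.dropped.remote_ip", m.DroppedRemoteIP.Count())
	fmt.Println("firewall.dropped.no_rule  ", m.DroppedNoRule.Count())
}
```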
Nathan Brown db23fdf9bc
Don't apply race avoidance to existing handshakes, use the handshake time to determine who wins (#451)
Co-authored-by: Wade Simmons <wadey@slack-corp.com>
2021-04-27 21:15:34 -05:00
Nathan Brown 710df6a876
Refactor remotes and handshaking to give every address a fair shot (#437) 2021-04-14 13:50:09 -05:00
Nathan Brown e7e55618ff
Include bad packets in the good handshake test (#428) 2021-03-31 13:36:10 -05:00
Nathan Brown 0c2e5973e1
Simple lie test (#427) 2021-03-31 10:26:35 -05:00
Nathan Brown 883e09a392
Don't use a global ca pool (#426) 2021-03-29 12:10:19 -05:00
Nathan Brown 3ea7e1b75f
Don't use a global logger (#423) 2021-03-26 09:46:30 -05:00
Nathan Brown 7073d204a8
IPv6 support for outside (udp) (#369) 2021-03-18 20:37:24 -05:00
Ryan Huber 3aaaea6309
don't allow a useless handshake with yourself (#402)
* don't allow a useless handshake with yourself

* remove helper
2021-03-15 12:58:23 -07:00
Wade Simmons 5506da3de9
Fix selection of UDP remote to use during stage2 (#404)
The change for #401 incorrectly called HostInfo.ForcePromoteBest in
stage2, when we really want to pick the remote that we received the
response from.
2021-03-12 21:43:24 -05:00
Wade Simmons 6c55d67f18
Refactor handshake_ix (#401)
There are some subtle race conditions with the previous handshake_ix implementation, mostly around collisions with localIndexId. This change refactors it so that we have a "commit" phase during the handshake where we grab the lock for the hostmap and ensure that we have a unique local index before storing it. We also now avoid using the pending hostmap at all for receiving stage1 packets, since we have everything we need to just store the completed handshake.

Co-authored-by: Nate Brown <nbrown.us@gmail.com>
Co-authored-by: Ryan Huber <rhuber@gmail.com>
Co-authored-by: forfuncsake <drussell@slack-corp.com>
2021-03-12 14:16:25 -05:00
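
The "commit" phase described in the commit above can be sketched roughly as follows: take the hostmap lock, check that the local index is unused, and only then store the host. The `HostInfo` and `HostMap` types here are simplified stand-ins, and `Commit` and `ErrIndexTaken` are hypothetical names chosen for illustration; the actual implementation is not shown in this log.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// HostInfo is a simplified stand-in holding only what the sketch needs.
type HostInfo struct {
	VpnIP        string
	LocalIndexID uint32
}

// HostMap is a simplified stand-in: one lock guarding an index table.
type HostMap struct {
	mu      sync.Mutex
	indexes map[uint32]*HostInfo
}

// ErrIndexTaken is a hypothetical error for a localIndexId collision.
var ErrIndexTaken = errors.New("local index already in use")

// NewHostMap returns an empty map ready for use.
func NewHostMap() *HostMap {
	return &HostMap{indexes: make(map[uint32]*HostInfo)}
}

// Commit checks uniqueness and stores the host under a single lock, so two
// concurrent handshakes cannot both claim the same local index.
func (hm *HostMap) Commit(h *HostInfo) error {
	hm.mu.Lock()
	defer hm.mu.Unlock()

	if _, taken := hm.indexes[h.LocalIndexID]; taken {
		return ErrIndexTaken
	}
	hm.indexes[h.LocalIndexID] = h
	return nil
}

func main() {
	hm := NewHostMap()
	fmt.Println(hm.Commit(&HostInfo{VpnIP: "10.0.0.1", LocalIndexID: 42}))
	// A second handshake that picked the same index is rejected and would
	// need to generate a new index and try again.
	fmt.Println(hm.Commit(&HostInfo{VpnIP: "10.0.0.2", LocalIndexID: 42}))
}
```

Doing the check and the store under one lock is the point of the sketch: a check followed by a separate store would leave a window in which a colliding index could slip in.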
Wade Simmons d604270966
Fix most known data races (#396)
This change fixes all of the known data races that `make smoke-docker-race` finds, except for one.

Most of these races are around the handshake phase for a hostinfo, so we add a RWLock to the hostinfo and Lock during each of the handshake stages.

Some of the other races are around consistently using `atomic` around the `messageCounter` field. To make this harder to mess up, I have renamed the field to `atomicMessageCounter` (I also removed the unnecessary extra pointer dereference as we can just point directly to the struct field).

The last remaining data race is around reading `ConnectionInfo.ready`, which is a boolean that is only written to once when the handshake has finished. Due to it being in the hot path for packets and the rare case that this could actually be an issue, holding off on fixing that one for now.

Here are the results of `make smoke-docker-race`:

before:

    lighthouse1: Found 2 data race(s)
    host2:       Found 36 data race(s)
    host3:       Found 17 data race(s)
    host4:       Found 31 data race(s)

after:

    host2: Found 1 data race(s)
    host4: Found 1 data race(s)

Fixes: #147
Fixes: #226
Fixes: #283
Fixes: #316
2021-03-05 21:18:33 -05:00
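
The `atomicMessageCounter` convention from the commit above can be illustrated with a small sketch. The `connState` struct, `NextCounter`, and `runConcurrent` are hypothetical names for illustration; only the field name `atomicMessageCounter` comes from the commit message. The idea is that every access to the field goes through `sync/atomic`, taking the address of the struct field directly.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// connState is a hypothetical container for the counter. The "atomic"
// prefix on the field name is a reminder that it must only be touched
// through sync/atomic.
type connState struct {
	atomicMessageCounter uint64
}

// NextCounter increments the counter and returns the new value. It points
// directly at the struct field, with no extra pointer indirection.
func (cs *connState) NextCounter() uint64 {
	return atomic.AddUint64(&cs.atomicMessageCounter, 1)
}

// Current reads the counter without racing against concurrent writers.
func (cs *connState) Current() uint64 {
	return atomic.LoadUint64(&cs.atomicMessageCounter)
}

// runConcurrent has `workers` goroutines each take `perWorker` counters.
func runConcurrent(cs *connState, workers, perWorker int) {
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < perWorker; j++ {
				cs.NextCounter()
			}
		}()
	}
	wg.Wait()
}

func main() {
	cs := &connState{}
	runConcurrent(cs, 8, 1000)
	fmt.Println("final counter:", cs.Current())
}
```

Running a program like this with `go run -race` is one way to check that no unsynchronized access to the counter remains.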
Wade Simmons ee7c27093c
add HostMap.RemoteIndexes (#329)
This change adds an index based on HostInfo.remoteIndexId. This allows
us to use HostMap.QueryReverseIndex without having to loop over all
entries in the map (this can be a bottleneck on high-traffic
lighthouses).

Without this patch, on a high-traffic lighthouse server receiving recv_error
packets and lots of handshakes, a CPU pprof trace can look like this:

      flat  flat%   sum%        cum   cum%
    2000ms 32.26% 32.26%     3040ms 49.03%  github.com/slackhq/nebula.(*HostMap).QueryReverseIndex
     870ms 14.03% 46.29%     1060ms 17.10%  runtime.mapiternext

This shows that about 50% of total CPU time is being spent in QueryReverseIndex.
2020-11-23 14:51:16 -05:00
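
The difference between scanning every host and keeping a dedicated index can be sketched as below. The types are simplified stand-ins, and `Add` and `queryReverseIndexScan` are hypothetical helpers for illustration; only `HostMap`, `HostInfo`, `RemoteIndexes`, `QueryReverseIndex`, and the idea of keying on the remote index come from the commit message.

```go
package main

import "fmt"

// HostInfo is a simplified stand-in with only the fields the sketch needs.
type HostInfo struct {
	VpnIP         string
	RemoteIndexID uint32
}

// HostMap keeps the primary table plus a secondary index keyed by the
// remote index, so reverse lookups do not need to iterate.
type HostMap struct {
	Hosts         map[string]*HostInfo
	RemoteIndexes map[uint32]*HostInfo
}

// NewHostMap returns an empty map with both tables initialized.
func NewHostMap() *HostMap {
	return &HostMap{
		Hosts:         make(map[string]*HostInfo),
		RemoteIndexes: make(map[uint32]*HostInfo),
	}
}

// Add stores the host in both tables so they stay consistent.
func (hm *HostMap) Add(h *HostInfo) {
	hm.Hosts[h.VpnIP] = h
	hm.RemoteIndexes[h.RemoteIndexID] = h
}

// queryReverseIndexScan shows the old approach: walk every entry, which
// costs time proportional to the number of hosts.
func (hm *HostMap) queryReverseIndexScan(idx uint32) (*HostInfo, bool) {
	for _, h := range hm.Hosts {
		if h.RemoteIndexID == idx {
			return h, true
		}
	}
	return nil, false
}

// QueryReverseIndex uses the secondary index: a single map lookup.
func (hm *HostMap) QueryReverseIndex(idx uint32) (*HostInfo, bool) {
	h, ok := hm.RemoteIndexes[idx]
	return h, ok
}

func main() {
	hm := NewHostMap()
	hm.Add(&HostInfo{VpnIP: "10.0.0.1", RemoteIndexID: 1001})
	hm.Add(&HostInfo{VpnIP: "10.0.0.2", RemoteIndexID: 1002})

	if h, ok := hm.QueryReverseIndex(1002); ok {
		fmt.Println("indexed lookup found", h.VpnIP)
	}
	if h, ok := hm.queryReverseIndexScan(1002); ok {
		fmt.Println("scan found", h.VpnIP)
	}
}
```

The trade-off is the usual one for secondary indexes: every insert and delete has to update both tables, in exchange for constant-time reverse lookups.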
Wade Simmons 0389596f66
don't mark handshake packets as "lost" (#331)
Packet 1 is always a stage 1 handshake and packet 2 is always stage 2.
Normal packets don't start flowing until the message counter is 3 or
higher.

Currently we only receive either packet 1 or 2 depending on if
we are the initiator or responder for the handshake, so we end up
marking one of these as "lost". We should mark these packets as "seen"
when we are the one sending them, since we don't expect to see them from
the other side.
2020-11-16 14:03:08 -05:00
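
The effect of marking our own handshake packet as "seen" can be shown with a toy loss tracker. This is a sketch under stated assumptions: `Window`, `MarkSeen`, and `Lost` are hypothetical names, and the map-based bookkeeping is chosen for clarity rather than to mirror the project's actual replay window.

```go
package main

import "fmt"

// Window is a toy tracker: it remembers which message counters were seen
// and treats every unseen counter up to the highest one as lost.
type Window struct {
	seen    map[uint64]bool
	highest uint64
}

// NewWindow returns an empty tracker.
func NewWindow() *Window {
	return &Window{seen: make(map[uint64]bool)}
}

// MarkSeen records a counter, whether it was received from the peer or
// marked by us because we were the sender.
func (w *Window) MarkSeen(c uint64) {
	w.seen[c] = true
	if c > w.highest {
		w.highest = c
	}
}

// Lost counts the counters in 1..highest that were never marked.
func (w *Window) Lost() int {
	lost := 0
	for c := uint64(1); c <= w.highest; c++ {
		if !w.seen[c] {
			lost++
		}
	}
	return lost
}

func main() {
	// Initiator without the fix: it sent packet 1 itself, so it only ever
	// receives 2, 3, 4 from the peer and packet 1 looks lost.
	before := NewWindow()
	for _, c := range []uint64{2, 3, 4} {
		before.MarkSeen(c)
	}
	fmt.Println("lost without fix:", before.Lost())

	// Initiator with the fix: mark packet 1 as seen at send time.
	after := NewWindow()
	after.MarkSeen(1) // we sent stage 1, so we never expect it from the peer
	for _, c := range []uint64{2, 3, 4} {
		after.MarkSeen(c)
	}
	fmt.Println("lost with fix:", after.Lost())
}
```

The responder case is symmetric: it receives packet 1 and sends packet 2, so it would mark counter 2 as seen when sending.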
Alan Lam 5545cff6ef
log remote certificate fingerprint on handshakes (#262) 2020-07-31 18:54:51 -04:00
Wade Simmons aba42f9fa6
enforce the use of goimports (#248)
* enforce the use of goimports

Instead of enforcing `gofmt`, enforce `goimports`, which also asserts
a separate section for non-builtin packages.

* run `goimports` everywhere

* exclude generated .pb.go files
2020-06-30 18:53:30 -04:00
Wade Simmons b37a91cfbc
add meta packet statistics (#230)
This change adds more metrics around "meta" (non "message" type packets).
For lighthouse packets, we also record statistics around the specific
lighthouse meta type.

We don't keep statistics for the "message" type so that we don't slow
down the fast path (and you can just look at metrics on the tun
interface to find that information).
2020-06-26 13:45:48 -04:00
Wade Simmons b4f2f7ce4e
log `certName` alongside `vpnIp` (#200)
This change adds a new helper, `(*HostInfo).logger()`, that starts a new
logrus.Entry with `vpnIp` and `certName`. We don't use the helper inside
of handshake_ix though since the certificate has not been attached to
the HostInfo yet.

Fixes: #84
2020-04-06 11:34:00 -07:00
Ryan Huber 9333a8e3b7 subnet support 2019-12-12 16:34:17 +00:00
Ryan Huber 89f0d998cf remove old debug print statements 2019-11-23 17:01:10 +00:00
Ryan Huber 6a460ba38b remove old hmac function. superseded by ix_psk0 2019-11-23 16:50:36 +00:00
Slack Security Team f22b4b584d Public Release 2019-11-19 17:00:20 +00:00