nebula

Commit Graph

Author	SHA1	Message	Date
Wade Simmons	1bae5b2550	more validation in pending hostmap deletes (#344 ) We are currently seeing some cases where we are not deleting entries correctly from the pending hostmap. I believe this is a case of an inbound timer tick firing and deleting the Hosts map entry for a newer handshake attempt than intended, thus leaving the old Indexes entry orphaned. This change adds some extra checking when deleteing from the Indexes and Hosts maps to ensure we clean everything up correctly.	2021-03-01 12:40:46 -05:00
Wade Simmons	ee7c27093c	add HostMap.RemoteIndexes (#329 ) This change adds an index based on HostInfo.remoteIndexId. This allows us to use HostMap.QueryReverseIndex without having to loop over all entries in the map (this can be a bottleneck under high traffic lighthouses). Without this patch, a high traffic lighthouse server receiving recv_error packets and lots of handshakes, cpu pprof trace can look like this: flat flat% sum% cum cum% 2000ms 32.26% 32.26% 3040ms 49.03% github.com/slackhq/nebula.(*HostMap).QueryReverseIndex 870ms 14.03% 46.29% 1060ms 17.10% runtime.mapiternext Which shows 50% of total cpu time is being spent in QueryReverseIndex.	2020-11-23 14:51:16 -05:00
Wade Simmons	a54f3fc681	fix fast handshake trigger for static hosts (#265 ) We are currently triggering a fast handshake for static hosts right inside HandshakeManager.AddVpnIP, but this can actually trigger before we have generated the handshake packet to use. Instead, we should be triggering right after we call ixHandshakeStage0 in getOrHandshake (which generates the handshake packet)	2020-08-02 20:59:50 -04:00
Wade Simmons	4756c9613d	trigger handshakes when lighthouse reply arrives (#246 ) Currently, we wait until the next timer tick to act on the lighthouse's reply to our HostQuery. This means we can easily add hundreds of milliseconds of unnecessary delay to the handshake. To fix this, we can introduce a channel to trigger an outbound handshake without waiting for the next timer tick. A few samples of cold ping time between two hosts that require a lighthouse lookup: before (v1.2.0): time=156 ms time=252 ms time=12.6 ms time=301 ms time=352 ms time=49.4 ms time=150 ms time=13.5 ms time=8.24 ms time=161 ms time=355 ms after: time=3.53 ms time=3.14 ms time=3.08 ms time=3.92 ms time=7.78 ms time=3.59 ms time=3.07 ms time=3.22 ms time=3.12 ms time=3.08 ms time=8.04 ms I recommend reviewing this PR by looking at each commit individually, as some refactoring was required that makes the diff a bit confusing when combined together.	2020-07-22 10:35:10 -04:00
Wade Simmons	b37a91cfbc	add meta packet statistics (#230 ) This change add more metrics around "meta" (non "message" type packets). For lighthouse packets, we also record statistics around the specific lighthouse meta type. We don't keep statistics for the "message" type so that we don't slow down the fast path (and you can just look at metrics on the tun interface to find that information).	2020-06-26 13:45:48 -04:00
Wade Simmons	b4f2f7ce4e	log `certName` alongside `vpnIp` (#200 ) This change adds a new helper, `(*HostInfo).logger()`, that starts a new logrus.Entry with `vpnIp` and `certName`. We don't use the helper inside of handshake_ix though since the certificate has not been attached to the HostInfo yet. Fixes: #84	2020-04-06 11:34:00 -07:00
Wade Simmons	179a369130	add configuration options for HandshakeManager (#179 ) This change exposes the current constants we have defined for the handshake manager as configuration options. This will allow us to test and tweak with different intervals and wait rotations. # Handshake Manger Settings handshakes: # Total time to try a handshake = sequence of `try_interval * retries` # With 100ms interval and 20 retries it is 23.5 seconds try_interval: 100ms retries: 20 # wait_rotation is the number of handshake attempts to do before starting to try non-local IP addresses wait_rotation: 5	2020-02-21 16:25:11 -05:00
Slack Security Team	f22b4b584d	Public Release	2019-11-19 17:00:20 +00:00

8 Commits