trigger handshakes when lighthouse reply arrives (#246)

Currently, we wait until the next timer tick to act on the lighthouse's reply to our HostQuery, which can easily add hundreds of milliseconds of unnecessary delay to the handshake. To fix this, we introduce a channel that triggers an outbound handshake immediately, without waiting for the next timer tick.

A few samples of cold ping time between two hosts that require a lighthouse lookup:

before (v1.2.0):
time=156 ms, time=252 ms, time=12.6 ms, time=301 ms, time=352 ms, time=49.4 ms, time=150 ms, time=13.5 ms, time=8.24 ms, time=161 ms, time=355 ms

after:
time=3.53 ms, time=3.14 ms, time=3.08 ms, time=3.92 ms, time=7.78 ms, time=3.59 ms, time=3.07 ms, time=3.22 ms, time=3.12 ms, time=3.08 ms, time=8.04 ms

I recommend reviewing this PR one commit at a time, because some of the required refactoring makes the combined diff confusing.
2020-07-22 10:35:10 -04:00
parent 4645e6034b
commit 4756c9613d
5 changed files with 169 additions and 66 deletions
--- a/lighthouse.go
+++ b/lighthouse.go
@@ -30,6 +30,9 @@ type LightHouse struct {
 	// filters local addresses that we advertise to lighthouses
 	localAllowList *AllowList

+	// used to trigger the HandshakeManager when we receive HostQueryReply
+	handshakeTrigger chan<- uint32
+
 	// staticList exists to avoid having a bool in each addrMap entry
 	// since static should be rare
 	staticList  map[uint32]struct{}
@@ -358,6 +361,11 @@ func (lh *LightHouse) HandleRequest(rAddr *udpAddr, vpnIp uint32, p []byte, c *c
 			ans := NewUDPAddr(a.Ip, uint16(a.Port))
 			lh.AddRemote(n.Details.VpnIp, ans, false)
 		}
+		// Non-blocking attempt to trigger, skip if it would block
+		select {
+		case lh.handshakeTrigger <- n.Details.VpnIp:
+		default:
+		}

 	case NebulaMeta_HostUpdateNotification:
 		//Simple check that the host sent this not someone else