This change is for Linux only.
Previously, when running with multiple tun.routines, we would only have one file descriptor. This change instead sets IFF_MULTI_QUEUE and opens a file descriptor for each routine. This allows us to process with multiple threads while preventing out of order packet reception issues.
To attempt to distribute the flows across the queues, we try to write to the tun/UDP queue that corresponds with the one we read from. So if we read a packet from tun queue "2", we will write the outgoing encrypted packet to UDP queue "2". Because of the nature of how multi queue works with flows, a given host tunnel will be sticky to a given routine (so if you try to performance benchmark by only using one tunnel between two hosts, you are only going to be using a max of one thread for each direction).
Because this system works much better when we can correlate flows between the tun and udp routines, we are deprecating the undocumented "tun.routines" and "listen.routines" parameters and introducing a new "routines" parameter that sets the value for both. If you use the old undocumented parameters, the max of the values will be used and a warning logged.
Co-authored-by: Nate Brown <nbrown.us@gmail.com>
This change adds an index based on HostInfo.remoteIndexId. This allows
us to use HostMap.QueryReverseIndex without having to loop over all
entries in the map (this can be a bottleneck under high traffic
lighthouses).
Without this patch, a high traffic lighthouse server receiving recv_error
packets and lots of handshakes, cpu pprof trace can look like this:
flat flat% sum% cum cum%
2000ms 32.26% 32.26% 3040ms 49.03% github.com/slackhq/nebula.(*HostMap).QueryReverseIndex
870ms 14.03% 46.29% 1060ms 17.10% runtime.mapiternext
Which shows 50% of total cpu time is being spent in QueryReverseIndex.
We noticed that the number of memory allocations LightHouse.HandleRequest creates for each call can seriously impact performance for high traffic lighthouses. This PR introduces a benchmark in the first commit and then optimizes memory usage by creating a LightHouseHandler struct. This struct allows us to re-use memory between each lighthouse request (one instance per UDP listener go-routine).
* enforce the use of goimports
Instead of enforcing `gofmt`, enforce `goimports`, which also asserts
a separate section for non-builtin packages.
* run `goimports` everywhere
* exclude generated .pb.go files
This change add more metrics around "meta" (non "message" type packets).
For lighthouse packets, we also record statistics around the specific
lighthouse meta type.
We don't keep statistics for the "message" type so that we don't slow
down the fast path (and you can just look at metrics on the tun
interface to find that information).
These settings make it possible to blacklist / whitelist IP addresses
that are used for remote connections.
`lighthouse.remoteAllowList` filters which remote IPs are allow when
fetching from the lighthouse (or, if you are the lighthouse, which IPs
you store and forward to querying hosts). By default, any remote IPs are
allowed. You can provide CIDRs here with `true` to allow and `false` to
deny. The most specific CIDR rule applies to each remote. If all rules
are "allow", the default will be "deny", and vice-versa. If both "allow"
and "deny" rules are present, then you MUST set a rule for "0.0.0.0/0"
as the default.
lighthouse:
remoteAllowList:
# Example to block IPs from this subnet from being used for remote IPs.
"172.16.0.0/12": false
# A more complicated example, allow public IPs but only private IPs from a specific subnet
"0.0.0.0/0": true
"10.0.0.0/8": false
"10.42.42.0/24": true
`lighthouse.localAllowList` has the same logic as above, but it applies
to the local addresses we advertise to the lighthouse. Additionally, you
can specify an `interfaces` map of regular expressions to match against
interface names. The regexp must match the entire name. All interface
rules must be either true or false (and the default rule will be the
inverse). CIDR rules are matched after interface name rules.
Default is all local IP addresses.
lighthouse:
localAllowList:
# Example to blacklist docker interfaces.
interfaces:
'docker.*': false
# Example to only advertise IPs in this subnet to the lighthouse.
"10.0.0.0/8": true
This change adds a new helper, `(*HostInfo).logger()`, that starts a new
logrus.Entry with `vpnIp` and `certName`. We don't use the helper inside
of handshake_ix though since the certificate has not been attached to
the HostInfo yet.
Fixes: #84