The change introduced by #320 incorrectly re-uses the output buffer for
sending punchBack packets. Since we are currently spawning a new
goroutine for each send here, we need to allocate a new buffer each
time. We can come back and optimize this in the future, but for now we
should fix the regression.
This change adds an index based on HostInfo.remoteIndexId. This allows
us to use HostMap.QueryReverseIndex without having to loop over all
entries in the map (this can be a bottleneck under high traffic
lighthouses).
Without this patch, a high traffic lighthouse server receiving recv_error
packets and lots of handshakes, cpu pprof trace can look like this:
flat flat% sum% cum cum%
2000ms 32.26% 32.26% 3040ms 49.03% github.com/slackhq/nebula.(*HostMap).QueryReverseIndex
870ms 14.03% 46.29% 1060ms 17.10% runtime.mapiternext
Which shows 50% of total cpu time is being spent in QueryReverseIndex.
We noticed that the number of memory allocations LightHouse.HandleRequest creates for each call can seriously impact performance for high traffic lighthouses. This PR introduces a benchmark in the first commit and then optimizes memory usage by creating a LightHouseHandler struct. This struct allows us to re-use memory between each lighthouse request (one instance per UDP listener go-routine).
Packet 1 is always a stage 1 handshake and packet 2 is always stage 2.
Normal packets don't start flowing until the message counter is 3 or
higher.
Currently we only receive either packet 1 or 2 depending on if
we are the initiator or responder for the handshake, so we end up
marking one of these as "lost". We should mark these packets as "seen"
when we are the one sending them, since we don't expect to see them from
the other side.
During shutdown, this will keep Nebula alive until after sshd is finished. This cleanly terminates ssh clients accessing a server over a Nebula tunnel.
* this brings in the new version of kardianos/service which properly
outputs logs from launchd services
* add go sum
* is it really this easy?
* Update CHANGELOG.md
Port 22 is blocked as a safety mechanism. In a case where nebula is
started before sshd, a system may be rendered unreachable if nebula
is holding the system ssh port and there is no other connectivity.
This commit enforces the restriction, which could previously be worked
around by listening on an IPv6 address, e.g. "[::]:22".
* Remove unused (*udpConn).Read method
* Align linux UDP performance optimizations with configuration
While attempting to run nebula on an older Synology NAS, it became
apparent that some of the performance optimizations effectively
block support for older kernels. The recvmmsg syscall was added in
linux kernel 2.6.33, and the Synology DS212j (among other models)
is pinned to 2.6.32.12.
Similarly, SO_REUSEPORT was added to the kernel in the 3.9 cycle.
While this option has been backported into some older trees, it
is also missing from the Synology kernel.
This commit allows nebula to be run on linux devices with older
kernels if the config options are set up with a single listener
and a UDP batch size of 1.
This commit adds support for Nebula to be started without creating
a tun device. A node started in this mode still has a full "control
plane", but no effective "data plane". Its use is suited to a
lighthouse that has no need to partake in the mesh VPN.
Consequently, creation of the tun device is the only reason nebula
neesd to be started with elevated privileged, so this example
lighthouse can also be run as a non-root user.
Currently, if a packet arrives on the tun device with a destination that
is not a routable Nebula IP, `queryUnsafeRoute` converts that IP to
0.0.0.0 and we store that packet and try to look up that IP with the
lighthouse. This doesn't make any sense to do, if we get a packet that
is unroutable we should just drop it.
Note, we have a few configurable options like `drop_local_broadcast`
and `drop_multicast` which do this for a few specific types, but since
no packets like this will send correctly I think we should just drop
anything that is unroutable.
We are currently triggering a fast handshake for static hosts right
inside HandshakeManager.AddVpnIP, but this can actually trigger before
we have generated the handshake packet to use. Instead, we should be
triggering right after we call ixHandshakeStage0 in getOrHandshake
(which generates the handshake packet)
Currently, we drop the conntrack table when firewall rules change during a SIGHUP reload. This means responses to inflight HTTP requests can be dropped, among other issues. This change copies the conntrack table over to the new firewall (it holds the conntrack mutex lock during this process, to be safe).
This change also records which firewall rules hash each conntrack entry used, so that we can re-verify the rules after the new firewall has been loaded.
This commit updates the Interface.Inside type to be a new interface
type instead of a *Tun. This will allow for an inside interface
that does not use a tun device, such as a single-binary client that
can run without elevated privileges.
Currently, we wait until the next timer tick to act on the lighthouse's
reply to our HostQuery. This means we can easily add hundreds of
milliseconds of unnecessary delay to the handshake. To fix this, we
can introduce a channel to trigger an outbound handshake without waiting
for the next timer tick.
A few samples of cold ping time between two hosts that require a
lighthouse lookup:
before (v1.2.0):
time=156 ms
time=252 ms
time=12.6 ms
time=301 ms
time=352 ms
time=49.4 ms
time=150 ms
time=13.5 ms
time=8.24 ms
time=161 ms
time=355 ms
after:
time=3.53 ms
time=3.14 ms
time=3.08 ms
time=3.92 ms
time=7.78 ms
time=3.59 ms
time=3.07 ms
time=3.22 ms
time=3.12 ms
time=3.08 ms
time=8.04 ms
I recommend reviewing this PR by looking at each commit individually, as
some refactoring was required that makes the diff a bit confusing when
combined together.
* enforce the use of goimports
Instead of enforcing `gofmt`, enforce `goimports`, which also asserts
a separate section for non-builtin packages.
* run `goimports` everywhere
* exclude generated .pb.go files
If different mtus are specified for different routes, we should set
advmss on each route because Linux does a poor job of selecting the
default (from ip-route(8)):
advmss NUMBER (Linux 2.3.15+ only)
the MSS ('Maximal Segment Size') to advertise to these destinations when estab‐
lishing TCP connections. If it is not given, Linux uses a default value calcu‐
lated from the first hop device MTU. (If the path to these destination is asym‐
metric, this guess may be wrong.)
Note that the default value is calculated from the first hop *device
MTU*, not the *route MTU*. In practice this is usually ok as long as the
other side of the tunnel has the mtu configured exactly the same, but we
should probably just set advmss correctly on these routes.
Test that basic inbound / outbound firewall rules work during the smoke
test. This change sets an inbound firewall rule on host3, and a new
host4 with outbound firewall rules. It also tests that conntrack allows
packets once the connection has been established.
This makes GOARM more generic and does GOMIPS in a similar way to
support mips-softfloat. We also set `-ldflags "-s -w"` for
mips-softfloat to give the best chance of the binary working on these
small devices.
This change add more metrics around "meta" (non "message" type packets).
For lighthouse packets, we also record statistics around the specific
lighthouse meta type.
We don't keep statistics for the "message" type so that we don't slow
down the fast path (and you can just look at metrics on the tun
interface to find that information).
Add support for freebsd. You have to set `tun.dev` in your config. The second pass of this would be to remove the exec calls and use ioctl(2) and route(4) instead, but we can do that in a second PR.
Co-authored-by: Wade Simmons <wade@wades.im>
These settings make it possible to blacklist / whitelist IP addresses
that are used for remote connections.
`lighthouse.remoteAllowList` filters which remote IPs are allow when
fetching from the lighthouse (or, if you are the lighthouse, which IPs
you store and forward to querying hosts). By default, any remote IPs are
allowed. You can provide CIDRs here with `true` to allow and `false` to
deny. The most specific CIDR rule applies to each remote. If all rules
are "allow", the default will be "deny", and vice-versa. If both "allow"
and "deny" rules are present, then you MUST set a rule for "0.0.0.0/0"
as the default.
lighthouse:
remoteAllowList:
# Example to block IPs from this subnet from being used for remote IPs.
"172.16.0.0/12": false
# A more complicated example, allow public IPs but only private IPs from a specific subnet
"0.0.0.0/0": true
"10.0.0.0/8": false
"10.42.42.0/24": true
`lighthouse.localAllowList` has the same logic as above, but it applies
to the local addresses we advertise to the lighthouse. Additionally, you
can specify an `interfaces` map of regular expressions to match against
interface names. The regexp must match the entire name. All interface
rules must be either true or false (and the default rule will be the
inverse). CIDR rules are matched after interface name rules.
Default is all local IP addresses.
lighthouse:
localAllowList:
# Example to blacklist docker interfaces.
interfaces:
'docker.*': false
# Example to only advertise IPs in this subnet to the lighthouse.
"10.0.0.0/8": true
* Better config test
Previously, when using the config test option `-test`, we quit fairly
earlier in the process and would not catch a variety of additional
parsing errors (such as lighthouse IP addresses, local_range, the new
check to make sure static hosts are in the certificate's subnet, etc).
* run config test as part of smoke test
* don't need privileges for configtest
Co-authored-by: Nathan Brown <nate@slack-corp.com>
This change adds a new helper, `(*HostInfo).logger()`, that starts a new
logrus.Entry with `vpnIp` and `certName`. We don't use the helper inside
of handshake_ix though since the certificate has not been attached to
the HostInfo yet.
Fixes: #84
* add configurable punching delay because of race-condition-y conntracks
* add changelog
* fix tests
* only do one punch per query
* Coalesce punchy config
* It is not is not set
* Add tests
Co-authored-by: Nate Brown <nbrown.us@gmail.com>
A CIDRTree can be expensive to create, so only do it if we need
it. If the remote host only has one IP address and no subnets, just do
an exact IP match instead.
Fixes: #171