Commit Graph

241 Commits

Author SHA1 Message Date
John Maguire
34d002d695 Check CA cert and key match in nebula-cert sign (#503)
`func (nc *NebulaCertificate) VerifyPrivateKey(key []byte) error` would
previously return an error even if passed the correct private key for a
CA certificate `nc`.

That function has been updated to support CA certificates, and
nebula-cert now calls it before signing a new certificate. Previously,
it would perform all constraint checks against the CA certificate
provided, take a SHA256 fingerprint of the provided certificate, insert
it into the new node certificate, and then finally sign it with the
mismatching private key provided.
2021-10-01 12:43:33 -04:00
Ben Yanke
9f34c5e2ba Typo Fix (#523) 2021-09-16 00:12:08 -05:00
Joe Doss
3f5caf67ff Add info about Distribution Packages. (#414) 2021-09-15 17:57:35 -05:00
Stan Grishin
e01213cd21 Update README.md (#378)
Add missing period.
2021-09-15 17:50:01 -05:00
Jack Adamson
af3674ac7b add peer cert issuer to handshake log entries (#510)
Co-authored-by: Jack Adamson <jackadamson@users.noreply.github.com>
2021-08-31 11:57:38 +10:00
Nate Brown
c726d20578 Fix single command ssh exec (#483) 2021-06-07 17:06:59 -05:00
Andrii Chubatiuk
d13f4b5948 fixed recv_errors spoofing condition (#482)
Hi @nbrownus,
This fixes a small bug that was introduced in
df7c7ee#diff-5d05d02296a1953fd5fbcb3f4ab486bc5f7c34b14c3bdedb068008ec8ff5beb4
and has been causing problems.
2021-06-03 13:04:04 -04:00
Nate Brown
2e1d6743be v1.4.0 (#458)
Update CHANGELOG for Nebula v1.4.0

Co-authored-by: Wade Simmons <wade@wades.im>
2021-05-10 21:23:49 -04:00
Nate Brown
d004fae4f9 Unlock the hostmap quickly, lock hostinfo instead (#459) 2021-05-05 13:10:55 -05:00
Nate Brown
95f4c8a01b Don't check for rebind if we are closing the tunnel (#457) 2021-05-04 19:15:24 -05:00
Nate Brown
9ff73cb02f Increase the timestamp resolution for handshakes (#453) 2021-05-03 14:10:00 -05:00
John Maguire
98c391396c Remove log when no handshake message is sent (#452) 2021-04-30 18:19:40 -05:00
Nate Brown
1bc6f5fe6c Minor windows focused improvements (#443)
Co-authored-by: Wade Simmons <wadey@slack-corp.com>
2021-04-30 15:04:47 -05:00
Wade Simmons
44cb697552 Add more metrics (#450)
* Add more metrics

This change adds the following counter metrics:

Metrics to track packets dropped at the firewall:

    firewall.dropped.local_ip
    firewall.dropped.remote_ip
    firewall.dropped.no_rule

Metrics to track handshake attempts that have been initiated and ones
that have timed out (ones that have completed are tracked by the
existing "handshakes" histogram).

    handshake_manager.initiated
    handshake_manager.timed_out

Metrics to track when cached_packets are dropped because we run out of
buffer space, and how many are sent once the handshake completes.

    hostinfo.cached_packets.dropped
    hostinfo.cached_packets.sent

This change also notes how many cached packets we have when we log the
final "Handshake received" message for either stage1 or stage2.

* separate incoming/outgoing metrics

* remove "allowed" firewall metrics

We don't need these on the hot path; they aren't worth it.

* don't need pointers here
2021-04-27 22:23:18 -04:00
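As a rough illustration of the firewall drop counters listed in 44cb697552, the sketch below keeps one atomic counter per drop reason. The metric names come from the commit message. The `dropCounters` type and its methods are hypothetical, and the sketch uses only the standard library rather than whatever metrics package the project uses.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// dropReason and dropCounters are hypothetical names used for illustration.
type dropReason int

const (
	dropLocalIP dropReason = iota
	dropRemoteIP
	dropNoRule
	numReasons
)

// Metric names as listed in the commit message.
var reasonNames = [numReasons]string{
	"firewall.dropped.local_ip",
	"firewall.dropped.remote_ip",
	"firewall.dropped.no_rule",
}

// dropCounters holds one counter per reason; updates are lock-free.
type dropCounters struct {
	counts [numReasons]uint64
}

// inc records one dropped packet for the given reason.
func (d *dropCounters) inc(r dropReason) {
	atomic.AddUint64(&d.counts[r], 1)
}

// snapshot returns the current value of every counter keyed by metric name.
func (d *dropCounters) snapshot() map[string]uint64 {
	out := make(map[string]uint64, numReasons)
	for i, name := range reasonNames {
		out[name] = atomic.LoadUint64(&d.counts[i])
	}
	return out
}

func main() {
	var d dropCounters
	d.inc(dropNoRule)
	d.inc(dropNoRule)
	d.inc(dropLocalIP)
	fmt.Println(d.snapshot())
}
```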
Nathan Brown
db23fdf9bc Don't apply race avoidance to existing handshakes, use the handshake time to determine who wins (#451)
Co-authored-by: Wade Simmons <wadey@slack-corp.com>
2021-04-27 21:15:34 -05:00
Nathan Brown
df7c7eec4a Get out faster on nil udpAddr (#449) 2021-04-26 20:21:47 -05:00
Nathan Brown
6f37280e8e Fully close tunnels when CloseAllTunnels is called (#448) 2021-04-26 10:42:24 -05:00
Nathan Brown
a0735dd7d5 Add locking around ssh conns to avoid concurrent map access on reload (#447) 2021-04-23 14:43:16 -05:00
Nathan Brown
1deb5d98e8 Fix tun funcs for ios and android (#446) 2021-04-22 15:23:40 -05:00
Nathan Brown
a1ee521d79 Fix a failed return in an error case (#445) 2021-04-17 18:47:31 -05:00
brad-defined
7859140711 Only set serveDns if the host is also configured to be a lighthouse. (#433) 2021-04-16 13:33:56 -05:00
brad-defined
17106f83a0 Ensure the Nebula device exists before attempting to bind to the Nebula IP (#375) 2021-04-16 10:34:28 -05:00
Nathan Brown
ab08be1e3e Don't panic on a nil response from the lighthouse (#442) 2021-04-15 09:12:21 -05:00
Nathan Brown
710df6a876 Refactor remotes and handshaking to give every address a fair shot (#437) 2021-04-14 13:50:09 -05:00
John Maguire
20bef975cd Remove obsolete systemd unit settings (take 2) (#438) 2021-04-07 12:02:40 -05:00
Nathan Brown
480036fbc8 Remove unused structs in hostmap.go (#430) 2021-04-01 22:07:11 -05:00
Nathan Brown
1499be3e40 Fix name resolution for host names in config (#431) 2021-04-01 21:48:41 -05:00
Nathan Brown
64d8e5aa96 More LH cleanup (#429) 2021-04-01 10:23:31 -05:00
Nathan Brown
75f7bda0a4 Lighthouse performance pass (#418) 2021-03-31 17:32:02 -05:00
Nathan Brown
e7e55618ff Include bad packets in the good handshake test (#428) 2021-03-31 13:36:10 -05:00
Nathan Brown
0c2e5973e1 Simple lie test (#427) 2021-03-31 10:26:35 -05:00
Nathan Brown
830d6d4639 Start of end to end testing with a good handshake between two nodes (#425) 2021-03-29 14:29:20 -05:00
Nathan Brown
883e09a392 Don't use a global ca pool (#426) 2021-03-29 12:10:19 -05:00
Wade Simmons
4603b5b2dd fix PromoteEvery check (#424)
This check was accidentally typo'd in #396 from `%` to `&`. Restore the
correct functionality here (we want to do the check every "PromoteEvery"
count packets).
2021-03-26 15:01:05 -04:00
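The difference between `%` and `&` mentioned in 4603b5b2dd is easy to see in isolation. The sketch below is illustrative only: the function names and the value 1000 are assumptions, not the project's actual code or default.

```go
package main

import "fmt"

// shouldPromote reports whether packet number n triggers the periodic check.
// promoteEvery stands in for the PromoteEvery setting (illustrative value).
func shouldPromote(n, promoteEvery uint32) bool {
	return n%promoteEvery == 0
}

// shouldPromoteTypo shows the kind of mistake described above:
// a bitwise AND where a modulo was intended.
func shouldPromoteTypo(n, promoteEvery uint32) bool {
	return n&promoteEvery == 0
}

func main() {
	const promoteEvery = 1000
	var correct, typo int
	for n := uint32(1); n <= 10000; n++ {
		if shouldPromote(n, promoteEvery) {
			correct++
		}
		if shouldPromoteTypo(n, promoteEvery) {
			typo++
		}
	}
	fmt.Println("modulo fired:", correct, "times; bitwise AND fired:", typo, "times")
}
```

With modulo, the check fires once per `promoteEvery` packets. With the bitwise AND, it fires whenever the packet number shares no set bits with `promoteEvery`, which is unrelated to the intended interval.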
Wade Simmons
a71541fb0b export build version as a prometheus label (#405)
This is how Prometheus recommends you do it, and how they do it
themselves in their client. This makes it easy to see which versions you
have deployed in your fleet, and query over it too.
2021-03-26 14:16:35 -04:00
Nathan Brown
3ea7e1b75f Don't use a global logger (#423) 2021-03-26 09:46:30 -05:00
Nathan Brown
7a9f9dbded Don't craft buffers if we don't need them (#416) 2021-03-22 18:25:06 -05:00
Nathan Brown
7073d204a8 IPv6 support for outside (udp) (#369) 2021-03-18 20:37:24 -05:00
Joe Doss
9e94442ce7 Add fedora dist files. (#413) 2021-03-18 12:33:43 -07:00
Joe Doss
13471f5792 Remove obsolete systemd unit settings. (#412) 2021-03-18 12:29:36 -07:00
Thomas Roten
ea07a89cc8 Ensure mutex is unlocked when adding remote IP. (#406)
Currently, if you use the remote allow list config, as soon as you attempt to create a tunnel to a node that has a blocked IP address, a mutex is locked and never unlocked. This happens even if the node has an allowed remote IP address in addition to the blocked remote IP address.

This pull request ensures that the lighthouse mutex is unlocked whenever we attempt to add a remote IP.
2021-03-16 12:41:35 -04:00
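The class of bug fixed in ea07a89cc8 (a lock that is never released on an early return) is commonly avoided in Go with `defer`. The sketch below is a minimal illustration with hypothetical types and addresses, not the lighthouse code itself.

```go
package main

import (
	"fmt"
	"sync"
)

// remoteList is a hypothetical stand-in for the lighthouse's address table.
type remoteList struct {
	mu      sync.Mutex
	remotes map[string][]string
	blocked map[string]bool
}

// addRemote records addr for host unless the allow list blocks it.
// The deferred Unlock runs on every return path, including the early one.
func (r *remoteList) addRemote(host, addr string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()

	if r.blocked[addr] {
		return false // early return; the mutex is still released
	}
	r.remotes[host] = append(r.remotes[host], addr)
	return true
}

func main() {
	r := &remoteList{
		remotes: map[string][]string{},
		blocked: map[string]bool{"203.0.113.7:4242": true},
	}
	fmt.Println(r.addRemote("host1", "203.0.113.7:4242"))  // blocked address
	fmt.Println(r.addRemote("host1", "198.51.100.9:4242")) // would deadlock if the lock leaked
}
```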
Ryan Huber
3aaaea6309 don't allow a useless handshake with yourself (#402)
* don't allow a useless handshake with yourself

* remove helper
2021-03-15 12:58:23 -07:00
Wade Simmons
5506da3de9 Fix selection of UDP remote to use during stage2 (#404)
The change for #401 incorrectly called HostInfo.ForcePromoteBest in
stage2, when we really want to pick the remote that we received the
response from.
2021-03-12 21:43:24 -05:00
Wade Simmons
6c55d67f18 Refactor handshake_ix (#401)
There are some subtle race conditions with the previous handshake_ix implementation, mostly around collisions with localIndexId. This change refactors it so that we have a "commit" phase during the handshake where we grab the lock for the hostmap and ensure that we have a unique local index before storing it. We also now avoid using the pending hostmap at all for receiving stage1 packets, since we have everything we need to just store the completed handshake.

Co-authored-by: Nate Brown <nbrown.us@gmail.com>
Co-authored-by: Ryan Huber <rhuber@gmail.com>
Co-authored-by: forfuncsake <drussell@slack-corp.com>
2021-03-12 14:16:25 -05:00
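The "commit" phase described in 6c55d67f18 can be sketched as a check-and-store done under a single lock. Everything below is a simplified assumption for illustration: `hostMap`, `commitIndex`, and the retry limit are hypothetical and do not mirror the real handshake code.

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"errors"
	"fmt"
	"sync"
)

// hostMap is a hypothetical, simplified stand-in for the real hostmap.
type hostMap struct {
	mu      sync.Mutex
	indexes map[uint32]string // local index -> peer name
}

// commitIndex picks a random local index and stores it only if it is unused.
// The uniqueness check and the store happen under one lock, so two
// handshakes cannot both claim the same index.
func (hm *hostMap) commitIndex(peer string) (uint32, error) {
	for attempt := 0; attempt < 32; attempt++ {
		var b [4]byte
		if _, err := rand.Read(b[:]); err != nil {
			return 0, err
		}
		idx := binary.BigEndian.Uint32(b[:])

		hm.mu.Lock()
		_, taken := hm.indexes[idx]
		if !taken {
			hm.indexes[idx] = peer
		}
		hm.mu.Unlock()

		if !taken {
			return idx, nil
		}
	}
	return 0, errors.New("could not find a unique local index")
}

func main() {
	hm := &hostMap{indexes: map[uint32]string{}}
	a, errA := hm.commitIndex("host-a")
	b, errB := hm.commitIndex("host-b")
	if errA != nil || errB != nil {
		panic("commitIndex failed")
	}
	fmt.Println("indexes are distinct:", a != b)
}
```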
Wade Simmons
64d8035d09 fix race in getOrHandshake (#400)
We missed this race with #396 (and I think this is also the crash in
issue #226). We need to lock a little higher in the getOrHandshake
method, before we reset hostinfo.ConnectionInfo. Previously, two
routines could enter this section and confuse the handshake process.

This could result in the other side sending a recv_error that also has
a race with setting hostinfo.ConnectionInfo back to nil. So we make sure
to grab the lock in handleRecvError as well.

Neither of these code paths is in the hot path (handling packets
between two hosts over an active tunnel), so there should be no
performance concerns.
2021-03-09 09:27:02 -05:00
Ryan Huber
73a5ed90b2 Do not allow someone to run a nebula lighthouse with an ephemeral port (#399)
* Do not allow someone to run a nebula lighthouse with an ephemeral port

* derp - we discover the port so we have to check the config setting

* No context needed for this error

* gofmt yourself

* Revert "gofmt yourself"

This reverts commit c01423498e3792f7acd69d7e691dce1edad81bcb.

* Revert "No context needed for this error"

This reverts commit 6792af6846d1200c564a4ad601a637535dd56c5b.

* snip snap snip snap
2021-03-08 12:42:06 -08:00
Wade Simmons
d604270966 Fix most known data races (#396)
This change fixes all of the known data races that `make smoke-docker-race` finds, except for one.

Most of these races are around the handshake phase for a hostinfo, so we add a RWLock to the hostinfo and Lock during each of the handshake stages.

Some of the other races are around consistently using `atomic` around the `messageCounter` field. To make this harder to mess up, I have renamed the field to `atomicMessageCounter` (I also removed the unnecessary extra pointer dereference as we can just point directly to the struct field).

The last remaining data race is around reading `ConnectionInfo.ready`, which is a boolean that is only written to once when the handshake has finished. Because it is in the hot path for packets and would only rarely be an actual issue, I am holding off on fixing that one for now.

Here are the results of `make smoke-docker-race`:

before:

    lighthouse1: Found 2 data race(s)
    host2:       Found 36 data race(s)
    host3:       Found 17 data race(s)
    host4:       Found 31 data race(s)

after:

    host2: Found 1 data race(s)
    host4: Found 1 data race(s)

Fixes: #147
Fixes: #226
Fixes: #283
Fixes: #316
2021-03-05 21:18:33 -05:00
Nathan Brown
29c5f31f90 Add a check in the makefile to ensure a minimum version of go is installed (#383) 2021-03-02 13:29:05 -06:00
Nathan Brown
b6234abfb3 Add a way to trigger punch backs via lighthouse (#394) 2021-03-01 19:06:01 -06:00
Wade Simmons
2a4beb41b9 Routine-local conntrack cache (#391)
Previously, every packet we saw took a lock on the conntrack table and updated it. When running with multiple routines, this can cause heavy lock contention and limit the ability of the threads to run independently. This change caches reads from the conntrack table for a very short period of time to reduce this lock contention. The cache currently defaults to disabled unless you are running with multiple routines, in which case the default cache delay is 1 second. This means that entries in the conntrack table may be up to 1 second out of date and may remain in a routine-local cache for up to 1 second longer than in the global table.

Instead of calling time.Now() for every packet, this cache system relies on a tick thread that updates the current cache "version" each tick. For every packet, we check whether the cache version is out of date and reset the cache if so.
2021-03-01 19:52:17 -05:00
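The version-tick idea described in 2a4beb41b9 can be sketched as follows. All names here (`connTable`, `cacheClock`, `routineCache`) are hypothetical and the logic is heavily simplified. In a real program a goroutine driven by a `time.Ticker` would call `tick` once per cache delay; the sketch calls it by hand so that the behaviour is deterministic.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// connTable is a hypothetical shared conntrack table guarded by a lock.
type connTable struct {
	mu    sync.Mutex
	conns map[string]bool
}

func (t *connTable) has(key string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.conns[key]
}

// cacheClock holds the current cache version, bumped once per tick.
type cacheClock struct{ version uint64 }

func (c *cacheClock) tick() { atomic.AddUint64(&c.version, 1) }

// routineCache belongs to a single routine, so it needs no lock of its own.
type routineCache struct {
	seen    uint64          // version the entries were collected under
	entries map[string]bool // positive lookups cached for this version
	lookups int             // how often we fell through to the shared table
}

// has answers from the local cache when possible. Comparing version numbers
// replaces a time.Now() call per packet.
func (c *routineCache) has(key string, clock *cacheClock, table *connTable) bool {
	if v := atomic.LoadUint64(&clock.version); c.entries == nil || v != c.seen {
		c.entries = map[string]bool{} // stale or uninitialised: reset
		c.seen = v
	}
	if c.entries[key] {
		return true
	}
	c.lookups++
	ok := table.has(key)
	if ok {
		c.entries[key] = true
	}
	return ok
}

func main() {
	table := &connTable{conns: map[string]bool{"10.0.0.1:80": true}}
	clock := &cacheClock{}
	cache := &routineCache{}

	cache.has("10.0.0.1:80", clock, table) // miss: reads the shared table
	cache.has("10.0.0.1:80", clock, table) // hit: no shared lock taken
	clock.tick()                           // simulate the tick thread
	cache.has("10.0.0.1:80", clock, table) // version changed: cache reset
	fmt.Println("shared table lookups:", cache.lookups)
}
```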