This change fixes all of the known data races that `make smoke-docker-race` finds, except for one.
Most of these races are around the handshake phase for a hostinfo, so we add a RWLock to the hostinfo and Lock during each of the handshake stages.
Some of the other races are around consistently using `atomic` around the `messageCounter` field. To make this harder to mess up, I have renamed the field to `atomicMessageCounter` (I also removed the unnecessary extra pointer deference as we can just point directly to the struct field).
The last remaining data race is around reading `ConnectionInfo.ready`, which is a boolean that is only written to once when the handshake has finished. Due to it being in the hot path for packets and the rare case that this could actually be an issue, holding off on fixing that one for now.
here is the results of `make smoke-docker-race`:
before:
lighthouse1: Found 2 data race(s)
host2: Found 36 data race(s)
host3: Found 17 data race(s)
host4: Found 31 data race(s)
after:
host2: Found 1 data race(s)
host4: Found 1 data race(s)
Fixes: #147Fixes: #226Fixes: #283Fixes: #316
This change add more metrics around "meta" (non "message" type packets).
For lighthouse packets, we also record statistics around the specific
lighthouse meta type.
We don't keep statistics for the "message" type so that we don't slow
down the fast path (and you can just look at metrics on the tun
interface to find that information).
* add configurable punching delay because of race-condition-y conntracks
* add changelog
* fix tests
* only do one punch per query
* Coalesce punchy config
* It is not is not set
* Add tests
Co-authored-by: Nate Brown <nbrown.us@gmail.com>
This change exposes the current constants we have defined for the handshake
manager as configuration options. This will allow us to test and tweak
with different intervals and wait rotations.
# Handshake Manger Settings
handshakes:
# Total time to try a handshake = sequence of `try_interval * retries`
# With 100ms interval and 20 retries it is 23.5 seconds
try_interval: 100ms
retries: 20
# wait_rotation is the number of handshake attempts to do before starting to try non-local IP addresses
wait_rotation: 5