27fc5e Refactor sysctl tuning

Authored and Committed by crameleon 7 months ago
    Refactor sysctl tuning
    
    - Use sysctl formula instead of static files.
    - Drop old tuning options which are deprecated, equal their default
      value, or are too generic to be applied to all machines.
    - Split settings into common and role-specific ones.
    - Move settings which are too generic for global application, but could
      become useful again at a role level in the future, to a comment block.
    
    Below are descriptive comments explaining the thought process behind
    each changed parameter:
    
    1)
    ```
    net.ipv4.neigh.default.gc_stale_time = 3600
    net.ipv6.neigh.default.gc_stale_time = 3600
    
    gc_stale_time (since Linux 2.2)
           Determines how often to check for stale neighbor entries.
           When a neighbor entry is considered stale, it is resolved
           again before sending data to it.  Defaults to 60 seconds.
    ```
    
    There is no reason in our current network to check for stale entries less often
    than the default. In fact, I would be confused if `ip neigh show` updated more
    slowly than expected. It might be useful to tune this in combination with 2) in
    situations where 2) applies, though.
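
    For a quick sanity check, the entry states can be observed directly (a minimal
    sketch; addresses and interfaces will vary):
    
    ```
    # List IPv4 neighbor entries together with their state (REACHABLE, STALE, ...):
    ip -4 neigh show
    # Confirm the value currently in effect:
    sysctl net.ipv4.neigh.default.gc_stale_time
    ```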
    
    2)
    ```
    net.ipv4.neigh.default.gc_thresh3 = 4096
    net.ipv4.neigh.default.gc_thresh2 = 2048
    net.ipv4.neigh.default.gc_thresh1 = 1024
    net.ipv6.neigh.default.gc_thresh3 = 4096
    net.ipv6.neigh.default.gc_thresh2 = 2048
    net.ipv6.neigh.default.gc_thresh1 = 1024
    
    gc_thresh1 (since Linux 2.2)
           The minimum number of entries to keep in the ARP cache.
           The garbage collector will not run if there are fewer than
           this number of entries in the cache.  Defaults to 128.
    
    gc_thresh2 (since Linux 2.2)
           The soft maximum number of entries to keep in the ARP
           cache.  The garbage collector will allow the number of
           entries to exceed this for 5 seconds before collection
           will be performed.  Defaults to 512.
    
    gc_thresh3 (since Linux 2.2)
           The hard maximum number of entries to keep in the ARP
           cache.  The garbage collector will always run if there are
           more than this number of entries in the cache.  Defaults
           to 1024.
    ```
    
    This should be tuned on servers where kernel messages such as
    `arp_cache: neighbor table overflow!` are observed.
    On internal machines this should never happen.
    If it does happen on internet-connected machines, we can
    re-add this on a role basis later on.
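
    Should tuning ever become necessary, a rough way to verify the need first (a
    sketch, assuming the overflow messages land in the kernel log as quoted above):
    
    ```
    # Look for overflow messages:
    dmesg | grep -i 'table overflow'
    # Compare the current entry count against the thresholds:
    ip -4 neigh show | wc -l
    sysctl net.ipv4.neigh.default.gc_thresh1 \
           net.ipv4.neigh.default.gc_thresh2 \
           net.ipv4.neigh.default.gc_thresh3
    ```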
    
    3)
    ```
    net.core.netdev_max_backlog = 50000
    
    netdev_max_backlog
           Maximum number  of  packets,  queued  on  the  INPUT  side,
           when the interface receives packets faster than kernel can
           process them.
    ```
    
    I'm not opposed to keeping this, though I would be interested in which
    specific systems benefit from it, and how.
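
    One way to find out would be to look for backlog drops; to the best of my
    knowledge, the second hexadecimal column of softnet_stat counts packets dropped
    because the backlog queue was full (one row per CPU):
    
    ```
    cat /proc/net/softnet_stat
    ```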
    
    4)
    ```
    net.ipv4.tcp_syncookies = 1
    
    tcp_syncookies - INTEGER
    	Only valid when the kernel was compiled with CONFIG_SYN_COOKIES
    	Send out syncookies when the syn backlog queue of a socket
    	overflows. This is to prevent against the common
            'SYN flood attack'
    	Default: 1
    ```
    
    This is 1 by default, and hence does not need to be set by us.
    
    5)
    ```
    net.ipv4.ip_forward = 0
    
    ip_forward - BOOLEAN
    	0 - disabled (default)
    	not 0 - enabled
    ```
    
    This is 0 by default, and hence does not need to be set by us. We will
    soon be setting this to 1 on a role.gateway level.
    
    6)
    ```
    net.ipv6.conf.all.forwarding = 0
    
    forwarding - BOOLEAN
    	Enable IP forwarding on this interface.  This controls whether
            packets received _on_ this interface can be forwarded.
    ```
    
    Although the default value is not explicitly declared, the text very much
    suggests that the toggle is there to enable forwarding on demand, hence we
    do not need to set it to 0.
    Same as with 5), this will be enabled on a role level in the future.
    
    7)
    ```
    net.ipv4.tcp_ecn = 0
    
    tcp_ecn - INTEGER
    	Control use of Explicit Congestion Notification (ECN) by TCP.
    	ECN is used only when both ends of the TCP connection indicate
    	support for it.  This feature is useful in avoiding losses due
    	to congestion by allowing supporting routers to signal
    	congestion before having to drop packets.
    	Possible values are:
    		0 Disable ECN.  Neither initiate nor accept ECN.
    		1 Enable ECN when requested by incoming connections and
    		  also request ECN on outgoing connection attempts.
    		2 Enable ECN when requested by incoming connections
    		  but do not request ECN on outgoing connections.
    	Default: 2
    ```
    
    This is a very interesting one! Here I would not just leave it in, but
    rather set it to 1.
    https://en.wikipedia.org/wiki/Explicit_Congestion_Notification
    https://www.juniper.net/documentation/us/en/software/junos/cos/topics/concept/cos-qfx-series-explicit-congestion-notification-understanding.html
    Online research revealed that disabling this used to be common practice due to
    issues with network equipment in the past.
    Our network equipment should not have any issues handling ECN, given that we do
    not disable it on the same hardware in other parts of the data centers.
    
    (cboltz adds: "Indeed, 1 looks like a good choice here.")
    
    8)
    ```
    net.ipv6.conf.default.autoconf = 0
    net.ipv6.conf.default.accept_ra = 0
    ```
    
    We might want to keep autoconf disabled (for now, we use only static IPv6
    addresses) but enable router advertisements along with running a radvd server.
    
    9)
    ```
    net.ipv6.conf.default.accept_ra_defrtr = 0
    
    accept_ra_defrtr - BOOLEAN
    	Learn default router in Router Advertisement.
    
    	Functional default: enabled if accept_ra is enabled.
    			    disabled if accept_ra is disabled.
    ```
    
    This setting follows `accept_ra` and is hence superfluous in our case.
    
    ```
    net.ipv4.neigh.default.gc_interval = 3600
    net.ipv6.neigh.default.gc_interval = 3600
    
    gc_interval (since Linux 2.2)
           How frequently the garbage collector for neighbor entries
           should attempt to run.  Defaults to 30 seconds.
    ```
    
    I don't think we suffer from any bottlenecks with the more frequent default
    garbage collection; however, I could not find out how such an impact would be
    measured and located.
    It seems these toggles are useful if the kernel reports neighbour table overflows:
    http://www.cyberciti.biz/faq/centos-redhat-debian-linux-neighbor-table-overflow/.
    If they are needed on a machine suffering from such overflows, we can enable them
    on a role or id level in the future. In this case we should use the `all` instead
    of the `default` namespace.
    
    10)
    ```
    net.ipv4.conf.all.log_martians = 0
    net.ipv4.conf.default.log_martians = 0
    
    log_martians - BOOLEAN
    	Log packets with impossible addresses to kernel log.
    	log_martians for the interface will be enabled if at least one of
    	conf/{all,interface}/log_martians is set to TRUE,
    	it will be disabled otherwise
    ```
    
    I am very much interested in this and would like it logged, hence setting this to 1.
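
    A minimal sketch for enabling it at runtime and checking whether anything gets
    logged:
    
    ```
    sysctl -w net.ipv4.conf.all.log_martians=1
    # Martian packets then appear in the kernel log:
    journalctl -k | grep -i martian
    ```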
    
    11)
    ```
    net.ipv4.conf.login.log_martians = 0
    net.ipv4.conf.private.log_martians = 0
    net.ipv4.conf.external.log_martians = 0
    ```
    
    These use hardcoded interface names, which makes no sense in configuration
    applied to all machines. Furthermore, the values match the previously defined
    default.
    
    12)
    ```
    net.ipv6.route.max_size=16384
    
    route/max_size - INTEGER
    	Maximum number of routes allowed in the kernel.  Increase
    	this when using large numbers of interfaces and/or routes.
    	From linux kernel 3.6 onwards, this is deprecated for ipv4
    	as route cache is no longer used.
    ```
    
    Deprecated, removing.
    
    13)
    ```
    net.bridge.bridge-nf-call-arptables = 0
    net.bridge.bridge-nf-call-ip6tables = 0
    net.bridge.bridge-nf-call-iptables = 0
    
    bridge-nf-call-arptables - BOOLEAN
    	1 : pass bridged ARP traffic to arptables' FORWARD chain.
    	0 : disable this.
    	Default: 1
    
    bridge-nf-call-iptables - BOOLEAN
    	1 : pass bridged IPv4 traffic to iptables' chains.
    	0 : disable this.
    	Default: 1
    
    bridge-nf-call-ip6tables - BOOLEAN
    	1 : pass bridged IPv6 traffic to ip6tables' chains.
    	0 : disable this.
    	Default: 1
    ```
    
    https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/net/bridge/br_netfilter.c?id=d049a43dcf06a3e155f5496aade5184755a288c4
    This is used by the `br_netfilter` module, which is not loaded by default.
    Checking on all machines using Salt, it is currently only loaded on:
    
    ```
    obsreview.infra.opensuse.org:
        br_netfilter           28672  0
        bridge                229376  1 br_netfilter
    gitlab-runner2.infra.opensuse.org:
        br_netfilter           32768  0
        bridge                434176  1 br_netfilter
    gitlab-runner1.infra.opensuse.org:
        br_netfilter           32768  0
        bridge                356352  1 br_netfilter
    ```
    
    This seems to correlate with
    https://unix.stackexchange.com/questions/719112/why-do-net-bridge-bridge-nf-call-arp-ip-ip6tables-default-to-1,
    which suggests the module is loaded by Docker.
    In this case it might make sense to keep these toggles at 0.
    Setting them globally would not hurt, and would benefit machines which get
    `br_netfilter` loaded by other software.
    Of course, it is to be hoped that the deprecation finally happens, especially
    given the use of nftables in all scenarios not involving Docker.
    Alternatively, these could be moved to a Docker role.
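
    For reference, the fleet-wide module check above can be reproduced with
    something like the following (a sketch; the target expression is an
    assumption):
    
    ```
    salt '*' cmd.run 'lsmod | grep ^br_netfilter || true'
    ```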
    
    14)
    ```
    net.bridge.bridge-nf-filter-pppoe-tagged = 0
    net.bridge.bridge-nf-filter-vlan-tagged = 0
    
    bridge-nf-filter-vlan-tagged - BOOLEAN
    	1 : pass bridged vlan-tagged ARP/IP/IPv6 traffic to {arp,ip,ip6}tables.
    	0 : disable this.
    	Default: 0
    
    bridge-nf-filter-pppoe-tagged - BOOLEAN
    	1 : pass bridged pppoe-tagged IP/IPv6 traffic to {ip,ip6}tables.
    	0 : disable this.
    	Default: 0
    ```
    
    The same legacy considerations as described in 13) apply.
    The machines using iptables/Docker do not use VLANs or PPPoE, and hence
    do not benefit from this.
    
    15)
    ```
    vm.swappiness = 5
    
    swappiness
    
    This control is used to define how aggressive the kernel will swap
    memory pages.  Higher values will increase aggressiveness, lower values
    decrease the amount of swap.  A value of 0 instructs the kernel not to
    initiate swap until the amount of free and file-backed pages is less
    than the high water mark in a zone.
    
    The default value is 60.
    ```
    
    Quoting RedHat:
    ```
    Tuning vm.swappiness incorrectly may hurt performance or may have a different
    impact between light and heavy workloads.
    Changes to this parameter should be made in small increments and should be tested
    under the same conditions that the system normally operates.
    ```
    
    This should be tuned on memory-hungry systems, or when there is a recommendation
    specific to the software running on a particular machine - for example, GitLab
    recommends it in "constrained" environments:
    https://docs.gitlab.com/omnibus/settings/memory_constrained_envs.html#configure-swap.
    
    In all other cases I suggest trusting the default.
    
    (cboltz adds: "I'd add that we have very few machines/VMs that actually have swap.",
    "Also, we should consider to convert existing swap to "real" RAM which will give us a
    much better performance improvement than this setting ;-)")
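
    Along those lines, a quick check whether a machine has swap at all before even
    considering this knob (a sketch):
    
    ```
    swapon --show   # no output means no swap is configured
    sysctl vm.swappiness
    ```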
    
    16)
    ```
    net.core.somaxconn = 2048
    
    somaxconn - INTEGER
    	Limit of socket listen() backlog, known in userspace as SOMAXCONN.
    	Defaults to 4096. (Was 128 before linux-5.4)
    	See also tcp_max_syn_backlog for additional tuning for TCP sockets.
    ```
    
    The comment "increasing the backlog limit" no longer makes sense, given that
    the new default is even higher. Hence we can remove this setting on modern
    systems.
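
    Should doubts remain, the effective value and any listen queue overflows can be
    checked like this (a sketch using the standard netstat MIB counter names):
    
    ```
    sysctl net.core.somaxconn
    nstat -az TcpExtListenOverflows TcpExtListenDrops
    ```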
    
    17)
    ```
    net.ipv4.tcp_timestamps = 0
    
    tcp_timestamps - INTEGER
    Enable timestamps as defined in RFC1323.
    	0: Disabled.
    	1: Enable timestamps as defined in RFC1323 and use random offset for
    	each connection rather than only using the current time.
    	2: Like 1, but without random offsets.
    	Default: 1
    ```
    
    It seems this used to be a vulnerability in the past; however, it no longer is,
    since the default now uses a random offset:
    https://security.stackexchange.com/a/224696.
    Hence I take it this should be safe to keep at the default (1) now.
    
    18)
    ```
    net.core.optmem_max = 65536
    ```
    
    This might be interesting to investigate further, as we use up to 40G networking.
    Instructions on calculating an ideal value seem to be scarce, though.
    https://indico.cern.ch/event/212228/contributions/1507212/attachments/333941/466017/10GE_network_tests_with_UDP.pdf
    
    19)
    ```
    net.ipv4.tcp_max_tw_buckets = 1440000
    
    tcp_max_tw_buckets - INTEGER
    	Maximal number of timewait sockets held by system simultaneously.
    	If this number is exceeded time-wait socket is immediately destroyed
    	and warning is printed. This limit exists only to prevent
    	simple DoS attacks, you _must_ not lower the limit artificially,
    	but rather increase it (probably, after increasing installed memory),
    	if network conditions require more than default value.
    ```
    
    I feel like they neglected to elaborate on the "default value" in that text.
    IBM goes into a bit more detail:
    https://www.ibm.com/docs/en/linux-on-systems?topic=tuning-tcpip-ipv4-settings#net.ipv4.tcp_max_tw_buckets,
    suggesting the default to be 262144.
    Again, lots of "suggestions" on what to set this to, but no useful information on
    determining a value suited to our environment.
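
    Short of better guidance, comparing the instantaneous TIME-WAIT socket count
    against the limit at least shows how much headroom a machine has (a sketch):
    
    ```
    ss -tan state time-wait | wc -l
    sysctl net.ipv4.tcp_max_tw_buckets
    ```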
    
    20)
    ```
    net.ipv4.tcp_tw_recycle = 1
    ```
    
    This no longer exists:
    https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4396e46187ca5070219b81773c4e65088dac50cc
    
    21)
    ```
    net.ipv4.tcp_tw_reuse = 1
    
    tcp_tw_reuse - INTEGER
    	Enable reuse of TIME-WAIT sockets for new connections when it is
    	safe from protocol viewpoint.
    	0 - disable
    	1 - global enable
    	2 - enable for loopback traffic only
    	It should not be changed without advice/request of technical
    	experts.
    	Default: 2
    ```
    
    "technical experts" - are they saying all other kernel options should be set
    by the average user? ;-)
    Marginally more information is provided by this commit:
    https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=79e9fed460385a3d8ba0b5782e9e74405cb199b1,
    though it is still not clear why one would want to enable this for anything
    other than loopback traffic.
    I like the article in
    https://vincent.bernat.ch/en/blog/2014-tcp-time-wait-state-linux, especially the part:
    
    > The Linux kernel documentation is not very helpful about what net.ipv4.tcp_tw_recycle
    > and net.ipv4.tcp_tw_reuse do. ...
    
    Yes!
    
    The article then goes on with a very in-depth explanation.
    Most importantly, it explains the individual kernel options after stating
    
    > If you still think you have a problem with TIME-WAIT connections after reading the
    > previous section, there are three additional solutions to solve them:
    
    Hence, if we do think we have problems, we should properly read and understand
    this article first, and otherwise remove the setting until we have.
    
    22)
    ```
    net.ipv4.tcp_max_orphans = 16384
    
    tcp_max_orphans - INTEGER
    	Maximal number of TCP sockets not attached to any user file handle,
    	held by system.	If this number is exceeded orphaned connections are
    	reset immediately and warning is printed. This limit exists
    	only to prevent simple DoS attacks, you _must_ not rely on this
    	or lower the limit artificially, but rather increase it
    	(probably, after increasing installed memory),
    	if network conditions require more than default value,
    	and tune network services to linger and kill such states
    	more aggressively. Let me to remind again: each orphan eats
    	up to ~64K of unswappable memory.
    ```
    
    Sounds reasonable to reduce memory usage, but it explicitly states
    "you must not rely on this or lower the limit artificially".
    The commit message in
    https://github.com/torvalds/linux/commit/c5ed63d66f24fd4f7089b5a6e087b0ce7202aa8e suggests
    that the default depends on the host system memory.
    On my Tumbleweed system with 16GB RAM, the default seems to be 65536.
    Hence we are lowering this on machines with enough memory, against the very explicit
    recommendation not to. If we do want to tune this, we need to make it dependent on the
    machine memory. Easier would be to drop it and only configure it on a machine or role basis if
    a specific system is found to emit `TCP: too many of orphaned ....` warning messages:
    https://github.com/torvalds/linux/blob/c5ed63d66f24fd4f7089b5a6e087b0ce7202aa8e/net/ipv4/tcp.c#L2017.
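
    The current orphan count can be compared against the limit without waiting for
    the warning; `/proc/net/sockstat` reports it in the TCP line (a sketch):
    
    ```
    grep -o 'orphan [0-9]*' /proc/net/sockstat
    sysctl net.ipv4.tcp_max_orphans
    ```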
    
    23)
    ```
    net.ipv4.tcp_orphan_retries = 0
    
    tcp_orphan_retries (integer; default: 8; since Linux 2.4)
           The maximum number of attempts made to probe the other end
           of a connection which has been closed by our end.
    
    or
    
    tcp_orphan_retries - INTEGER
    	This value influences the timeout of a locally closed TCP connection,
    	when RTO retransmissions remain unacknowledged.
    	See tcp_retries2 for more details.
    
    	The default value is 8.
    	If your machine is a loaded WEB server,
    	you should think about lowering this value, such sockets
    	may consume significant resources. Cf. tcp_max_orphans.
    ```
    
    Sysctl reports the default to be 0 on my Tumbleweed machine already,
    hence we do not need to set this.
    It seems the default of 8 mentioned in the documentation rather plays a role in
    the following logic, and does not refer to the sysctl value:
    https://github.com/openSUSE/kernel/blob/4d82a8f12dcb8809a6fd36f6df0a6c062eaf88ff/net/ipv4/tcp_timer.c#L148
    
    24)
    ```
    net.ipv4.ipfrag_low_thresh = 446464
    
    ipfrag_low_thresh - LONG INTEGER
    
        (Obsolete since linux-4.17) Maximum memory used to reassemble IP fragments before
        the kernel begins to remove incomplete fragment queues to free up resources.
        The kernel still accepts new fragments for defragmentation.
    ```
    
    Deprecated, removing.
    
    25)
    ```
    net.ipv4.neigh.default.proxy_qlen = 96
    
    proxy_qlen (since Linux 2.2)
           The maximum number of packets which may be queued to
           proxy-ARP addresses.  Defaults to 64.
    
    ```
    
    TLDP has slightly more words around it:
    
    ```
    /proc/sys/net/ipv4/neigh/DEV/proxy_delay
        Maximum time (real time is random [0..proxytime]) before answering to an
        ARP request for which we have an proxy ARP entry.
        In some cases, this is used to prevent network flooding.
    
    /proc/sys/net/ipv4/neigh/DEV/proxy_qlen
        Maximum queue length of the delayed proxy arp timer. (see proxy_delay).
    ```
    
    I do not know of us using Proxy ARP anywhere and hence do not think this makes sense to keep.
    The comment in our file and the two lines following it match the ones in this
    file 1:1:
    https://gist.github.com/dkulagin/c5081095c123fc8fe3f80f43cd7a15d5#file-sysctl-conf-L216
    I know it's evil to suggest this was just copy-pasted. :-)
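
    If in doubt, the absence of proxy ARP can be verified directly (a sketch):
    
    ```
    # Proxy entries, if any, are listed separately:
    ip neigh show proxy
    sysctl net.ipv4.conf.all.proxy_arp
    ```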
    
    26)
    ```
    net.ipv4.neigh.default.unres_qlen = 6
    
    neigh/default/unres_qlen - INTEGER
    
        The maximum number of packets which may be queued for each unresolved address by
        other network layers.
    
        (deprecated in linux 3.3) : use unres_qlen_bytes instead.
    
        Prior to linux 3.3, the default value is 3 which may cause unexpected packet loss.
        The current default value is calculated according to default value of unres_qlen_bytes
        and true size of packet.
    
        Default: 101
    ```
    
    Deprecated, superseded by:
    
    ```
    neigh/default/unres_qlen_bytes - INTEGER
    	The maximum number of bytes which may be used by packets
    	queued for each	unresolved address by other network layers.
    	(added in linux 3.3)
    	Setting negative value is meaningless and will return error.
    	Default: SK_WMEM_MAX, (same as net.core.wmem_default).
    		Exact value depends on architecture and kernel options,
    		but should be enough to allow queuing 256 packets
    		of medium size.
    ```
    
    Unfortunately, this is again a candidate with very scarce information on how
    to assess and calculate an optimized value.
    
    27)
    ```
    net.core.rmem_default = 16777216
    net.core.wmem_default = 16777216
    net.core.rmem_max = 16777216
    net.core.wmem_max = 16777216
    
    rmem_default
        The default setting of the socket receive buffer in bytes.
    
    rmem_max
        The maximum receive socket buffer size in bytes.
    
    wmem_default
        The default setting (in bytes) of the socket send buffer.
    
    wmem_max
        The maximum send socket buffer size in bytes.
    ```
    
    Relates to 18).
    https://www.tecchannel.de/a/tcp-ip-tuning-fuer-linux,429773,6 suggests this is
    intended for low-memory situations.
    I'm not sure we suffer from those, though there might be other benefits which
    this single article did not cover.
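
    To judge whether sockets actually come near these limits, per-socket buffer
    usage can be sampled (a sketch; `-m` adds the socket memory fields):
    
    ```
    ss -tmn | head -20
    sysctl net.core.rmem_max net.core.wmem_max
    ```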
    
    28)
    ```
    net.ipv4.tcp_mem=8388608 8388608 8388608
    net.ipv4.tcp_rmem=1048576 4194304 16777216
    net.ipv4.tcp_wmem=1048576 4194304 16777216
    
    tcp_mem - vector of 3 INTEGERs: min, pressure, max
    	min: below this number of pages TCP is not bothered about its
    	memory appetite.
    
    	pressure: when amount of memory allocated by TCP exceeds this number
    	of pages, TCP moderates its memory consumption and enters memory
    	pressure mode, which is exited when memory consumption falls
    	under "min".
    
    	max: number of pages allowed for queueing by all TCP sockets.
    
    	Defaults are calculated at boot time from amount of available
    	memory.
    
    tcp_rmem - vector of 3 INTEGERs: min, default, max
    	min: Minimal size of receive buffer used by TCP sockets.
    	It is guaranteed to each TCP socket, even under moderate memory
    	pressure.
    	Default: 4K
    
    	default: initial size of receive buffer used by TCP sockets.
    	This value overrides net.core.rmem_default used by other protocols.
    	Default: 87380 bytes. This value results in window of 65535 with
    	default setting of tcp_adv_win_scale and tcp_app_win:0 and a bit
    	less for default tcp_app_win. See below about these variables.
    
    	max: maximal size of receive buffer allowed for automatically
    	selected receiver buffers for TCP socket. This value does not override
    	net.core.rmem_max.  Calling setsockopt() with SO_RCVBUF disables
    	automatic tuning of that socket's receive buffer size, in which
    	case this value is ignored.
    	Default: between 87380B and 6MB, depending on RAM size.
    
    tcp_wmem - vector of 3 INTEGERs: min, default, max
    	min: Amount of memory reserved for send buffers for TCP sockets.
    	Each TCP socket has rights to use it due to fact of its birth.
    	Default: 4K
    
    	default: initial size of send buffer used by TCP sockets.  This
    	value overrides net.core.wmem_default used by other protocols.
    	It is usually lower than net.core.wmem_default.
    	Default: 16K
    
    	max: Maximal amount of memory allowed for automatically tuned
    	send buffers for TCP sockets. This value does not override
    	net.core.wmem_max.  Calling setsockopt() with SO_SNDBUF disables
    	automatic tuning of that socket's send buffer size, in which case
    	this value is ignored.
    	Default: between 64K and 4MB, depending on RAM size.
    ```
    
    Again, these are values whose defaults are calculated based on memory size.
    As stated for the related settings in 27), I'm not sure we need this.
    If we do, it should, from my understanding, not be one-size-fits-all.
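
    Comparing the boot-calculated defaults across differently sized machines would
    illustrate this, and the current consumption is visible as well (a sketch; the
    `mem` field in the TCP line is in pages):
    
    ```
    sysctl net.ipv4.tcp_mem net.ipv4.tcp_rmem net.ipv4.tcp_wmem
    grep '^TCP:' /proc/net/sockstat
    ```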
    
    29)
    ```
    net.ipv4.conf.all.log_martians = 0
    net.ipv4.conf.default.log_martians = 0
    ```
    
    This duplicates 10).
    
    30)
    ```
    net.ipv4.tcp_fin_timeout = 15
    
    tcp_fin_timeout - INTEGER
    	The length of time an orphaned (no longer referenced by any
    	application) connection will remain in the FIN_WAIT_2 state
    	before it is aborted at the local end.  While a perfectly
    	valid "receive only" state for an un-orphaned connection, an
    	orphaned connection in FIN_WAIT_2 state could otherwise wait
    	forever for the remote to close its end of the connection.
    	Cf. tcp_max_orphans
    	Default: 60 seconds
    ```
    
    Interestingly, the kernel documentation omits the TCP specification part:
    
    ```
    tcp_fin_timeout (integer; default: 60; since Linux 2.2)
        This specifies how many seconds to wait for a final FIN packet before the socket
        is forcibly closed. This is strictly a violation of the TCP specification, but
        required to prevent denial-of-service attacks. In Linux 2.2, the default value was 180.
    ```
    
    I do see that lowering this might be desirable to abort stale connections faster
    on internet-facing hosts, but think this should not be altered on internal hosts.
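
    Whether the lower timeout is even relevant can be estimated by counting sockets
    stuck in that state (a sketch):
    
    ```
    ss -tan state fin-wait-2 | wc -l
    ```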
    
    31)
    ```
    net.ipv4.tcp_keepalive_time = 300
    
    tcp_keepalive_time - INTEGER
    	How often TCP sends out keepalive messages when keepalive is enabled.
    	Default: 2hours.
    ```
    
    TLDP uses a few more words:
    
    ```
    tcp_keepalive_time
        the interval between the last data packet sent (simple ACKs are not considered data) and
        the first keepalive probe; after the connection is marked to need keepalive, this counter
        is not used any further
    ```
    
    It seems this only affects applications explicitly using TCP with keepalive enabled.
    I am confused by the comment stating "decrease the time default". According to my calculation, our
    value equals five hours, which is three hours more than the default.
    Though I might be confusing the units? If it is more than the default now,
    it would make sense to remove it.
    
    (cboltz adds: "The interesting question is which unit is used ;-)",
    "Since other settings use seconds, I wouldn't be too surprised if 300 means 5 minutes")
    
    32)
    ```
    net.ipv4.tcp_keepalive_probes = 5
    
    tcp_keepalive_probes - INTEGER
    	How many keepalive probes TCP sends out, until it decides that the
    	connection is broken. Default value: 9.
    
    tcp_keepalive_probes (integer; default: 9; since Linux 2.2)
        The maximum number of TCP keep-alive probes to send before giving up and killing the
        connection if no response is obtained from the other end.
    ```
    
    I can get behind why one might want to reduce keepalive probes, but
    cannot assess why 5 was chosen.
    The article
    https://webhostinggeeks.com/howto/tcp-keepalive-recommended-settings-and-best-practices/
    suggests values between three and five are reasonable.
    Since this setting is relatively easy to understand, I would be fine with keeping
    it despite not understanding the exact choice of value, though of course I would
    prefer to understand. :-)
    
    (cboltz adds: "If the number would be 4, I'd say it was https://xkcd.com/221/ ;-)
    (and I guess the reason for 5 is quite similar)",
    "Keeping this indeed sounds useful.")
    
    33)
    ```
    net.ipv4.tcp_keepalive_intvl = 15
    
    tcp_keepalive_intvl - INTEGER
        How frequently the probes are send out. Multiplied by tcp_keepalive_probes it is time
        to kill not responding connection, after probes started. Default value: 75sec i.e.
        connection will be aborted after ~11 minutes of retries.
    ```
    
    Same comment as in 32) applies.
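
    For illustration, the combined effect of 32) and 33) once probing has started:
    
    ```
    # Time until a dead connection is declared broken, after the first probe:
    #   tuned: tcp_keepalive_intvl * tcp_keepalive_probes = 15 * 5 =  75 s
    #   stock:                                              75 * 9 = 675 s (~11 min)
    ```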
    
    34)
    ```
    net.ipv4.route.flush=1
    net.ipv6.route.flush=1
    ```
    
    Information on this is tough to find; there is a procfs toggle with the same
    name/path which can be used to trigger a one-time flush of the routing cache.
    
    The person in https://unix.stackexchange.com/a/734077 suggests that setting this
    to 1 in a configuration file is pointless. It is a bit more involved than some
    other toggles:
    https://github.com/torvalds/linux/commit/39a23e75087ce815abbddbd565b9a2e567ac47da
    and it is not quite clear when this would be triggered, if not manually by
    writing to the procfs path.
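
    For completeness, the manual trigger would look like this (a sketch; to my
    understanding the toggle is write-only):
    
    ```
    sysctl -w net.ipv4.route.flush=1
    # equivalent to:
    echo 1 > /proc/sys/net/ipv4/route/flush
    ```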
    
    Signed-off-by: Georg Pfuetzenreuter <georg.pfuetzenreuter@suse.com>
    
        