Wed. Jan 22nd, 2025

In this small page I would like to discuss some kernel parameters that may be worth changing. Using low values for these parameters is likely to cause poor TCP performance.

net.core.somaxconn

In older kernels the default may be `128`, while newer kernels ship with `1024` or more. It limits how many established connections can sit in the listen (accept) queue at one time. If the default is too low for the traffic you need to handle, connections beyond this number are silently dropped.
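As a quick sanity check, the current limit can be inspected and raised at runtime with sysctl (the value 8192 below is only an example, pick one that fits your traffic):

sysctl net.core.somaxconn            # show the current limit
sysctl -w net.core.somaxconn=8192    # raise it at runtime (requires root)

Keep in mind that the application must also request a large backlog in its listen() call; the kernel uses the smaller of the two values.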

net.ipv4.tcp_max_syn_backlog

This limits the number of half-open connections (SYN received, handshake not yet completed) that can sit in the SYN queue. The default is `128` for older kernels and `1024` for newer ones, but if `net.core.somaxconn` is smaller than `net.ipv4.tcp_max_syn_backlog`, the effective value is silently capped at `net.core.somaxconn`.
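A minimal sketch of raising both values together and checking for drops; the numbers are only examples:

sysctl -w net.core.somaxconn=8192
sysctl -w net.ipv4.tcp_max_syn_backlog=8192
sysctl net.core.somaxconn net.ipv4.tcp_max_syn_backlog   # verify
netstat -s | grep -i listen                              # drops show up as "SYNs to LISTEN sockets dropped"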

net.ipv4.tcp_tw_reuse

Should be enabled; without it you may run out of available ports due to too many connections stuck in TIME-WAIT state. There is another parameter people tend to mix it up with, `net.ipv4.tcp_tw_recycle`, which should not be used as it can cause big trouble behind firewalls, load balancers and NAT boxes (it was removed from the kernel in 4.12).
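A quick way to see whether TIME-WAIT sockets are piling up, and to enable reuse at runtime (a sketch, nothing more):

ss -s | grep -i timewait           # connection summary including the timewait count
sysctl -w net.ipv4.tcp_tw_reuse=1  # reuse TIME-WAIT sockets for new outgoing connections
# leave net.ipv4.tcp_tw_recycle alone - do not enable it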

net.ipv4.ip_local_port_range

Range of ephemeral (high) ports available for outgoing connections. A narrow range combined with `net.ipv4.tcp_tw_reuse` disabled may cause trouble, since the system can run out of ports during high traffic or spikes. It is safer to use it together with `net.ipv4.ip_local_reserved_ports` if the system runs services on high ports.
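A sketch of widening the range while protecting ports that local services listen on; the reserved ports below are just examples, not a recommendation:

sysctl -w net.ipv4.ip_local_port_range="10001 65535"
sysctl -w net.ipv4.ip_local_reserved_ports="11211,50000-50100"         # example: a cache daemon plus a hypothetical service range
sysctl net.ipv4.ip_local_port_range net.ipv4.ip_local_reserved_ports   # verify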

net.ipv4.[tcp|udp]_wmem

These parameters are used by TCP to regulate send buffer sizes. The default maximum (around 4 MB) is too small for 1Gbit links and far too small for 10Gbit links.

net.ipv4.[tcp|udp]_rmem

These parameters are used by TCP to regulate receive buffer sizes. The defaults are also too small for today's links.
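For reference, the current triples (minimum, default and maximum in bytes) and the global caps can be inspected with sysctl before changing anything:

sysctl net.ipv4.tcp_wmem net.ipv4.tcp_rmem   # min, default and max buffer sizes used by TCP autotuning
sysctl net.core.wmem_max net.core.rmem_max   # global caps for buffers requested by applications via SO_SNDBUF/SO_RCVBUF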

Tuning

It is recommended to change these values to match your load and capacity. For the TCP buffers, the formula (the bandwidth-delay product) is pretty simple:

bandwidth in bits per second * round-trip latency in seconds (which may vary) / 8 = buffer size in bytes

For 1Gbit ethernet with an average round-trip latency of 450 ms:
1000000000 * 0.450 / 8 = 56250000 # it is fine to round this up to a rounder figure such as 64 MB (67108864), as the kernel will adjust it between the default and max values
For 10Gbit ethernet with the same average latency:
10000000000 * 0.450 / 8 = 562500000 # I would suggest 512 MB (536870912) as a rounder figure
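The same arithmetic can be reproduced on the command line; the bandwidth and latency figures below are just the examples from above:

# bandwidth-delay product in bytes = bits per second * RTT in seconds / 8
awk 'BEGIN { printf "%.0f\n", 1000000000  * 0.450 / 8 }'   # 1Gbit,  450 ms RTT -> 56250000
awk 'BEGIN { printf "%.0f\n", 10000000000 * 0.450 / 8 }'   # 10Gbit, 450 ms RTT -> 562500000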

Recommended values

net.core.somaxconn = 8192 # large enough to queue bursts of connections waiting to be accepted - raise it further depending on the traffic
net.core.netdev_max_backlog = 4096 # number of incoming packets the kernel can queue before processing them - raised further below for the 1Gbit/10Gbit examples
net.ipv4.tcp_tw_reuse = 1 # enabled so ports stuck in TIME-WAIT can be reused for new outgoing connections
net.ipv4.ip_local_port_range = 10001 65535 # about 55000 usable ports, roughly double the default range
# for 1Gbit nic
net.core.netdev_max_backlog = 65536
net.core.rmem_max = 67108864
net.core.wmem_max = 67108864
net.ipv4.tcp_rmem = 4096 8388608 67108864
net.ipv4.tcp_wmem = 4096 8388608 67108864
# for 10Gbit nic
net.core.netdev_max_backlog = 65536
net.core.rmem_max = 562500000
net.core.wmem_max = 562500000
net.ipv4.tcp_rmem = 4096 8388608 562500000
net.ipv4.tcp_wmem = 4096 8388608 562500000
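To make the chosen values survive a reboot they should go into a sysctl configuration file; the file name below is only an example:

# put the settings in e.g. /etc/sysctl.d/99-tcp-tuning.conf (example name), then:
sysctl -p /etc/sysctl.d/99-tcp-tuning.conf   # apply a single file
sysctl --system                              # or re-read all sysctl configuration files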

Jumbo Frames

For better performance I would also suggest adopting jumbo frames on servers that move high volumes of data. This can reduce overhead and latency, since the servers and switches need to transmit and receive fewer frames for the same amount of data (note that every device in the path must be configured for the same MTU). Not using jumbo frames on a 10Gbit (or even a 1Gbit) interface is a waste of resources.
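As a minimal example, jumbo frames can be enabled per interface with ip; eth0 and the MTU of 9000 are assumptions, and every switch and host on the path must accept the same MTU:

ip link set dev eth0 mtu 9000      # runtime change, lost on reboot
ip link show dev eth0 | grep mtu   # verify
# make it persistent in your distribution's network configuration (netplan, NetworkManager, ifcfg files, ...)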

References

https://linux.die.net/man/7/tcp
https://www.acc.umu.se/~maswan/linux-netperf.txt
http://bradhedlund.com/2008/12/19/how-to-calculate-tcp-throughput-for-long-distance-links/
