On this short page I would like to discuss some kernel parameters that may be worth changing. Using low values for these parameters is likely to cause poor TCP performance.
net.core.somaxconn
In older kernels it may default to `128`, while newer kernels use `1024`. This value caps the accept queue of a listening socket, i.e. the number of fully established connections waiting to be accepted (it also caps the backlog argument passed to `listen()`). If the default is too low for the traffic you need to handle, connections are silently dropped to fit this number.
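A minimal way to inspect and raise this limit at runtime with sysctl (the value 8192 is just the example used later on this page):

# check the current limit
sysctl net.core.somaxconn
# raise it at runtime (root required)
sysctl -w net.core.somaxconn=8192

Keep in mind that the listening application also has to ask for a large enough backlog in its own `listen()` call (for example the `backlog=` parameter of the nginx `listen` directive); the kernel uses the smaller of the two values.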
net.ipv4.tcp_max_syn_backlog
This controls the number of half-open connections (SYN received but not yet acknowledged) that can sit in the SYN queue. The default is `128` for older kernels and `1024` for newer ones, but if `net.core.somaxconn` is smaller than `net.ipv4.tcp_max_syn_backlog`, the effective limit silently drops to the `net.core.somaxconn` value.
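A quick, if somewhat coarse, way to see whether these queues are overflowing is to look at the listen-queue counters reported by `netstat -s` (the exact wording of the output varies between distributions and kernel versions):

# counters such as "SYNs to LISTEN sockets dropped" or
# "times the listen queue of a socket overflowed" indicate queue pressure
netstat -s | grep -i listen
# the same counters (ListenDrops, ListenOverflows) are exposed in /proc
grep TcpExt /proc/net/netstat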
net.ipv4.tcp_tw_reuse
This should be enabled; without it you may run out of available ports due to too many connections sitting in the TIME-WAIT state. There is another parameter that people tend to mix this up with, `net.ipv4.tcp_tw_recycle`, which should not be used as it can cause big trouble behind firewalls, load balancers and NAT boxes (it was removed entirely in Linux 4.12).
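To get a feel for how many sockets are stuck in TIME-WAIT before and after enabling reuse, something along these lines works with the standard iproute2 `ss` tool:

# count sockets currently in TIME-WAIT (the count includes one header line)
ss -tan state time-wait | wc -l
# enable reuse of TIME-WAIT ports for outgoing connections
sysctl -w net.ipv4.tcp_tw_reuse=1
# make sure tcp_tw_recycle stays disabled on kernels that still have it
sysctl net.ipv4.tcp_tw_recycle 2>/dev/null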
net.ipv4.ip_local_port_range
This is the range of ephemeral (high) ports available for outgoing connections. A narrow range combined with `net.ipv4.tcp_tw_reuse` disabled may cause trouble, since the system can run out of ports during high traffic or spikes. It is safer to combine it with `net.ipv4.ip_local_reserved_ports` if the system runs services on high ports.
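A sketch of how the two parameters can be combined; the reserved ports below (11211 and the 50000-50010 range) are only placeholders for whatever high ports your own services listen on:

# widen the ephemeral port range
sysctl -w net.ipv4.ip_local_port_range="10001 65535"
# keep specific high ports out of the ephemeral pool so local services can still bind them
sysctl -w net.ipv4.ip_local_reserved_ports="11211,50000-50010"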
net.ipv4.[tcp|udp]_wmem
These parameters are used by TCP to regulate send buffer sizes. The default maximum (around 4 MB) is too small for 1Gbit links, and even more so for 10Gbit links.
net.ipv4.[tcp|udp]_rmem
These parameters are used by TCP to regulate receive buffer sizes. The default maximum is also too small for today's links.
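Both settings take a triple of min, default and max values in bytes; the kernel auto-tunes each socket's buffer between the default and the max. A quick way to inspect what is currently in place (the `net.core.*` values cap what applications can request explicitly via SO_RCVBUF / SO_SNDBUF):

# min / default / max in bytes
sysctl net.ipv4.tcp_rmem
sysctl net.ipv4.tcp_wmem
# hard caps for buffers requested explicitly with setsockopt()
sysctl net.core.rmem_max
sysctl net.core.wmem_max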
Tuning
It is recommended to change these values to match your load and capacity. For the TCP buffers the formula (the bandwidth-delay product) is pretty simple:
bandwidth-in-bits-per-second * round-trip-latency-in-seconds / 8

# For 1Gbit ethernet with an average round-trip latency of 450 ms (0.450 s):
1000000000 * 0.450 / 8 = 56250000
# it's possible to round this up to a more logical number such as 64 MB (67108864),
# as the kernel will adjust the buffer between the default and max values

# For 10Gbit ethernet with the same average latency:
10000000000 * 0.450 / 8 = 562500000
# I would suggest 512 MB (536870912) to be more logical
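If you prefer to compute this rather than do the arithmetic by hand, a one-liner with awk does the job (the bandwidth and RTT values are just the examples from above):

# buffer size in bytes = bandwidth (bit/s) * RTT (s) / 8
awk 'BEGIN { printf "%.0f\n", 1000000000 * 0.450 / 8 }'    # 1Gbit  -> 56250000
awk 'BEGIN { printf "%.0f\n", 10000000000 * 0.450 / 8 }'   # 10Gbit -> 562500000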
Recommended values
net.core.somaxconn = 8192                   # should fit all ESTABLISHED and WAITING (SYN, FIN, CLOSE) connections - may need to be even higher depending on the traffic
net.ipv4.tcp_max_syn_backlog = 4096         # number of connections that can be on hold (not yet acked) - does not need to be as big as somaxconn, since somaxconn includes everything
net.ipv4.tcp_tw_reuse = 1                   # reuse ports of already closed (TIME-WAIT) connections
net.ipv4.ip_local_port_range = 10001 65535  # roughly doubles the available ephemeral ports compared to the common 32768-61000 default

# for 1Gbit nic
net.core.netdev_max_backlog = 65536
net.core.rmem_max = 67108864
net.core.wmem_max = 67108864
net.ipv4.tcp_rmem = 4096 8388608 67108864
net.ipv4.tcp_wmem = 4096 8388608 67108864

# for 10Gbit nic
net.core.netdev_max_backlog = 65536
net.core.rmem_max = 562500000
net.core.wmem_max = 562500000
net.ipv4.tcp_rmem = 4096 8388608 562500000
net.ipv4.tcp_wmem = 4096 8388608 562500000
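One way to apply a set of values like this persistently, assuming a distribution that reads /etc/sysctl.d/ (the file name below is arbitrary and only the 1Gbit buffer values are shown):

# drop the settings into a dedicated file and reload everything (root required)
cat > /etc/sysctl.d/90-tcp-tuning.conf <<'EOF'
net.core.somaxconn = 8192
net.core.rmem_max = 67108864
net.core.wmem_max = 67108864
net.ipv4.tcp_rmem = 4096 8388608 67108864
net.ipv4.tcp_wmem = 4096 8388608 67108864
EOF
sysctl --system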
Jumbo Frames
For better performance I would also suggest adopting jumbo frames on servers with a high volume of data transfer. This can decrease network latency, since the systems involved (servers and switches) need to transmit and receive fewer frames for the same volume of data. Not using jumbo frames on a 10Gbit (or even a 1Gbit) interface is a waste of resources.
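A minimal example of enabling jumbo frames on an interface; eth0, the 9000-byte MTU and remote-host are placeholders, every device on the path (including switches) has to be configured for the larger MTU, and the change should also be made persistent in your distribution's network configuration:

# raise the MTU at runtime
ip link set dev eth0 mtu 9000
# verify that 9000-byte frames make it end-to-end without fragmentation
# (8972 = 9000 - 20 bytes IP header - 8 bytes ICMP header)
ping -M do -s 8972 -c 3 remote-host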
References
https://linux.die.net/man/7/tcp
https://www.acc.umu.se/~maswan/linux-netperf.txt
http://bradhedlund.com/2008/12/19/how-to-calculate-tcp-throughput-for-long-distance-links/