A Kubernetes Cluster Networking Disaster Caused by Enabling the tcp_tw_recycle Kernel Parameter


Recently, monitoring logs showed that pods were frequently timing out when connecting to the service registry:

2019-07-19 22:43:25.928 ERROR 1 --- [freshExecutor-0] c.n.d.s.t.d.RedirectingEurekaHttpClient : Request execution error

com.sun.jersey.api.client.ClientHandlerException: org.apache.http.conn.ConnectTimeoutException: Connect to 192.168.20.21:9999 timed out
at com.sun.jersey.client.apache4.ApacheHttpClient4Handler.handle(ApacheHttpClient4Handler.java:187)
at com.sun.jersey.api.client.filter.GZIPContentEncodingFilter.handle(GZIPContentEncodingFilter.java:123)
at com.netflix.discovery.EurekaIdentityHeaderFilter.handle(EurekaIdentityHeaderFilter.java:27)
at com.sun.jersey.api.client.Client.handle(Client.java:652)
at com.sun.jersey.api.client.WebResource.handle(WebResource.java:682)
at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
at com.sun.jersey.api.client.WebResource$Builder.get(WebResource.java:509)
at com.netflix.discovery.shared.transport.jersey.AbstractJerseyEurekaHttpClient.getApplicationsInternal(AbstractJerseyEurekaHttpClient.java:194)
at com.netflix.discovery.shared.transport.jersey.AbstractJerseyEurekaHttpClient.getDelta(AbstractJerseyEurekaHttpClient.java:170)
at com.netflix.discovery.shared.transport.decorator.EurekaHttpClientDecorator$7.execute(EurekaHttpClientDecorator.java:152)
at com.netflix.discovery.shared.transport.decorator.MetricsCollectingEurekaHttpClient.execute(MetricsCollectingEurekaHttpClient.java:73)
at com.netflix.discovery.shared.transport.decorator.EurekaHttpClientDecorator.getDelta(EurekaHttpClientDecorator.java:149)
at com.netflix.discovery.shared.transport.decorator.EurekaHttpClientDecorator$7.execute(EurekaHttpClientDecorator.java:152)
at com.netflix.discovery.shared.transport.decorator.RedirectingEurekaHttpClient.execute(RedirectingEurekaHttpClient.java:89)
at com.netflix.discovery.shared.transport.decorator.EurekaHttpClientDecorator.getDelta(EurekaHttpClientDecorator.java:149)
at com.netflix.discovery.shared.transport.decorator.EurekaHttpClientDecorator$7.execute(EurekaHttpClientDecorator.java:152)
at com.netflix.discovery.shared.transport.decorator.RetryableEurekaHttpClient.execute(RetryableEurekaHttpClient.java:120)
at com.netflix.discovery.shared.transport.decorator.EurekaHttpClientDecorator.getDelta(EurekaHttpClientDecorator.java:149)
at com.netflix.discovery.shared.transport.decorator.EurekaHttpClientDecorator$7.execute(EurekaHttpClientDecorator.java:152)
at com.netflix.discovery.shared.transport.decorator.SessionedEurekaHttpClient.execute(SessionedEurekaHttpClient.java:77)
at com.netflix.discovery.shared.transport.decorator.EurekaHttpClientDecorator.getDelta(EurekaHttpClientDecorator.java:149)
at com.netflix.discovery.DiscoveryClient.getAndUpdateDelta(DiscoveryClient.java:1085)
at com.netflix.discovery.DiscoveryClient.fetchRegistry(DiscoveryClient.java:967)
at com.netflix.discovery.DiscoveryClient.refreshRegistry(DiscoveryClient.java:1473)
at com.netflix.discovery.DiscoveryClient$CacheRefreshThread.run(DiscoveryClient.java:1440)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.http.conn.ConnectTimeoutException: Connect to 192.168.20.21:9999 timed out
at org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:123)
at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:180)
at org.apache.http.impl.conn.AbstractPoolEntry.open(AbstractPoolEntry.java:144)
at org.apache.http.impl.conn.AbstractPooledConnAdapter.open(AbstractPooledConnAdapter.java:134)
at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:610)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:445)
at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:835)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:118)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
at com.sun.jersey.client.apache4.ApacheHttpClient4Handler.handle(ApacheHttpClient4Handler.java:173)
... 29 common frames omitted

Since the connection timeouts appeared only sporadically, the only option was to run a continuous packet capture and analyze the capture log afterwards:

nohup tcpdump -i em1 tcp dst port 9999 >> tcpdump_9999.log &
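
For offline analysis in Wireshark it can also help to keep a rotating binary capture alongside the text log. A minimal sketch, assuming the same interface em1 and registry port 9999:

# write rotating pcap files (10 files of roughly 100 MB each), with name/port resolution disabled
nohup tcpdump -nn -i em1 -w tcpdump_9999.pcap -C 100 -W 10 tcp port 9999 &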

By coincidence, a manual request to the registry address from another machine also timed out. The capture log showed a large number of requests for which the server never sent anything back; in other words, the server was dropping a large number of incoming SYN packets.

16:34:42.458491 IP kube03.57966 > kube01.distinct: Flags [P.], seq 29388:29667, ack 17733, win 765, options [nop,nop,TS val 391647039 ecr 391643088], length 279
16:34:42.459310 IP kube03.57966 > kube01.distinct: Flags [.], ack 17953, win 774, options [nop,nop,TS val 391647040 ecr 391647039], length 0
16:34:43.988913 IP kube03.38188 > kube01.distinct: Flags [P.], seq 16686:16965, ack 8749, win 1257, options [nop,nop,TS val 391648569 ecr 391623499], length 279
16:34:43.989658 IP kube03.38188 > kube01.distinct: Flags [.], ack 8969, win 1278, options [nop,nop,TS val 391648570 ecr 391648570], length 0
16:34:44.858144 IP kube03.55546 > kube01.distinct: Flags [P.], seq 17081:17360, ack 8853, win 1240, options [nop,nop,TS val 391649438 ecr 391637415], length 279
16:34:44.859096 IP kube03.55546 > kube01.distinct: Flags [.], ack 9073, win 1261, options [nop,nop,TS val 391649439 ecr 391649439], length 0
16:34:45.186843 IP 192.168.0.93.23799 > kube01.distinct: Flags [S], seq 2690736532, win 8192, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
16:34:45.191874 IP 192.168.0.93.23800 > kube01.distinct: Flags [S], seq 1730302836, win 8192, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
16:34:45.429889 IP 192.168.0.93.23801 > kube01.distinct: Flags [S], seq 2752216100, win 8192, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
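
Because the capture filter only keeps traffic headed toward the server, unanswered handshakes show up as bare SYNs (Flags [S], with no dot) being retried from the same client. One quick way to pull those out of the text log above (assuming its format stays as shown):

# count bare SYN packets per source address.port; counts above 1 mean the SYN was retransmitted because no SYN-ACK came back
grep 'Flags \[S\],' tcpdump_9999.log | awk '{print $3}' | sort | uniq -c | sort -rn | head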

Check the SYN drop statistics on the server side:

[root@kube02 ~]# netstat -s | grep LISTEN
387 SYNs to LISTEN sockets dropped
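
This counter alone does not say why the SYNs were dropped, so it helps to watch it next to the timestamp-related rejection counters (the exact wording of these lines varies a little between kernel versions). If the timestamp counter grows in step with the dropped-SYN counter, the cause is timestamp rejection rather than a full accept queue:

# run on the server and leave it open while reproducing the timeout
watch -n 1 "netstat -s | grep -iE 'SYNs to LISTEN|passive connections rejected'"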

OK, the picture was getting clear, so time to Google. After some digging I found that this server had both the tcp_timestamps and tcp_tw_recycle parameters enabled.
Root cause:

When both of these parameters are set to 1 and the clients sit behind NAT, TCP handshake failures become very likely; the symptom is that telnet to the port intermittently fails.
In a NAT environment, a single source IP seen by the server actually represents multiple clients, and the TCP timestamps of those client machines are not necessarily in sync. If two clients initiate handshakes with the server around the same time and the later one carries a smaller timestamp than the earlier one, the server's PAWS check treats the later client's packet as stale and drops it.
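
To confirm that this is the situation on the server, check both parameters; 1 and 1 is the combination that breaks clients behind NAT. (Note that tcp_tw_recycle was removed from the kernel entirely in Linux 4.12, so this only applies to older kernels.)

# on the affected server
sysctl net.ipv4.tcp_timestamps net.ipv4.tcp_tw_recycle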

The fix is to disable tcp_tw_recycle:

$ vi /etc/sysctl.conf
# set the following
net.ipv4.tcp_tw_recycle = 0
# save and exit, then apply the change
$ sysctl -p
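
The same change can also be made immediately with sysctl -w (equivalent to editing the file plus sysctl -p), and then verified; once clients can complete the handshake, the dropped-SYN counter from earlier should stop climbing. A sketch:

# flip the value right away, confirm it, and re-check the drop counter
sysctl -w net.ipv4.tcp_tw_recycle=0
sysctl net.ipv4.tcp_tw_recycle
netstat -s | grep 'SYNs to LISTEN'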

Problem solved.

 


Whatever is worth doing is worth doing well.