
Fixing pods and containers unable to communicate across hosts after deploying the Flannel network on Kubernetes 1.15.1

2020/04/21

A record of a case where pod networking stopped working after deploying the Flannel network; searching online turned up nothing useful.

So I am writing down the troubleshooting process and fix myself.

Symptoms

[root@k8s-master01 ~]# kubectl get pod -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-5c98db65d4-54j5c 1/1 Running 0 5h44m
coredns-5c98db65d4-jmvbf 1/1 Running 0 5h45m
etcd-k8s-master01 1/1 Running 2 10d
kube-apiserver-k8s-master01 1/1 Running 2 10d
kube-controller-manager-k8s-master01 1/1 Running 3 10d
kube-flannel-ds-amd64-6h79p 1/1 Running 2 9d
kube-flannel-ds-amd64-bnvtd 1/1 Running 3 10d
kube-flannel-ds-amd64-bsq4j 1/1 Running 2 9d
kube-proxy-5fn9m 1/1 Running 1 9d
kube-proxy-6hjvp 1/1 Running 2 9d
kube-proxy-t47n9 1/1 Running 2 10d
kube-scheduler-k8s-master01 1/1 Running 4 10d
kubernetes-dashboard-7d75c474bb-hg7zt 1/1 Running 0 71m
[root@k8s-master01 ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
k8s-master01 Ready master 10d v1.15.1
k8s-node01 Ready <none> 9d v1.15.1
k8s-node02 Ready <none> 9d v1.15.1

From the output above: after deploying Flannel, the master sees the worker nodes as Ready and every flannel pod reports a Running status.
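A Running flannel pod does not by itself prove the overlay is healthy. A quick way to dig deeper (a minimal sketch; the pod and container names are taken from the listing above and the stock kube-flannel.yml, and will differ in other clusters) is to check which node each flannel pod landed on and read its logs for subnet-lease errors:

# list the flannel pods together with the node each one is running on
kubectl get pod -n kube-system -o wide | grep kube-flannel

# look for errors about registering with the API server or acquiring a subnet lease
kubectl logs -n kube-system kube-flannel-ds-amd64-6h79p -c kube-flannel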

Troubleshooting

[root@k8s-master01 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:2c:d1:c2 brd ff:ff:ff:ff:ff:ff
inet 192.168.0.50/24 brd 192.168.0.255 scope global noprefixroute ens33
valid_lft forever preferred_lft forever
inet6 fe80::20c:29ff:fe2c:d1c2/64 scope link
valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:1f:d8:95:21 brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
valid_lft forever preferred_lft forever
4: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether ee:02:3a:98:e3:e3 brd ff:ff:ff:ff:ff:ff
5: kube-ipvs0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
link/ether d2:c2:72:50:95:31 brd ff:ff:ff:ff:ff:ff
inet 10.96.0.10/32 brd 10.96.0.10 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.96.0.1/32 brd 10.96.0.1 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.110.65.174/32 brd 10.110.65.174 scope global kube-ipvs0
valid_lft forever preferred_lft forever
6: flannel.1: <BROADCAST,MULTICAST> mtu 1450 qdisc noqueue state DOWN group default
link/ether 7e:35:6d:f9:50:c3 brd ff:ff:ff:ff:ff:ff
7: cni0: <BROADCAST,MULTICAST> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
link/ether 8a:1b:ab:4c:83:c9 brd ff:ff:ff:ff:ff:ff
inet 10.244.0.1/24 scope global cni0
valid_lft forever preferred_lft forever

Interface 6, flannel.1, has no IP address assigned and its state is DOWN.
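When flannel.1 looks like this, it is worth checking whether flanneld ever wrote its subnet lease and whether routes to the other nodes' pod subnets exist. A minimal sketch, assuming the default file locations and the 10.244.0.0/16 pod network used in this cluster:

# the subnet lease flanneld hands to the CNI plugin; a missing or empty file is a bad sign
cat /run/flannel/subnet.env

# with a healthy VXLAN backend there should be one route per remote node via flannel.1
ip route | grep 10.244

# detailed state of the VXLAN device itself
ip -d link show flannel.1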

[root@k8s-master01 flannel]# ping 10.244.2.6
PING 10.244.2.6 (10.244.2.6) 56(84) bytes of data.
^C
--- 10.244.2.6 ping statistics ---
13 packets transmitted, 0 received, 100% packet loss, time 12004ms
[root@k8s-node01 ~]# ping 10.244.2.6
PING 10.244.2.6 (10.244.2.6) 56(84) bytes of data.
^C
--- 10.244.2.6 ping statistics ---
36 packets transmitted, 0 received, 100% packet loss, time 35012ms
[root@k8s-node02 ~]# ping 10.244.2.6
PING 10.244.2.6 (10.244.2.6) 56(84) bytes of data.
64 bytes from 10.244.2.6: icmp_seq=1 ttl=64 time=0.131 ms
64 bytes from 10.244.2.6: icmp_seq=2 ttl=64 time=0.042 ms
^C
--- 10.244.2.6 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms

The pod at 10.244.2.6 lives on k8s-node02 and can only be pinged from node02 itself; pings from the master and node01 all time out.
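To confirm which node a pod IP belongs to before blaming the overlay, list the pods together with their IP and node (a quick sketch; --all-namespaces simply widens the view to the system pods as well):

# show every pod together with its IP address and the node it is scheduled on
kubectl get pod --all-namespaces -o wide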

Solution

Method 1

[root@k8s-node01 ~]# sudo iptables -P INPUT ACCEPT
[root@k8s-node01 ~]# sudo iptables -P OUTPUT ACCEPT
[root@k8s-node01 ~]# sudo iptables -P FORWARD ACCEPT
[root@k8s-node01 ~]# iptables -L -n
Chain INPUT (policy ACCEPT)
target prot opt source destination
KUBE-FIREWALL all -- 0.0.0.0/0 0.0.0.0/0

Chain FORWARD (policy ACCEPT)
target prot opt source destination
KUBE-FORWARD all -- 0.0.0.0/0 0.0.0.0/0 /* kubernetes forwarding rules */
ACCEPT all -- 10.244.0.0/16 0.0.0.0/0
ACCEPT all -- 0.0.0.0/0 10.244.0.0/16

Chain OUTPUT (policy ACCEPT)
target prot opt source destination
KUBE-FIREWALL all -- 0.0.0.0/0 0.0.0.0/0

Chain DOCKER (0 references)
target prot opt source destination

Chain DOCKER-ISOLATION-STAGE-1 (0 references)
target prot opt source destination

Chain DOCKER-ISOLATION-STAGE-2 (0 references)
target prot opt source destination

Chain DOCKER-USER (0 references)
target prot opt source destination

Chain KUBE-FIREWALL (2 references)
target prot opt source destination
DROP all -- 0.0.0.0/0 0.0.0.0/0 /* kubernetes firewall for dropping marked packets */ mark match 0x8000/0x8000

Chain KUBE-FORWARD (1 references)
target prot opt source destination
ACCEPT all -- 0.0.0.0/0 0.0.0.0/0 /* kubernetes forwarding rules */ mark match 0x4000/0x4000
ACCEPT all -- 10.244.0.0/16 0.0.0.0/0 /* kubernetes forwarding conntrack pod source rule */ ctstate RELATED,ESTABLISHED
ACCEPT all -- 0.0.0.0/0 10.244.0.0/16 /* kubernetes forwarding conntrack pod destination rule */ ctstate RELATED,ESTABLISHED
[root@k8s-node01 ~]# service iptables save
iptables: Saving firewall rules to /etc/sysconfig/iptables:[ OK ]

Reset the default INPUT/OUTPUT/FORWARD policies to ACCEPT and save the rules.
This did not solve the problem, so move on to Method 2.

Method 2

Remove the Flannel network

# Step 1: on the master node, delete the flannel resources
kubectl delete -f kube-flannel.yml

# Step 2: on each node, clean up the interfaces and files left behind by flannel
ifconfig cni0 down
ip link delete cni0
ifconfig flannel.1 down
ip link delete flannel.1
rm -rf /var/lib/cni/
rm -f /etc/cni/net.d/*
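Step 2 has to be run on every node that hosted a flannel pod, the master included. A small convenience sketch, assuming passwordless SSH as root to the three hosts listed earlier:

# run the node-side cleanup on all three machines in one go
for host in k8s-master01 k8s-node01 k8s-node02; do
  ssh root@"$host" '
    ifconfig cni0 down; ip link delete cni0
    ifconfig flannel.1 down; ip link delete flannel.1
    rm -rf /var/lib/cni/; rm -f /etc/cni/net.d/*
  '
done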

Redeploy the Flannel network

[root@k8s-master01 flannel]# kubectl create -f kube-flannel.yml 
podsecuritypolicy.policy/psp.flannel.unprivileged created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.apps/kube-flannel-ds-amd64 created
daemonset.apps/kube-flannel-ds-arm64 created
daemonset.apps/kube-flannel-ds-arm created
daemonset.apps/kube-flannel-ds-ppc64le created
daemonset.apps/kube-flannel-ds-s390x created

[root@k8s-master01 flannel]# kubectl get pod -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-5c98db65d4-8bpdd 1/1 Running 0 17s
coredns-5c98db65d4-knfcj 1/1 Running 0 43s
etcd-k8s-master01 1/1 Running 2 10d
kube-apiserver-k8s-master01 1/1 Running 2 10d
kube-controller-manager-k8s-master01 1/1 Running 3 10d
kube-flannel-ds-amd64-56hsf 1/1 Running 0 25m
kube-flannel-ds-amd64-56t49 1/1 Running 0 25m
kube-flannel-ds-amd64-qz42z 1/1 Running 0 25m
kube-proxy-5fn9m 1/1 Running 1 10d
kube-proxy-6hjvp 1/1 Running 2 10d
kube-proxy-t47n9 1/1 Running 2 10d
kube-scheduler-k8s-master01 1/1 Running 4 10d
kubernetes-dashboard-7d75c474bb-4r7hc 1/1 Running 0 23m
[root@k8s-master01 flannel]#

After redeploying Flannel, the existing pods also need to be recreated; simply delete them and Kubernetes will automatically bring them back with addresses from the new pod network.
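For the kube-system pods shown above this can be done by label or by name (a minimal sketch; the k8s-app=kube-dns label matching the CoreDNS pods is an assumption based on a standard kubeadm install):

# re-create the CoreDNS pods so they pick up addresses on the rebuilt pod network
kubectl -n kube-system delete pod -l k8s-app=kube-dns

# any other affected pod can be deleted by name; its controller re-creates it
kubectl -n kube-system delete pod kubernetes-dashboard-7d75c474bb-hg7zt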

[root@k8s-master01 flannel]# ping 10.244.1.2
PING 10.244.1.2 (10.244.1.2) 56(84) bytes of data.
64 bytes from 10.244.1.2: icmp_seq=1 ttl=63 time=1.04 ms
64 bytes from 10.244.1.2: icmp_seq=2 ttl=63 time=0.498 ms
64 bytes from 10.244.1.2: icmp_seq=3 ttl=63 time=0.575 ms
64 bytes from 10.244.1.2: icmp_seq=4 ttl=63 time=0.578 ms

[root@k8s-node01 ~]# ping 10.244.1.2
PING 10.244.1.2 (10.244.1.2) 56(84) bytes of data.
64 bytes from 10.244.1.2: icmp_seq=1 ttl=64 time=0.065 ms
64 bytes from 10.244.1.2: icmp_seq=2 ttl=64 time=0.038 ms
64 bytes from 10.244.1.2: icmp_seq=3 ttl=64 time=0.135 ms
64 bytes from 10.244.1.2: icmp_seq=4 ttl=64 time=0.058 ms
^C
[root@k8s-node02 ~]# ping 10.244.1.2
PING 10.244.1.2 (10.244.1.2) 56(84) bytes of data.
64 bytes from 10.244.1.2: icmp_seq=1 ttl=63 time=0.760 ms
64 bytes from 10.244.1.2: icmp_seq=2 ttl=63 time=0.510 ms
64 bytes from 10.244.1.2: icmp_seq=3 ttl=63 time=0.442 ms
64 bytes from 10.244.1.2: icmp_seq=4 ttl=63 time=0.525 ms
^C
[root@k8s-master01 flannel]# ifconfig 
docker0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
inet 172.17.0.1 netmask 255.255.0.0 broadcast 172.17.255.255
ether 02:42:1f:d8:95:21 txqueuelen 0 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

ens33: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.0.50 netmask 255.255.255.0 broadcast 192.168.0.255
inet6 fe80::20c:29ff:fe2c:d1c2 prefixlen 64 scopeid 0x20<link>
ether 00:0c:29:2c:d1:c2 txqueuelen 1000 (Ethernet)
RX packets 737868 bytes 493443231 (470.5 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 1656623 bytes 3510224771 (3.2 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

flannel.1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
inet 10.244.0.0 netmask 255.255.255.255 broadcast 0.0.0.0
ether aa:50:d6:f9:09:e5 txqueuelen 0 (Ethernet)
RX packets 14 bytes 1728 (1.6 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 67 bytes 5973 (5.8 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
loop txqueuelen 1 (Local Loopback)
RX packets 6944750 bytes 1242999056 (1.1 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 6944750 bytes 1242999056 (1.1 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

[root@k8s-master01 flannel]#

The flannel network now looks healthy, and containers can communicate with each other across hosts!
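The pings above were run from the hosts; an end-to-end check from inside a pod can be done with a throwaway busybox pod. This is a hypothetical extra verification, not part of the original session:

# start a short-lived busybox pod, ping another pod's address from inside it, then clean up
kubectl run net-test --image=busybox --restart=Never -- sleep 3600
kubectl exec net-test -- ping -c 3 10.244.1.2
kubectl delete pod net-test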
