
Fixing the Kubernetes 1.5.1 CoreDNS CrashLoopBackOff Error

2020/04/21

Today, while checking pods with kubectl, I noticed that the coredns pods were stuck in CrashLoopBackOff:

[root@k8s-master01 flannel]# kubectl get pod -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-5c98db65d4-f9rb7 0/1 CrashLoopBackOff 50 9d
coredns-5c98db65d4-xcd9s 0/1 CrashLoopBackOff 50 9d
etcd-k8s-master01 1/1 Running 2 9d
kube-apiserver-k8s-master01 1/1 Running 2 9d
kube-controller-manager-k8s-master01 1/1 Running 3 9d
kube-flannel-ds-amd64-6h79p 1/1 Running 2 9d
kube-flannel-ds-amd64-bnvtd 1/1 Running 3 9d
kube-flannel-ds-amd64-bsq4j 1/1 Running 2 9d
kube-proxy-5fn9m 1/1 Running 1 9d
kube-proxy-6hjvp 1/1 Running 2 9d
kube-proxy-t47n9 1/1 Running 2 9d
kube-scheduler-k8s-master01 1/1 Running 4 9d

Checking with kubectl logs shows an odd error:

[root@k8s-master01 ~]# kubectl logs coredns-5c98db65d4-xcd9s -n kube-system
E0413 06:32:09.919666 1 reflector.go:134] github.com/coredns/coredns/plugin/kubernetes/controller.go:317: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: no route to host
E0413 06:32:09.919666 1 reflector.go:134] github.com/coredns/coredns/plugin/kubernetes/controller.go:317: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: no route to host
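The "no route to host" on 10.96.0.1 points at connectivity from the node to the kube-apiserver service IP rather than at CoreDNS itself. A quick way to confirm this (my own sketch, using the 10.96.0.1 address from the log above) is to probe that IP directly from the node hosting the failing pod:

# An immediate "No route to host" here confirms the node cannot reach the service IP;
# an HTTP response (even 401/403) means the route itself is fine.
curl -k https://10.96.0.1:443/version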

Cause:

The official Kubernetes troubleshooting docs say:

coredns pods have CrashLoopBackOff or Error state
If you have nodes that are running SELinux with an older version of Docker, you might experience a scenario where the coredns pods are not starting. To solve that, you can try one of the following options:

Upgrade to a newer version of Docker.

Disable SELinux.

Modify the coredns deployment to set allowPrivilegeEscalation to true:

kubectl -n kube-system get deployment coredns -o yaml | \
sed 's/allowPrivilegeEscalation: false/allowPrivilegeEscalation: true/g' | \
kubectl apply -f -
Another cause for CoreDNS to have CrashLoopBackOff is when a CoreDNS Pod deployed in Kubernetes detects a loop. A number of workarounds are available to avoid Kubernetes trying to restart the CoreDNS Pod every time CoreDNS detects the loop and exits.

Warning:
Disabling SELinux or setting allowPrivilegeEscalation to true can compromise the security of your cluster.
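If you suspect the loop-detection case instead, it is easy to rule out: a forwarding loop usually means the resolv.conf that kubelet hands to pods points back at the node itself (for example 127.0.0.53 from systemd-resolved). A rough check, assuming a kubeadm-style node layout:

# On each node: a 127.x nameserver here can cause CoreDNS to forward queries to itself and exit
cat /etc/resolv.conf
# The resolv.conf path kubelet actually uses (config file location is an assumption)
grep resolvConf /var/lib/kubelet/config.yaml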

In my case, the likely cause was leftover rules from an earlier iptables configuration.
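Before flushing anything, it is worth checking whether there really are rules that reject or drop traffic. This diagnostic is my own addition, assuming iptables is the active firewall backend on the nodes:

# Look for REJECT/DROP rules that could block pod-to-apiserver traffic
iptables -S | grep -E 'REJECT|DROP'
# A DROP policy on the FORWARD chain also breaks pod traffic with flannel on some setups
iptables -L FORWARD -n | head -n 3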

Solution

  1. Flush the iptables rules (see the note after these steps):
    iptables -F && service iptables save
  2. Delete the failing coredns pods:
    [root@k8s-master01 flannel]# kubectl delete pod coredns-5c98db65d4-xcd9s
    Error from server (NotFound): pods "coredns-5c98db65d4-xcd9s" not found
    [root@k8s-master01 flannel]# kubectl delete pod coredns-5c98db65d4-xcd9s -n kube-system
    pod "coredns-5c98db65d4-xcd9s" deleted
    [root@k8s-master01 flannel]# kubectl delete pod coredns-5c98db65d4-f9rb7 -n kube-system
    pod "coredns-5c98db65d4-f9rb7" deleted

Check the pods again:

[root@k8s-master01 flannel]# kubectl get pod -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-5c98db65d4-54j5c 1/1 Running 0 13m
coredns-5c98db65d4-jmvbf 1/1 Running 0 14m
etcd-k8s-master01 1/1 Running 2 9d
kube-apiserver-k8s-master01 1/1 Running 2 9d
kube-controller-manager-k8s-master01 1/1 Running 3 9d
kube-flannel-ds-amd64-6h79p 1/1 Running 2 9d
kube-flannel-ds-amd64-bnvtd 1/1 Running 3 9d
kube-flannel-ds-amd64-bsq4j 1/1 Running 2 9d
kube-proxy-5fn9m 1/1 Running 1 9d
kube-proxy-6hjvp 1/1 Running 2 9d
kube-proxy-t47n9 1/1 Running 2 9d
kube-scheduler-k8s-master01 1/1 Running 4 9d
[root@k8s-master01 flannel]#

The status is back to Running.
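Running is a good sign, but it is worth confirming that cluster DNS actually resolves again. This check is not from the original post; it spins up a throwaway busybox pod and queries CoreDNS:

# busybox:1.28 is used because nslookup in newer busybox images is unreliable
kubectl run dns-test --rm -it --image=busybox:1.28 --restart=Never -- \
  nslookup kubernetes.default.svc.cluster.local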
