Kubernetes is designed to be robust and fault-tolerant, and it can recover from failures automatically. And it does all of this well! However, production nodes can lose their connection to the cluster or fail for a variety of reasons. In these cases, it is essential for Kubernetes to respond to the incident quickly.
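When a node fails, the pods running on it stop working. Kubernetes has to notice this so that traffic is no longer sent to them. But how long does it actually take Kubernetes to detect that a node is down?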
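Node failure detection in Kubernetes relies on the interaction between two components, the Kubelet and the Controller Manager, and is controlled by three parameters:
--node-status-update-frequency: how often the Kubelet reports the node status to kube-apiserver. The default is 10 seconds.
--node-monitor-period: how often the Controller Manager checks the node status reported by the Kubelet. The default is 5 seconds.
--node-monitor-grace-period: how long the Controller Manager waits without receiving a status update from the Kubelet before considering the node unhealthy. The default is 40 seconds.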
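The detection process works as follows:
1. The Kubelet reports its status to kube-apiserver every --node-status-update-frequency = 10 seconds.
2. The Controller Manager checks the status reported by the Kubelet every --node-monitor-period = 5 seconds.
3. If the Controller Manager has not received a status update from the Kubelet within --node-monitor-grace-period = 40 seconds, it marks the node as NotReady.
4. The endpoints of the pods on that node are removed, and Kube Proxy stops routing traffic to those pods.
Until the node is marked NotReady, requests can still be routed to pods on the failed node. With the default values, this can take up to 45 seconds (the 40-second grace period plus up to 5 seconds until the next check).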
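The same reasoning can be written as a bound. Under a simplified model (periodic status reports, periodic checks, no network latency or jitter), and using u, m and g as our own shorthand for --node-status-update-frequency, --node-monitor-period and --node-monitor-grace-period, the delay Δ between the failure and the NotReady mark satisfies:

$$
g - u \;<\; \Delta \;\le\; g + m
$$

$$
u = 10\,\text{s},\quad m = 5\,\text{s},\quad g = 40\,\text{s} \;\Longrightarrow\; 30\,\text{s} < \Delta \le 45\,\text{s}
$$

The upper bound corresponds to a node that fails right after a status report, with the grace period expiring just after a check; the lower bound to a node that fails just before its next report.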
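Both the Kubelet and the Controller Manager parameters can be tuned. To make Kubernetes detect node failures faster, we set:
--node-status-update-frequency: 1 second (default 10 seconds)
--node-monitor-period: 1 second (default 5 seconds)
--node-monitor-grace-period: 4 seconds (default 40 seconds)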
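To see how the tuned values change the detection time, here is a hypothetical bash sketch (the function name simulate_detection is ours, not part of Kubernetes) that models the loop described above in whole seconds. It is a simplification that ignores network latency, jitter and other details of the real Kubelet and Controller Manager:

```shell
#!/bin/bash
# Simplified, hypothetical model of node failure detection (not the real
# controller code). All times are whole seconds, starting at t=0.
#   $1 update  - how often the Kubelet reports (node-status-update-frequency)
#   $2 monitor - how often the Controller Manager checks (node-monitor-period)
#   $3 grace   - time without reports before NotReady (node-monitor-grace-period)
#   $4 fail_at - time at which the node stops responding
simulate_detection() {
  local update=$1 monitor=$2 grace=$3 fail_at=$4
  local last_hb=0 t=0

  # The Kubelet reports at 0, update, 2*update, ... until the node fails.
  while (( t < fail_at )); do
    last_hb=$t
    t=$(( t + update ))
  done

  # The Controller Manager checks every $monitor seconds and marks the node
  # NotReady at the first check where the last report is older than $grace.
  t=$monitor
  while (( t - last_hb <= grace )); do
    t=$(( t + monitor ))
  done

  echo "NotReady at t=${t}s, $(( t - fail_at ))s after the failure"
}

# Default values: the node fails 1 s after its last report
simulate_detection 10 5 40 1   # NotReady at t=45s, 44s after the failure
# Tuned values
simulate_detection 1 1 4 1     # NotReady at t=5s, 4s after the failure
```

With the defaults, a node that fails 1 second after its last report is marked NotReady 44 seconds later; with the tuned values, 4 seconds later, which matches the delay measured in the Kind experiment.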
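Let's test this in a Kubernetes cluster created with Kind. The following Kind Cluster configuration applies these values to the Kubelet and the Controller Manager (in the Kubelet configuration file, --node-status-update-frequency corresponds to the nodeStatusUpdateFrequency field):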
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
kubeadmConfigPatches:
- |
  apiVersion: kubelet.config.k8s.io/v1beta1
  kind: KubeletConfiguration
  nodeStatusUpdateFrequency: 1s
nodes:
- role: control-plane
  kubeadmConfigPatches:
  - |
    kind: ClusterConfiguration
    controllerManager:
      extraArgs:
        node-monitor-period: 1s
        node-monitor-grace-period: 4s
- role: worker
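We create an Nginx deployment with pods on both the control-plane and the worker node. We also create an Ubuntu pod on the control-plane, from which we will send requests to Nginx, since the worker node is the one we will stop.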
#!/bin/bash
# create a K8S cluster with Kind
kind create cluster --config kind.yaml

# create an Ubuntu pod on the control-plane node
kubectl run ubuntu --wait=true --image ubuntu --overrides='{"spec": { "nodeName": "kind-control-plane"}}' -- sleep 30d

# untaint the control-plane node in order to schedule pods on it
# (older versions use the "master" taint key, Kubernetes 1.24+ uses "control-plane")
kubectl taint node kind-control-plane node-role.kubernetes.io/master- || true
kubectl taint node kind-control-plane node-role.kubernetes.io/control-plane- || true

# create an Nginx deployment with 2 replicas, one on each node
kubectl create deploy ng --image nginx
sleep 30
kubectl scale deployment ng --replicas 2

# expose the Nginx deployment so that it is reachable on port 80
kubectl expose deploy ng --port 80 --type ClusterIP

# install curl in the Ubuntu pod
kubectl exec ubuntu -- bash -c "apt update && apt install -y curl"
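Next, we access Nginx with curl from the Ubuntu pod on the control-plane and, at the same time, watch the service endpoints to see on which nodes the Nginx pods are reachable.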
# test Nginx service access from the Ubuntu pod
kubectl exec ubuntu -- bash -c 'while true ; do echo "$(date +"%T.%3N") - Status: $(curl -s -o /dev/null -w "%{http_code}" -m 0.2 -i ng)" ; done'

# show Nginx service endpoints (run this loop in a second terminal;
# gdate is GNU date on macOS, use date on Linux)
while true; do gdate +"%T.%3N"; kubectl get endpoints ng -o json | jq '.subsets' | jq '.[] | .addresses' | jq '.[] | .nodeName'; echo "------"; done
Then we stop the worker node, which in Kind is a Docker container, and check when it was marked as NotReady.
#!/bin/bash
# kill Kind worker node
echo "Worker down at $(gdate +"%T.%3N")"
docker stop kind-worker > /dev/null
sleep 15

# show when the node was detected to be down
echo "Worker detected in down state by Control Plane at "
kubectl get event --field-selector reason=NodeNotReady --sort-by='.lastTimestamp' -oyaml | grep time | tail -n1

# start worker node again
docker start kind-worker > /dev/null
The worker node was stopped at 12:50:22, and the Controller Manager detected it at 12:50:26, that is, 4 seconds later.
Worker down at 12:50:22.285
Worker detected in down state by Control Plane at
time: "12:50:26Z"
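Now let's look at the curl results. From 12:50:23, some requests start failing (status 000 means curl received no response within the 0.2-second timeout), because part of the traffic is still sent to the Nginx pod on the stopped worker. At 12:50:26.744 the worker endpoint is removed, Kube Proxy stops routing traffic to it, and all requests succeed again.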
...
12:50:23.115 - Status: 200
12:50:23.141 - Status: 200
12:50:23.161 - Status: 200
12:50:23.190 - Status: 000
12:50:23.245 - Status: 200
12:50:23.269 - Status: 200
12:50:23.291 - Status: 000
12:50:23.503 - Status: 200
12:50:23.520 - Status: 000
12:50:23.738 - Status: 000
12:50:23.954 - Status: 000
12:50:24.166 - Status: 000
12:50:24.385 - Status: 200
12:50:24.407 - Status: 000
12:50:24.623 - Status: 000
12:50:24.839 - Status: 000
12:50:25.053 - Status: 000
12:50:25.276 - Status: 200
12:50:25.294 - Status: 000
12:50:25.509 - Status: 200
12:50:25.525 - Status: 200
12:50:25.541 - Status: 200
12:50:25.556 - Status: 200
12:50:25.575 - Status: 000
12:50:25.793 - Status: 200
12:50:25.809 - Status: 200
12:50:25.826 - Status: 200
12:50:25.847 - Status: 200
12:50:25.867 - Status: 200
12:50:25.890 - Status: 000
12:50:26.110 - Status: 000
12:50:26.325 - Status: 000
12:50:26.549 - Status: 000
12:50:26.604 - Status: 200
12:50:26.669 - Status: 000
12:50:27.108 - Status: 200
12:50:27.135 - Status: 200
12:50:27.162 - Status: 200
12:50:27.188 - Status: 200
...
...
------
12:50:26.523
"kind-control-plane"
"kind-worker"
------
12:50:26.618
"kind-control-plane"
"kind-worker"
------
12:50:26.744
"kind-control-plane"
------
12:50:26.878
"kind-control-plane"
------
...
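To quantify the outage from the curl log above, a small awk-based sketch can count the failed requests and report the first and last failure. It assumes the output of the curl loop was saved to a file (curl.log is a hypothetical name) and that each line has the format HH:MM:SS.mmm - Status: CODE; summarize_outage is our own helper name:

```shell
#!/bin/bash
# Summarize a curl log in the format printed by the loop above:
#   HH:MM:SS.mmm - Status: CODE
# Status 000 means curl got no HTTP response within the timeout (-m 0.2).
# Lines that do not match (such as "...") are ignored.
summarize_outage() {
  awk '
    $4 == "000" {
      if (failed == 0) first = $1   # timestamp of the first failed request
      last = $1                     # timestamp of the latest failed request
      failed++
    }
    END {
      if (failed == 0) { print "no failed requests"; exit }
      printf "failed=%d first=%s last=%s\n", failed, first, last
    }
  '
}
```

Usage: summarize_outage < curl.log. Applied to the excerpt shown above, it prints failed=17 first=12:50:23.190 last=12:50:26.669, so in this run requests failed intermittently for about 3.5 seconds.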
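With these settings, Kubernetes detects a failed node much faster. However, there is a cost: every node now reports its status every second, and each of these updates goes through kube-apiserver into etcd. With 1,000 nodes, that means 60,000 updates per minute, which can put significant load on etcd and may require scaling etcd accordingly.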
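The arithmetic behind these numbers can be made explicit with a hypothetical bash helper (status_updates_per_minute is our own name), assuming exactly one status update per node per interval and ignoring all other traffic to etcd:

```shell
#!/bin/bash
# Node status updates per minute, assuming each node reports exactly once
# per update interval (a simplification of the real write load on etcd).
#   $1 nodes    - number of nodes in the cluster
#   $2 interval - node-status-update-frequency in seconds
status_updates_per_minute() {
  local nodes=$1 interval=$2
  echo $(( nodes * 60 / interval ))
}

status_updates_per_minute 1000 1    # 60000 (tuned value)
status_updates_per_minute 1000 10   # 6000 (default value)
```

Going from the default 10 seconds to 1 second multiplies this load by ten.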
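Therefore, these parameters should be tuned carefully, weighing faster failure detection against the additional load on the control plane.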