How to make Kubernetes react faster to cluster node failures?

Kubernetes is designed to be robust and fault-tolerant, and it can recover automatically. And it does this well! However, production nodes can lose their connection to the cluster or fail for a variety of reasons. In these cases, it is imperative that Kubernetes respond to the incident quickly.

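When a node fails, the pods running on it stop serving, yet for a while requests can still be routed to them, and those requests fail. So how, and how quickly, does Kubernetes react when a node goes down?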

To answer that, let's look at how Kubernetes detects a failed node, which depends on the following Kubelet and Controller Manager parameters:

  1. Kubelet periodically posts its status to kube-apiserver, at the interval set by --node-status-update-frequency. The default value is 10 seconds.

  2. Controller manager checks the status reported by Kubelet every --node-monitor-period. The default value is 5 seconds.

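  3. Kubelet status updates are given a grace period, set by the Controller manager flag --node-monitor-grace-period. This is how long Controller manager waits before it considers Kubelet unhealthy. The default value is 40 seconds.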

When a node fails, the sequence of events is as follows:

  1. Kubelet posts its status to kube-apiserver every --node-status-update-frequency = 10 seconds.

  2. A node fails.

  3. Controller manager tries to check the node status reported by Kubelet every --node-monitor-period = 5 seconds.

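  4. Controller manager sees that the node is not responding and gives it a grace period of --node-monitor-grace-period = 40 seconds. After that, Controller manager considers the node unhealthy and marks it NotReady.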

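  5. Kube Proxy removes the endpoints that point to pods on that node from all services, so pods on the failed node are no longer reachable through services.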

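In this scenario, requests to pods on the failed node can keep failing until the node is marked unhealthy (NotReady), which can take up to 45 seconds.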

This reaction time can be reduced by tuning the Kubelet and Controller Manager parameters.

To make Kubernetes react faster, change the following parameters:

--node-status-update-frequency: set to 1 second (default: 10 seconds)

--node-monitor-period: set to 1 second (default: 5 seconds)

--node-monitor-grace-period: set to 4 seconds (default: 40 seconds)
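The 45-second figure and the effect of the shorter values can be checked with simple arithmetic. The sketch below treats the worst case as one monitor period plus the grace period, which is the estimate this article uses; `worst_case_detection` is a hypothetical helper written for illustration, not a Kubernetes tool.

```shell
#!/bin/bash
# Rough upper bound (in seconds) on the time before a failed node is marked
# NotReady, estimated as: node-monitor-period + node-monitor-grace-period.
# worst_case_detection is a hypothetical helper, not part of Kubernetes.
worst_case_detection() {
  local monitor_period=$1 grace_period=$2
  echo $(( monitor_period + grace_period ))
}

echo "defaults (5s, 40s): $(worst_case_detection 5 40)s"
echo "tuned    (1s, 4s):  $(worst_case_detection 1 4)s"
```

With the defaults this gives the 45 seconds mentioned earlier; with the tuned values the bound drops to about 5 seconds.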





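To test this, we create a Kubernetes cluster with Kind. Below is the Kind Cluster configuration, with the parameters described above applied to Kubelet and Controller manager.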

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
kubeadmConfigPatches:
- |
  apiVersion: kubelet.config.k8s.io/v1beta1
  kind: KubeletConfiguration
  nodeStatusUpdateFrequency: 1s
nodes:
- role: control-plane
  kubeadmConfigPatches:
  - |
    kind: ClusterConfiguration
    controllerManager:
        extraArgs:
          node-monitor-period: 1s
          node-monitor-grace-period: 4s
- role: worker
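Once the cluster is up, it may be worth confirming that the flags actually reached the Controller manager. The sketch below is one possible way to do that: `flag_value` is a hypothetical helper that extracts a `--name=value` flag from text on stdin, and the sample input only imitates the shape of a kubeadm-generated static pod manifest.

```shell
#!/bin/bash
# flag_value NAME: print the value of the first "--NAME=VALUE" found on stdin.
# Hypothetical helper for illustration; it is not a kubectl feature.
flag_value() {
  sed -n "s/.*--$1=\([^[:space:]\"]*\).*/\1/p" | head -n1
}

# Illustrative input, shaped like a kube-controller-manager pod spec.
sample_manifest() {
  cat <<'EOF'
    - command:
      - kube-controller-manager
      - --node-monitor-grace-period=4s
      - --node-monitor-period=1s
EOF
}

echo "node-monitor-period:       $(sample_manifest | flag_value node-monitor-period)"
echo "node-monitor-grace-period: $(sample_manifest | flag_value node-monitor-grace-period)"

# Against a real cluster (assumes the kubeadm label component=kube-controller-manager):
#   kubectl -n kube-system get pod -l component=kube-controller-manager -o yaml \
#     | flag_value node-monitor-period
# The Kubelet setting lives in the node's config file (assumed kubeadm path):
#   docker exec kind-worker grep nodeStatusUpdateFrequency /var/lib/kubelet/config.yaml
```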
      
      



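Next, we create an Nginx deployment with two replicas, one on the control-plane node and one on the worker. On the control-plane we also start an Ubuntu pod, which we use to check whether Nginx stays available when the worker goes down.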

#!/bin/bash

# create a K8S cluster with Kind
kind create cluster --config kind.yaml 
# create a Ubuntu pod in control-plane Node
kubectl run ubuntu --wait=true --image ubuntu --overrides='{"spec": { "nodeName": "kind-control-plane"}}' sleep 30d
# untaint control-plane node in order to schedule pods on it
# (newer Kubernetes versions name this taint node-role.kubernetes.io/control-plane)
kubectl taint node kind-control-plane node-role.kubernetes.io/master-
# create Nginx deployment with 2 replicas, one on each node
kubectl create deploy ng --image nginx
sleep 30
kubectl scale deployment ng --replicas 2
# expose Nginx deployment so that it is reachable on port 80
kubectl expose deploy ng --port 80  --type ClusterIP
# install curl in Ubuntu pod
kubectl exec ubuntu -- bash -c "apt update && apt install -y curl"
      
      



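To test access to the Nginx service, we run curl in a loop from the Ubuntu pod on the control-plane, and in parallel we watch the endpoints that belong to the Nginx service.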

# test Nginx service access from Ubuntu pod
kubectl exec ubuntu -- bash -c 'while true ; do echo "$(date +"%T.%3N") - Status: $(curl -s -o /dev/null -w "%{http_code}" -m 0.2 -i ng)" ; done'

# show Nginx service endpoints (gdate is GNU date from coreutils on macOS; on Linux use date)
while true; do gdate +"%T.%3N"; kubectl get endpoints ng -o json | jq '.subsets[].addresses[].nodeName'; echo "------"; done

      
      



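Finally, to simulate a node failure, we stop the Kind container that runs the worker node. We also print the time at which the node was stopped and the time at which it was detected as NotReady.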

#!/bin/bash

# kill Kind worker node
echo "Worker down at $(gdate +"%T.%3N")"
docker stop kind-worker > /dev/null
sleep 15
# show when the node was detected to be down
echo "Worker detected in down state by Control Plane at "
kubectl get event --field-selector reason=NodeNotReady --sort-by='.lastTimestamp' -oyaml | grep time | tail -n1
# start worker node again
docker start kind-worker > /dev/null

      
      



In this run, the node went down at 12:50:22 and Controller manager detected the failure at 12:50:26, that is, after 4 seconds, as expected.

Worker down at 12:50:22.285
Worker detected in down state by Control Plane at
      time: "12:50:26Z"
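The delay between the two timestamps printed above can also be computed rather than read off by eye. This is a minimal sketch in pure bash; `to_seconds` and `detection_delay` are hypothetical helpers. It drops fractional seconds and does not handle a run that crosses midnight, and since the event timestamp only has one-second resolution, the result is approximate.

```shell
#!/bin/bash
# to_seconds HH:MM:SS[.mmm][Z]: seconds since midnight, fraction dropped.
# Hypothetical helper for illustration.
to_seconds() {
  local t=${1%%.*}   # strip milliseconds, if present
  t=${t%Z}           # strip a trailing "Z", if present
  local h m s
  IFS=: read -r h m s <<< "$t"
  # 10# forces base 10 so values such as "08" are not parsed as octal
  echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# detection_delay DOWN_TIME DETECTED_TIME: whole seconds between the two.
detection_delay() {
  echo $(( $(to_seconds "$2") - $(to_seconds "$1") ))
}

detection_delay "12:50:22.285" "12:50:26Z"
```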
      
      



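The client-side results tell the same story. From 12:50:23 on, some requests start to fail. At 12:50:26.744 the endpoint that pointed to the failed node disappeared from the service, Kube Proxy stopped routing traffic to it, and availability was fully restored.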

...
12:50:23.115 - Status: 200
12:50:23.141 - Status: 200
12:50:23.161 - Status: 200
12:50:23.190 - Status: 000
12:50:23.245 - Status: 200
12:50:23.269 - Status: 200
12:50:23.291 - Status: 000
12:50:23.503 - Status: 200
12:50:23.520 - Status: 000
12:50:23.738 - Status: 000
12:50:23.954 - Status: 000
12:50:24.166 - Status: 000
12:50:24.385 - Status: 200
12:50:24.407 - Status: 000
12:50:24.623 - Status: 000
12:50:24.839 - Status: 000
12:50:25.053 - Status: 000
12:50:25.276 - Status: 200
12:50:25.294 - Status: 000
12:50:25.509 - Status: 200
12:50:25.525 - Status: 200
12:50:25.541 - Status: 200
12:50:25.556 - Status: 200
12:50:25.575 - Status: 000
12:50:25.793 - Status: 200
12:50:25.809 - Status: 200
12:50:25.826 - Status: 200
12:50:25.847 - Status: 200
12:50:25.867 - Status: 200
12:50:25.890 - Status: 000
12:50:26.110 - Status: 000
12:50:26.325 - Status: 000
12:50:26.549 - Status: 000
12:50:26.604 - Status: 200
12:50:26.669 - Status: 000
12:50:27.108 - Status: 200
12:50:27.135 - Status: 200
12:50:27.162 - Status: 200
12:50:27.188 - Status: 200
...
...
------
12:50:26.523
"kind-control-plane"
"kind-worker"
------
12:50:26.618
"kind-control-plane"
"kind-worker"
------
12:50:26.744
"kind-control-plane"
------
12:50:26.878
"kind-control-plane"
------
...
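To summarize a log like the one above without counting lines by hand, the failed requests can be tallied with awk. This is only a sketch: `summarize_log` is a hypothetical helper that assumes the exact line format produced by the curl loop, and the sample input is a short excerpt of the output shown above.

```shell
#!/bin/bash
# summarize_log: read lines such as "12:50:23.190 - Status: 000" on stdin and
# print the number of failed requests plus the first and last failure time.
# Hypothetical helper; curl reports status 000 when it gets no HTTP response.
summarize_log() {
  awk '$4 == "000" { n++; if (!first) first = $1; last = $1 }
       END { printf "failed=%d first=%s last=%s\n", n, first, last }'
}

summarize_log <<'EOF'
12:50:23.161 - Status: 200
12:50:23.190 - Status: 000
12:50:23.245 - Status: 200
12:50:26.669 - Status: 000
12:50:27.108 - Status: 200
EOF
```

Comparing the last failure time with the moment the endpoint list shrinks shows how closely the end of the outage follows the endpoint removal.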
      
      



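As we can see, Kubernetes' reaction time improved significantly. However, these settings come at a cost: they increase the load on Kubernetes components, etcd in particular, because every node now tries to update its status every 1 second. For example, in a cluster with 1000 nodes this means 60000 node updates per minute, which may require more resources for etcd or even dedicated nodes for etcd.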
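The update volume is easy to estimate for other cluster sizes and frequencies. A minimal sketch, where `updates_per_minute` is a hypothetical helper and each node is assumed to post exactly one status update per interval:

```shell
#!/bin/bash
# updates_per_minute NODES FREQUENCY_SECONDS
# Node status updates per minute, assuming one update per node per interval.
# Hypothetical helper for illustration.
updates_per_minute() {
  local nodes=$1 frequency=$2
  echo $(( nodes * 60 / frequency ))
}

echo "1000 nodes, every 1s:  $(updates_per_minute 1000 1) updates/min"
echo "1000 nodes, every 10s: $(updates_per_minute 1000 10) updates/min"
```

With the 1-second setting this reproduces the 60000 updates per minute quoted above, ten times the volume produced by the 10-second default.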








