Fluentd: por que es importante ajustar el búfer de salida



Hoy en día, es imposible imaginar un proyecto basado en Kubernetes sin una pila ELK, con la que se guardan los registros tanto de las aplicaciones como de los componentes del sistema del clúster. En nuestra práctica, usamos la pila EFK con Fluentd en lugar de Logstash.



Fluentd — , Cloud Native Computing Foundation, - Kubernetes.



Fluentd Logstash , , Fluentd , .



, EFK , , Kibana . , .





Fluentd DaemonSet ( Kubernetes) stdout /var/log/containers. JSON- ElasticSearch, standalone , . Kibana.



Fluentd , ElasticSearch . , Nginx. :



127.0.0.1 192.168.0.1 - [28/Feb/2013:12:00:00 +0900] "GET / HTTP/1.1" 200 777 "-" "Opera/12.0" -


, ElasticSearch , :



{
  "_index": "test-custom-prod-example-2020.01.02",
  "_type": "_doc",
  "_id": "HgGl_nIBR8C-2_33RlQV",
  "_version": 1,
  "_score": 0,
  "_source": {
    "service": "test-custom-prod-example",
    "container_name": "nginx",
    "namespace": "test-prod",
    "@timestamp": "2020-01-14T05:29:47.599052886 00:00",
    "log": "127.0.0.1 192.168.0.1 - [28/Feb/2013:12:00:00  0900] \"GET / HTTP/1.1\" 200 777 \"-\" \"Opera/12.0\" -",
    "tag": "custom-log"
  }
}

{
  "_index": "test-custom-prod-example-2020.01.02",
  "_type": "_doc",
  "_id": "IgGm_nIBR8C-2_33e2ST",
  "_version": 1,
  "_score": 0,
  "_source": {
    "service": "test-custom-prod-example",
    "container_name": "nginx",
    "namespace": "test-prod",
    "@timestamp": "2020-01-14T05:29:47.599052886 00:00",
    "log": "127.0.0.1 192.168.0.1 - [28/Feb/2013:12:00:00  0900] \"GET / HTTP/1.1\" 200 777 \"-\" \"Opera/12.0\" -",
    "tag": "custom-log"
  }
}


, .



Fluentd :



2020-01-16 01:46:46 +0000 [warn]: [test-prod] failed to flush the buffer. retry_time=4 next_retry_seconds=2020-01-16 01:46:53 +0000 chunk="59c37fc3fb320608692c352802b973ce" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch\", :port=>9200, :scheme=>\"http\", :user=>\"elastic\", :password=>\"obfuscated\"}): read timeout reached"


ElasticSearch request_timeout , - . Fluentd ElasticSearch :



2020-01-16 01:47:05 +0000 [warn]: [test-prod] retry succeeded. chunk_id="59c37fc3fb320608692c352802b973ce" 
2020-01-16 01:47:05 +0000 [warn]: [test-prod] retry succeeded. chunk_id="59c37fad241ab300518b936e27200747" 
2020-01-16 01:47:05 +0000 [warn]: [test-dev] retry succeeded. chunk_id="59c37fc11f7ab707ca5de72a88321cc2" 
2020-01-16 01:47:05 +0000 [warn]: [test-dev] retry succeeded. chunk_id="59c37fb5adb70c06e649d8c108318c9b" 
2020-01-16 01:47:15 +0000 [warn]: [kube-system] retry succeeded. chunk_id="59c37f63a9046e6dff7e9987729be66f"


, ElasticSearch _id . .



Kibana :







. — fluent-plugin-elasticsearch . , ElasticSearch . , -, .



Fluentd, . - ElasticSearch , , . , , , , , Fluentd .



, , , , : , , . , , , , , Fluentd .



:



 <buffer>
        @type file
        path /var/log/fluentd-buffers/kubernetes.test.buffer
        flush_mode interval
        retry_type exponential_backoff
        flush_thread_count 2
        flush_interval 5s
        retry_forever
        retry_max_interval 30
        chunk_limit_size 8M
        queue_limit_length 8
        overflow_action block
      </buffer>


:

chunk_limit_size — , .



  • flush_interval — , .
  • queue_limit_length — .
  • request_timeout — , Fluentd ElasticSearch.


, queue_limit_length chunk_limit_size, « , ». :



2020-01-21 10:22:57 +0000 [warn]: [test-prod] failed to write data into buffer by buffer overflow action=:block


, , , , .



: , , .



chunk_limit_size 32 , ElasticSeacrh , . , , queue_limit_length.



-, request_timeout. , 20 , Fluentd :



2020-01-21 09:55:33 +0000 [warn]: [test-dev] buffer flush took longer time than slow_flush_log_threshold: elapsed_time=20.85753920301795 slow_flush_log_threshold=20.0 plugin_id="postgresql-dev" 


, , slow_flush_log_threshold. request_timeout.



:



  1. request_timeout , ( ). -.
  2. slow_flush_log_threshold. elapsed_time .
  3. request_timeout , elapsed_time, . request_timeout elapsed_time + 50%.
  4. , slow_flush_log_threshold. elapsed_time + 25%.


, , . , , .



, , , :



node-1 node-2 node-3 node-4
/ / / /
failed to flush the buffer 1749/2 694/2 47/0 1121/2
retry succeeded 410/2 205/1 24/0 241/2


, , , . - Fluentd , slow_flush_log_threshold. request_timeout, , .





Fluentd EFK , . , , ElasticSearch , .



:






All Articles