Depurando el kernel desde la línea de comandos con bpftrace

Este es otro artículo de la serie "BPF para los más pequeños" ( 0 , 1 , 2 ) y el primero de una serie de artículos prácticos sobre rastreo de Linux con herramientas modernas.







En él aprenderá sobre el programa y el idioma bpftrace



, la forma más fácil de sumergirse en el mundo de BPF desde un punto de vista práctico, incluso si no sabe absolutamente nada sobre BPF. La utilidad le bpftrace



permite crear programas de manejo usando un lenguaje simple directamente desde la línea de comando y conectarlos a una gran cantidad de eventos de espacio de usuario y kernel. Eche un vistazo al KPDV a continuación ... felicitaciones, ya sabe cómo rastrear las llamadas del sistema con bpftrace



!







A diferencia de los artículos anteriores de la serie, este es relativamente corto y su parte principal está escrita en el formato de un tutorial, por lo que después de cinco minutos de lectura puede crear controladores y recopilar estadísticas sobre cualquier evento en el kernel de Linux directamente desde el comando. línea. El final del artículo analiza las alternativas ply



y el BCC. Los minutos en la frase "después de cinco minutos de lectura" deben ser Mercurial. No se garantiza que surjan habilidades únicas después de cinco minutos de lectura.









Contenido





Instalación de bpftrace



. sudo apt install bpftrace



( ). bpftrace



, .







. bpftrace



, , , , BTF. , bpftrace



.







master- bpftrace docker image bpftrace



. , :







$ docker pull quay.io/iovisor/bpftrace:latest
$ cd /tmp
$ docker run -v $(pwd):/o quay.io/iovisor/bpftrace:latest /bin/bash -c "cp /usr/bin/bpftrace /o"
$ sudo ./bpftrace -V
bpftrace v0.11.4
      
      





bpftrace glibc



, docker image glibc



.







, . , execve(2)



, :







$ sudo ./bpftrace -e 'tracepoint:syscalls:sys_enter_execve { printf("%s -> %s\n", comm, str(uptr(args->filename))); }'
Attaching 1 probe...
bash -> /bin/echo
bash -> /usr/bin/ls
gnome-shell -> /bin/sh
sh -> /home/aspsk/bin/geeqie
sh -> /usr/local/sbin/geeqie
...
      
      





, , , :







$ sudo mv /tmp/bpftrace /usr/local/bin
      
      





, , , BTF, bpftrace



, docker . bpftrace



. .







. bpftrace



/ BTF, kernel headers. , bpftrace



.







BTF, , BTF kernel headers, , , . BTF , .







?



. , , , . , . : « ?!» , ,







$ sudo apt install bpftrace
The following NEW packages will be installed:
  bpftrace
Preparing to unpack .../bpftrace_0.11.0-1_amd64.deb ...
Unpacking bpftrace (0.11.0-1) ...
Setting up bpftrace (0.11.0-1) ...
Processing triggers for man-db (2.9.3-2) ...
$
      
      





, bpftrace



. ? , bpftrace



. - :









, , - , .







Bpftrace: hello world



bpftrace



awk , BEGIN



END



, bpftrace



, . :







# bpftrace -e 'BEGIN { printf("Hello world!\n"); }'
Attaching 1 probe...
Hello world!
^C
      
      





"Hello world!"



. , Ctrl-C



, bpftrace



— . bpftrace



exit



. , END



:







# bpftrace -e '
 BEGIN { printf("Hello world!\n"); exit(); }
 END { printf("So long\n"); }
 '
Attaching 2 probes...
Hello world!
So long
      
      





Kprobes —



— , , , , kprobes (Kernel Probe — ) , , . , , kprobes



, , , , Linux BPF.







, schedule



:







$ sudo bpftrace -e '
    k:schedule { @[comm] = count(); }
    i:s:2 { exit();}
    END { print(@, 10); clear(@); }
'
Attaching 3 probes...
@[Timer]: 147
@[kworker/u65:0]: 147
@[kworker/7:1]: 153
@[kworker/13:1]: 158
@[IPC I/O Child]: 170
@[IPC I/O Parent]: 172
@[kworker/12:1]: 185
@[Web Content]: 229
@[Xorg]: 269
@[SCTP timer]: 1566
      
      





@



.







-? :







$ sudo bpftrace -l 'k:*' | wc -l
61106
      
      





. , , kprobe, , , kprobes.







kretprobes



kprobe



kretprobe



. kprobe



, kretporobe



. retval



.







, vfs_write



( ):







$ sudo bpftrace -e 'kr:vfs_write { @ = hist(retval); } i:s:2 { exit(); }'
Attaching 2 probes...

@:
[1]                  606 |@@@@@@@@@@@@@@@@@@@@@@@@@                           |
[2, 4)                 0 |                                                    |
[4, 8)                 0 |                                                    |
[8, 16)             1223 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[16, 32)               0 |                                                    |
[32, 64)              25 |@                                                   |
      
      





uprobes uretprobes



, ( ), . kprobes



. bpftrace



uprobes



uretprobes



— .







, ( UID ):







$ sudo bpftrace -e 'uretprobe:/bin/bash:readline { printf("readline: [%d]: \"%s\"\n", uid, str(uptr(retval))); }'
Attaching 1 probe...
readline: [1000]: "echo "hello habr""
readline: [0]: "echo "hello from root""
^C
      
      





, 2



CONFIG_DEBUG_INFO_BTF=y



, kprobes, , bpf trampolines. BTF , . , / .







Tracepoints —



- , , , . tracepoints . :







$ sudo bpftrace -l 't:*'
      
      





, kprobes



:







$ sudo bpftrace -l 't:*' | wc -l
1782
      
      





tracepoints , API: , tracepoint, , . , bpftrace



, tracepoint, :







$ sudo bpftrace -lv tracepoint:thermal:thermal_temperature
tracepoint:thermal:thermal_temperature
    __data_loc char[] thermal_zone;
    int id;
    int temp_prev;
    int temp;
      
      





kprobe



, BTF, , , . BTF kprobes



kfuncs



.







usdt —



. ( ):







#include <sys/sdt.h>
#include <sys/time.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    struct timeval tv;

    for (;;) {
        gettimeofday(&tv, NULL);
        DTRACE_PROBE1(test, probe, tv.tv_sec);
        sleep(1);
    }
}
      
      





test:probe



, tv.tv_sec



— . , systemtap-sdt-dev



( ). , :







$ cc /tmp/test.c -o /tmp/test
$ sudo bpftrace -l 'usdt:/tmp/test'
usdt:/tmp/test:test:probe
      
      





/tmp/test



,







$ sudo bpftrace -e 'usdt:/tmp/test:test:probe { printf("  %u\n", arg0); }'
Attaching 1 probe...
  1612903991
  1612903992
  1612903993
  1612903994
  1612903995
...
      
      





arg0



tv.tv_sec



, breakpoint.







Perf



bpftrace



, Perf. , bpftrace



:







  • software



    : -
  • hardware



    : PMCs
  • interval



    :
  • profile



    :


`software`



perf, :







# bpftrace -l 's:*'
software:alignment-faults:
software:bpf-output:
software:context-switches:
software:cpu-clock:
software:cpu-migrations:
software:dummy:
software:emulation-faults:
software:major-faults:
software:minor-faults:
software:page-faults:
software:task-clock:
      
      





, , ,







# bpftrace -e 'software:cpu-migrations:10 { @[comm] = count(); }'
Attaching 2 probes...
^C@[kworker/u65:1]: 1
@[bpftrace]: 1
@[SCTP timer]: 2
@[Xorg]: 2
      
      





CPU . perf_event_open(2)



, , cpu-migrations



, man- PERF_COUNT_SW_CPU_MIGRATIONS



.







`hardware`



hardware counters, bpftrace



BPF. , :







bpftrace -l 'h*'
hardware:backend-stalls:
hardware:branch-instructions:
hardware:branch-misses:
hardware:bus-cycles:
hardware:cache-misses:
hardware:cache-references:
hardware:cpu-cycles:
hardware:frontend-stalls:
hardware:instructions:
hardware:ref-cycles:
      
      





( , . PERF_COUNT_HW_BRANCH_MISSES



):







bpftrace -e 'hardware:branch-misses:100000 { @[tid] = count(); }'
Attaching 3 probes...
@[1055]: 4
@[3078]: 4
@[1947]: 5
@[1066]: 6
@[2551]: 6
@[0]: 29
      
      





`interval` `profile`



interval



profile



. CPU, — CPU.







, . , :







$ sudo bpftrace -e '
    kr:vfs_write { @ = hist(retval); }
    interval:s:2 { print(@); clear(@); }
'
      
      





vfs_write



, interval



, , @



.







profile



:







# bpftrace -e '
  profile:hz:99 { @[kstack] = count(); }
  i:s:10 { exit(); }
  END { print(@,1); clear(@); }
'
Attaching 3 probes...
@[
    cpuidle_enter_state+202
    cpuidle_enter+46
    call_cpuidle+35
    do_idle+487
    cpu_startup_entry+32
    start_secondary+345
    secondary_startup_64+182
]: 14455
      
      





profile



99 , exit()



, END



@



— ( , , , ).







Bpftrace: tutorial





, bpftrace



:







# bpftrace
      
      





, . , bpftrace



. , bpftrace



. , — .







-l



. . ( - , , .) :







# bpftrace -l '*kill_all*'
kprobe:rfkill_alloc
kprobe:kill_all
kprobe:btrfs_kill_all_delayed_nodes
      
      





tracepoints



:







# bpftrace -l 't:*kill*'
tracepoint:cfg80211:rdev_rfkill_poll
tracepoint:syscalls:sys_enter_kill
tracepoint:syscalls:sys_exit_kill
tracepoint:syscalls:sys_enter_tgkill
tracepoint:syscalls:sys_exit_tgkill
tracepoint:syscalls:sys_enter_tkill
tracepoint:syscalls:sys_exit_tkill
      
      





tracepoint:syscalls



, , bpftrace



. X



:







tracepoint:syscalls:sys_enter_X
tracepoint:syscalls:sys_exit_X
      
      





- , , execve(2)



. , , - tracepoint



-v



, :







# bpftrace -lv 't:s*:sys_*_execve'
tracepoint:syscalls:sys_enter_execve
    int __syscall_nr;
    const char * filename;
    const char *const * argv;
    const char *const * envp;
tracepoint:syscalls:sys_exit_execve
    int __syscall_nr;
    long ret;
      
      





( *



, syscalls



sys_enter_execve



sys_exit_execve



). , -lv



args



:







# bpftrace -e '
    t:s*:sys_enter_execve { printf("ENTER: %s\n", str(uptr(args->filename))); }
    t:s*:sys_exit_execve { printf("EXIT: %s: %d\n", comm, args->ret); }
'
Attaching 2 probes...
ENTER: /bin/ls
EXIT: ls: 0
ENTER: /bin/lss
EXIT: bash: -2
      
      





.







args->filename



. , str



, : , uptr



. bpftrace



, . , , bpftrace -lv



, , , execve ( __user



).







comm



, . args->ret



. , " " , . , - , : /bin/ls



exec



0 ( ls



/bin/ls



), /bin/lss



exec



-2



bash



.







. , . , . uptr



, .







`bpftrace`



bpftrace



awk (. 6 The AWK Programming Language



) .







<probe> <filter> { <action> }
      
      





,







# bpftrace -e 'BEGIN { printf("Hello\n") } END { printf("World\n") }'
      
      





<probe>



BEGIN



END



, <action>



printf



.

<filter>



,

,







# bpftrace -e 'p:s:1 /cpu == 0/ { printf("  CPU%d\n", cpu); }'
Attaching 1 probe...
  CPU0
  CPU0
^C
      
      





, CPU 0.







. , /cpu == 0/



?







<filter>



. , write



. — sys_enter_write



sys_exit_write



:







# cat /tmp/write-times.bt
t:syscalls:sys_enter_write {
  @[tid] = nsecs
}

t:syscalls:sys_exit_write /@[tid]/ {
  @write[comm] = sum((nsecs - @[tid]) / 1000);
  delete(@[tid]);
}

END {
  print(@write, 10);
  clear(@write);
  clear(@);
}
      
      





, . :







# bpftrace /tmp/write-times.bt
      
      





, sys_enter_write



, write



@



, tid



.







, sys_exit_write



, /@[tid]/



, . , , - write



. ( ) @write



@[tid]



.







, ^C



, END



, @write



@



, bpftrace



.







. , /@[tid]/



?







:



bpftrace



C , , :?



, ++



, --



. if {} else {}



. while



unroll



( , BPF ). , .







: . $



, :







# bpftrace -e 'BEGIN { $x = 1; printf("%d\n", ++$x); exit(); }'
# bpftrace -e 'BEGIN { if (1) { $x = 1; } printf("%d\n", ++$x); exit(); }'
      
      





— :







# bpftrace -e 'BEGIN { $x = 1; exit(); } END { printf("%d\n", $x); }'
      
      





, , @



. "" @



, write



(@[tid]



). ( bpftrace



— . , , : . , , .)







, : ? bpftrace



. kprobes



, tracepoints



— , tracepoint, Perf, , — . tracepoints



, kprobes



, Perf.







kprobes



arg0



, arg1



, .. , . :







#include <linux/skbuff.h>
#include <linux/ip.h>

k:icmp_echo {
    $skb = (struct sk_buff *) arg0;
    $iphdr = (struct iphdr *) ($skb->head + $skb->network_header);
    @pingstats[ntop($iphdr->saddr), ntop($iphdr->daddr)]++;
}
      
      





, . kprobe



icmp_echo



, ICMPv4 echo request. , arg0



, — sk_buff



, . IP @pingstats



. , , IP ! , kprobe, user space, .







. , kernel headers. sk_buff



iphdr



, . ( BTF, — , .) icmp_echo



sk_buff



$skb



. $skb



, , , , iphdr



. ntop



bpftrace



, IP .







. . uptr



kptr



. : vfs_write



, — struct file



, <linux/fs.h>



. vfs_write



. (Hint: struct file



kretprobe



?)







. bpftrace



, , , sudo



.









ping



— , CPU. count



. :







# bpftrace -e 'p:hz:5000 { @x++; @y = count(); } i:s:10 { exit(); }'
Attaching 2 probes...
@x: 760528
@y: 799002
      
      





10 5000 16 @x



@y



. ++



. count()



, CPU- : CPU , .







bpftrace



, - . — . .







# bpftrace -e 'kr:vfs_write { @ = hist(retval); } i:s:10 { exit() }'
Attaching 2 probes...

@:
[1]             14966407 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[2, 4)                 0 |                                                    |
[4, 8)                 0 |                                                    |
[8, 16)             1670 |                                                    |
[16, 32)               0 |                                                    |
[32, 64)             123 |                                                    |
[64, 128)              0 |                                                    |
[128, 256)             0 |                                                    |
[256, 512)             0 |                                                    |
[512, 1K)              0 |                                                    |
[1K, 2K)               0 |                                                    |
[2K, 4K)               0 |                                                    |
[4K, 8K)               0 |                                                    |
[8K, 16K)        7531982 |@@@@@@@@@@@@@@@@@@@@@@@@@@                          |
      
      





vfs_write



, , - ! ( 20 ):







# bpftrace -e '
    kr:vfs_write /retval == 1/ { @[pid, comm] = hist(retval); }
    i:s:10 { exit() }
    END { print(@, 1); clear(@); }'
Attaching 3 probes...
@[133835, dd]:
[1]             14254511 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
      
      





, , retval



. , — . , END



, . ? :







# tr '\000' ' ' < /proc/133835/cmdline
dd if=/dev/zero of=/dev/null bs=1
      
      





. bpftrace



, .







: flame graphs



profile



. . , profile



:







profile:hz:rate
profile:s:rate
profile:ms:rate
profile:us:rate
      
      





HZ, — , , .







, , , :







bpftrace -e 'profile:hz:99 { print(kstack); exit() }'
Attaching 1 probe...

        cpuidle_enter_state+202
        cpuidle_enter+46
        call_cpuidle+35
        do_idle+487
        cpu_startup_entry+32
        rest_init+174
        arch_call_rest_init+14
        start_kernel+1724
        x86_64_start_reservations+36
        x86_64_start_kernel+139
        secondary_startup_64+182
      
      





-.







, , . flame graphs, .







, bpftrace



:







# bpftrace -e 'profile:hz:333 { @[kstack] = count(); } i:s:10 { exit() }' > /tmp/raw
# wc -l /tmp/raw
3374 /tmp/raw
      
      





333 ( kstack



@



, kstack



— ).







FlameGraph :







$ git clone https://github.com/brendangregg/FlameGraph.git
$ cd FlameGraph
$ ./stackcollapse-bpftrace.pl /tmp/raw > /tmp/ready
$ ./flamegraph.pl /tmp/ready > /tmp/kstack.svg
      
      





bpftrace



, ( , gist SVG):













, CPU dd



, /dev/zero



/dev/null



. , .







. bpftrace



. FlameGraph



.









, bpftrace



. , .







BCC:



BCC , BPF — 100 . — , , Netflix, Facebook . . , .







libbcc



libbpf



, -C , . , . .







BCC - . , bpfcc-tools



. . ,







# funccount-bpfcc 'vfs_*' -d 5
Tracing 67 functions for "b'vfs_*'"... Hit Ctrl-C to end.

FUNC                                    COUNT
b'vfs_statfs'                               1
b'vfs_unlink'                               1
b'vfs_lock_file'                            2
b'vfs_fallocate'                           31
b'vfs_statx_fd'                            32
b'vfs_getattr'                             80
b'vfs_getattr_nosec'                       88
b'vfs_open'                               108
b'vfs_statx'                              174
b'vfs_writev'                            2789
b'vfs_write'                             6554
b'vfs_read'                              7363
Detaching...
      
      





vfs_



. -p



, PID . , mplayer



, :







# funccount-bpfcc 'vfs_*' -d 5 -p 29380
Tracing 67 functions for "b'vfs_*'"... Hit Ctrl-C to end.

FUNC                                    COUNT
b'vfs_write'                              208
b'vfs_read'                               629
Detaching...
      
      





BCC



BCC. mutex_lock



mutex_unlock



. , .







#! /usr/bin/python3

from bcc import BPF
from ctypes import c_int
from time import sleep, strftime
from sys import argv

b = BPF(text="""
BPF_PERCPU_ARRAY(mutex_stats, u64, 2);

static inline void inc(int key)
{
    u64 *x = mutex_stats.lookup(&key);
    if (x)
        *x += 1;
}

void do_lock(struct pt_regs *ctx) { inc(0); }
void do_unlock(struct pt_regs *ctx) { inc(1); }
""")

b.attach_kprobe(event="mutex_lock", fn_name="do_lock")
b.attach_kprobe(event="mutex_unlock", fn_name="do_unlock")

print("%-8s%10s%10s" % ("TIME", "LOCKED", "UNLOCKED"))

while 2 * 2 == 4:

    try:
        sleep(1)
    except KeyboardInterrupt:
        exit()

    print("%-8s%10d%10d" % (
          strftime("%H:%M:%S"),
          b["mutex_stats"].sum(0).value,
          b["mutex_stats"].sum(1).value))

    b["mutex_stats"].clear()
      
      





. , — BPF



:







from bcc import BPF
      
      





BPF, . BPF



-C.







BPF_PERCPU_ARRAY(mutex_stats, u64, 2);

static inline void inc(int key)
{
    u64 *x = mutex_stats.lookup(&key);
    if (x)
        *x += 1;
}

void do_lock(struct pt_regs *ctx)   { inc(0); }
void do_unlock(struct pt_regs *ctx) { inc(1); }
      
      





C, , BPF



C.

, mutex_stats



u64



, . , PERCPU , , CPU . inc



, mutex_stats



. .

, do_lock



do_unlock



.







BPF



libllvm



, , . kprobes



:







b.attach_kprobe(event="mutex_lock", fn_name="do_lock")
b.attach_kprobe(event="mutex_unlock", fn_name="do_unlock")
      
      





:







print("%-8s%10s%10s" % ("TIME", "LOCKED", "UNLOCKED"))

while 2 * 2 == 4:

    try:
        sleep(1)
    except KeyboardInterrupt:
        exit()

    print("%-8s%10d%10d" % (
          strftime("%H:%M:%S"),
          b["mutex_stats"].sum(0).value,
          b["mutex_stats"].sum(1).value))

    b["mutex_stats"].clear()
      
      





mutex_stats



. sum



mutex_stats



, CPU:







sum(index) {
    result = 0
      CPU {
        result += cpu->mutex_stats[index]
    }
    return result
}
      
      





. :







$ sudo ./bcc-tool-example
TIME        LOCKED  UNLOCKED
18:06:33     11382     12993
18:06:34     11200     12783
18:06:35     18597     22553
18:06:36     20776     25516
18:06:37     59453     68201
18:06:38     49282     58064
18:06:39     25541     27428
18:06:40     22094     25280
18:06:41      5539      7250
18:06:42      5662      7351
18:06:43      5205      6905
18:06:44      6733      8438
      
      





- 18:06:35 youtube Firefox, youtube 18:06:40 . , , youtube .







, C, libbpf



CO-RE. libbpf



, libllvm



, , . , BCC libbpf



+ CO-RE BCC, . libbpf-tools. BCC to libbpf conversion guide ( ).







Ply: bpftrace



ply



, Tobias Waldekranz 2015 , bpftrace



. awk- BPF, ,







ply 'tracepoint:tcp/tcp_receive_reset {
    printf("saddr:%v port:%v->%v\n", data->saddr, data->sport, data->dport);
}'
      
      





ply



: libc



(). , BPF . , , ply



ply script language -> BPF



.







, ply



, , — ply



, , . ply



, , , , C libbpf



— , ., , Building BPF applications with libbpf-bootstrap.









:









bpftrace



, BCC Linux:







  • The bpftrace One-Liner Tutorial. bpftrace



    , . , , .







  • bpftrace Reference Guide. , bpftrace



    , . , bpftrace



    .







  • BCC Tutorial. bpftrace



    ( libbpf



    ), BCC, BCC Reference Guide, , .







  • Brendan Gregg, «BPF Performance Tools». BPF Linux — BCC. BPF .







  • Brendan Gregg, «Systems Performance: Enterprise and the Cloud, 2nd Edition (2020)». «Systems Performance». : BPF, Solaris, . «BPF Performance Tools» «?», «?», ( BPF ).










All Articles