Este es otro artículo de la serie "BPF para los más pequeños" ( 0 , 1 , 2 ) y el primero de una serie de artículos prácticos sobre rastreo de Linux con herramientas modernas.
En él aprenderá sobre el programa y el idioma bpftrace
, la forma más fácil de sumergirse en el mundo de BPF desde un punto de vista práctico, incluso si no sabe absolutamente nada sobre BPF. La utilidad le bpftrace
permite crear programas de manejo usando un lenguaje simple directamente desde la línea de comando y conectarlos a una gran cantidad de eventos de espacio de usuario y kernel. Eche un vistazo al KPDV a continuación ... felicitaciones, ya sabe cómo rastrear las llamadas del sistema con bpftrace
!
A diferencia de los artículos anteriores de la serie, este es relativamente corto y su parte principal está escrita en el formato de un tutorial, por lo que después de cinco minutos de lectura puede crear controladores y recopilar estadísticas sobre cualquier evento en el kernel de Linux directamente desde el comando. línea. El final del artículo analiza las alternativas ply
y el BCC. Los minutos en la frase "después de cinco minutos de lectura" deben ser Mercurial. No se garantiza que surjan habilidades únicas después de cinco minutos de lectura.
Contenido
- Instalación de bpftrace
- ¿Qué eventos podemos rastrear?
- Bpftrace: tutorial
- BCC: utilidades y marco
- Ply: bpftrace para los pobres
Instalación de bpftrace
. sudo apt install bpftrace
( ). bpftrace
, .
. bpftrace
, , , , BTF. , bpftrace
.
master- bpftrace docker image bpftrace
. , :
$ docker pull quay.io/iovisor/bpftrace:latest
$ cd /tmp
$ docker run -v $(pwd):/o quay.io/iovisor/bpftrace:latest /bin/bash -c "cp /usr/bin/bpftrace /o"
$ sudo ./bpftrace -V
bpftrace v0.11.4
bpftrace glibc
, docker image glibc
.
, . , execve(2)
, :
$ sudo ./bpftrace -e 'tracepoint:syscalls:sys_enter_execve { printf("%s -> %s\n", comm, str(uptr(args->filename))); }'
Attaching 1 probe...
bash -> /bin/echo
bash -> /usr/bin/ls
gnome-shell -> /bin/sh
sh -> /home/aspsk/bin/geeqie
sh -> /usr/local/sbin/geeqie
...
, , , :
$ sudo mv /tmp/bpftrace /usr/local/bin
, , , BTF, bpftrace
, docker . bpftrace
. .
. bpftrace
/ BTF, kernel headers. , bpftrace
.
BTF, , BTF kernel headers, , , . BTF , .
?
. , , , . , . : « ?!» , ,
$ sudo apt install bpftrace The following NEW packages will be installed: bpftrace Preparing to unpack .../bpftrace_0.11.0-1_amd64.deb ... Unpacking bpftrace (0.11.0-1) ... Setting up bpftrace (0.11.0-1) ... Processing triggers for man-db (2.9.3-2) ... $
, bpftrace
. ? , bpftrace
. - :
- :
BEGIN
,END
- , kprobes:
kprobe
,kretprobe
,uprobe
,uretprobe
- , BPF trampolines:
kfunc
,kretfunc
- , tracepoints:
tracepoint
- :
usdt
- , perf:
software
,hardware
,profile
,interval
,watchpoint
Bpftrace: hello world
bpftrace
awk , BEGIN
END
, bpftrace
, . :
# bpftrace -e 'BEGIN { printf("Hello world!\n"); }' Attaching 1 probe... Hello world! ^C
"Hello world!"
. , Ctrl-C
, bpftrace
— . bpftrace
exit
. , END
:
# bpftrace -e ' BEGIN { printf("Hello world!\n"); exit(); } END { printf("So long\n"); } ' Attaching 2 probes... Hello world! So long
Kprobes —
— , , , , kprobes (Kernel Probe — ) , , . , , kprobes
, , , , Linux BPF.
, schedule
:
$ sudo bpftrace -e '
k:schedule { @[comm] = count(); }
i:s:2 { exit();}
END { print(@, 10); clear(@); }
'
Attaching 3 probes...
@[Timer]: 147
@[kworker/u65:0]: 147
@[kworker/7:1]: 153
@[kworker/13:1]: 158
@[IPC I/O Child]: 170
@[IPC I/O Parent]: 172
@[kworker/12:1]: 185
@[Web Content]: 229
@[Xorg]: 269
@[SCTP timer]: 1566
@
.
-? :
$ sudo bpftrace -l 'k:*' | wc -l 61106
. , , kprobe, , , kprobes.
kretprobes
kprobe
kretprobe
. kprobe
, kretporobe
. retval
.
, vfs_write
( ):
$ sudo bpftrace -e 'kr:vfs_write { @ = hist(retval); } i:s:2 { exit(); }' Attaching 2 probes... @: [1] 606 |@@@@@@@@@@@@@@@@@@@@@@@@@ | [2, 4) 0 | | [4, 8) 0 | | [8, 16) 1223 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@| [16, 32) 0 | | [32, 64) 25 |@ |
uprobes uretprobes
, ( ), . kprobes
. bpftrace
uprobes
uretprobes
— .
, ( UID ):
$ sudo bpftrace -e 'uretprobe:/bin/bash:readline { printf("readline: [%d]: \"%s\"\n", uid, str(uptr(retval))); }' Attaching 1 probe... readline: [1000]: "echo "hello habr"" readline: [0]: "echo "hello from root"" ^C
, 2
CONFIG_DEBUG_INFO_BTF=y
, kprobes, , bpf trampolines. BTF , . , / .
Tracepoints —
- , , , . tracepoints . :
$ sudo bpftrace -l 't:*'
, kprobes
:
$ sudo bpftrace -l 't:*' | wc -l 1782
tracepoints , API: , tracepoint, , . , bpftrace
, tracepoint, :
$ sudo bpftrace -lv tracepoint:thermal:thermal_temperature tracepoint:thermal:thermal_temperature __data_loc char[] thermal_zone; int id; int temp_prev; int temp;
kprobe
, BTF, , , . BTF kprobes
kfuncs
.
usdt —
#include <sys/sdt.h> #include <sys/time.h> #include <unistd.h> int main(int argc, char **argv) { struct timeval tv; for (;;) { gettimeofday(&tv, NULL); DTRACE_PROBE1(test, probe, tv.tv_sec); sleep(1); } }
test:probe
, tv.tv_sec
— . , systemtap-sdt-dev
( ). , :
$ cc /tmp/test.c -o /tmp/test $ sudo bpftrace -l 'usdt:/tmp/test' usdt:/tmp/test:test:probe
/tmp/test
,
$ sudo bpftrace -e 'usdt:/tmp/test:test:probe { printf(" %u\n", arg0); }' Attaching 1 probe... 1612903991 1612903992 1612903993 1612903994 1612903995 ...
arg0
— tv.tv_sec
, breakpoint.
Perf
bpftrace
, Perf. , bpftrace
:
software
: -hardware
: PMCsinterval
:profile
:
`software`
perf, :
# bpftrace -l 's:*' software:alignment-faults: software:bpf-output: software:context-switches: software:cpu-clock: software:cpu-migrations: software:dummy: software:emulation-faults: software:major-faults: software:minor-faults: software:page-faults: software:task-clock:
, , ,
# bpftrace -e 'software:cpu-migrations:10 { @[comm] = count(); }' Attaching 2 probes... ^C@[kworker/u65:1]: 1 @[bpftrace]: 1 @[SCTP timer]: 2 @[Xorg]: 2
CPU . perf_event_open(2)
, , cpu-migrations
, man- PERF_COUNT_SW_CPU_MIGRATIONS
.
`hardware`
hardware counters, bpftrace
BPF. , :
bpftrace -l 'h*' hardware:backend-stalls: hardware:branch-instructions: hardware:branch-misses: hardware:bus-cycles: hardware:cache-misses: hardware:cache-references: hardware:cpu-cycles: hardware:frontend-stalls: hardware:instructions: hardware:ref-cycles:
( , . PERF_COUNT_HW_BRANCH_MISSES
):
bpftrace -e 'hardware:branch-misses:100000 { @[tid] = count(); }' Attaching 3 probes... @[1055]: 4 @[3078]: 4 @[1947]: 5 @[1066]: 6 @[2551]: 6 @[0]: 29
`interval` `profile`
interval
profile
. CPU, — CPU.
, . , :
$ sudo bpftrace -e ' kr:vfs_write { @ = hist(retval); } interval:s:2 { print(@); clear(@); } '
vfs_write
, interval
, , @
.
profile
:
# bpftrace -e ' profile:hz:99 { @[kstack] = count(); } i:s:10 { exit(); } END { print(@,1); clear(@); } ' Attaching 3 probes... @[ cpuidle_enter_state+202 cpuidle_enter+46 call_cpuidle+35 do_idle+487 cpu_startup_entry+32 start_secondary+345 secondary_startup_64+182 ]: 14455
profile
99 , exit()
, END
@
— ( , , , ).
Bpftrace: tutorial
, bpftrace
:
# bpftrace
, . , bpftrace
. , bpftrace
. , — .
-l
. . ( - , , .) :
# bpftrace -l '*kill_all*' kprobe:rfkill_alloc kprobe:kill_all kprobe:btrfs_kill_all_delayed_nodes
tracepoints
:
# bpftrace -l 't:*kill*' tracepoint:cfg80211:rdev_rfkill_poll tracepoint:syscalls:sys_enter_kill tracepoint:syscalls:sys_exit_kill tracepoint:syscalls:sys_enter_tgkill tracepoint:syscalls:sys_exit_tgkill tracepoint:syscalls:sys_enter_tkill tracepoint:syscalls:sys_exit_tkill
tracepoint:syscalls
, , bpftrace
. X
:
tracepoint:syscalls:sys_enter_X tracepoint:syscalls:sys_exit_X
- , , execve(2)
. , , - tracepoint
-v
, :
# bpftrace -lv 't:s*:sys_*_execve' tracepoint:syscalls:sys_enter_execve int __syscall_nr; const char * filename; const char *const * argv; const char *const * envp; tracepoint:syscalls:sys_exit_execve int __syscall_nr; long ret;
( *
, syscalls
sys_enter_execve
sys_exit_execve
). , -lv
args
:
# bpftrace -e ' t:s*:sys_enter_execve { printf("ENTER: %s\n", str(uptr(args->filename))); } t:s*:sys_exit_execve { printf("EXIT: %s: %d\n", comm, args->ret); } ' Attaching 2 probes... ENTER: /bin/ls EXIT: ls: 0 ENTER: /bin/lss EXIT: bash: -2
.
args->filename
. , str
, : , uptr
. bpftrace
, . , , bpftrace -lv
, , , execve ( __user
).
comm
, . args->ret
. , " " , . , - , : /bin/ls
exec
0 ( ls
/bin/ls
), /bin/lss
exec
-2
bash
.
. , . , . uptr
, .
`bpftrace`
bpftrace
awk (. 6 The AWK Programming Language
) .
<probe> <filter> { <action> }
,
# bpftrace -e 'BEGIN { printf("Hello\n") } END { printf("World\n") }'
<probe>
— BEGIN
END
, <action>
— printf
.
<filter>
,
,
# bpftrace -e 'p:s:1 /cpu == 0/ { printf(" CPU%d\n", cpu); }' Attaching 1 probe... CPU0 CPU0 ^C
, CPU 0.
. , /cpu == 0/
?
<filter>
. , write
. — sys_enter_write
sys_exit_write
:
# cat /tmp/write-times.bt t:syscalls:sys_enter_write { @[tid] = nsecs } t:syscalls:sys_exit_write /@[tid]/ { @write[comm] = sum((nsecs - @[tid]) / 1000); delete(@[tid]); } END { print(@write, 10); clear(@write); clear(@); }
, . :
# bpftrace /tmp/write-times.bt
, sys_enter_write
, write
@
, tid
.
, sys_exit_write
, /@[tid]/
, . , , - write
. ( ) @write
@[tid]
.
, ^C
, END
, @write
@
, bpftrace
.
. , /@[tid]/
?
:
bpftrace
C , , :?
, ++
, --
. if {} else {}
. while
unroll
( , BPF ). , .
: . $
, :
# bpftrace -e 'BEGIN { $x = 1; printf("%d\n", ++$x); exit(); }' # bpftrace -e 'BEGIN { if (1) { $x = 1; } printf("%d\n", ++$x); exit(); }'
— :
# bpftrace -e 'BEGIN { $x = 1; exit(); } END { printf("%d\n", $x); }'
, , @
. "" @
, write
(@[tid]
). ( bpftrace
— . , , : . , , .)
, : ? bpftrace
. kprobes
, tracepoints
— , tracepoint, Perf, , — . tracepoints
, kprobes
, Perf.
kprobes
arg0
, arg1
, .. , . :
#include <linux/skbuff.h> #include <linux/ip.h> k:icmp_echo { $skb = (struct sk_buff *) arg0; $iphdr = (struct iphdr *) ($skb->head + $skb->network_header); @pingstats[ntop($iphdr->saddr), ntop($iphdr->daddr)]++; }
, . kprobe
icmp_echo
, ICMPv4 echo request. , arg0
, — sk_buff
, . IP @pingstats
. , , IP ! , kprobe, user space, .
. , kernel headers. sk_buff
iphdr
, . ( BTF, — , .) icmp_echo
sk_buff
$skb
. $skb
, , , , iphdr
. ntop
bpftrace
, IP .
. . uptr
kptr
. : vfs_write
, — struct file
, <linux/fs.h>
. vfs_write
. (Hint: struct file
kretprobe
?)
. bpftrace
, , , sudo
.
ping
— , CPU. count
. :
# bpftrace -e 'p:hz:5000 { @x++; @y = count(); } i:s:10 { exit(); }' Attaching 2 probes... @x: 760528 @y: 799002
10 5000 16 @x
@y
. ++
. count()
, CPU- : CPU , .
bpftrace
, - . — . .
# bpftrace -e 'kr:vfs_write { @ = hist(retval); } i:s:10 { exit() }' Attaching 2 probes... @: [1] 14966407 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@| [2, 4) 0 | | [4, 8) 0 | | [8, 16) 1670 | | [16, 32) 0 | | [32, 64) 123 | | [64, 128) 0 | | [128, 256) 0 | | [256, 512) 0 | | [512, 1K) 0 | | [1K, 2K) 0 | | [2K, 4K) 0 | | [4K, 8K) 0 | | [8K, 16K) 7531982 |@@@@@@@@@@@@@@@@@@@@@@@@@@ |
vfs_write
, , - ! ( 20 ):
# bpftrace -e ' kr:vfs_write /retval == 1/ { @[pid, comm] = hist(retval); } i:s:10 { exit() } END { print(@, 1); clear(@); }' Attaching 3 probes... @[133835, dd]: [1] 14254511 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
, , retval
. , — . , END
, . ? :
# tr '\000' ' ' < /proc/133835/cmdline dd if=/dev/zero of=/dev/null bs=1
. bpftrace
, .
: flame graphs
profile
. . , profile
:
profile:hz:rate profile:s:rate profile:ms:rate profile:us:rate
HZ, — , , .
, , , :
bpftrace -e 'profile:hz:99 { print(kstack); exit() }' Attaching 1 probe... cpuidle_enter_state+202 cpuidle_enter+46 call_cpuidle+35 do_idle+487 cpu_startup_entry+32 rest_init+174 arch_call_rest_init+14 start_kernel+1724 x86_64_start_reservations+36 x86_64_start_kernel+139 secondary_startup_64+182
-.
, , . flame graphs, .
, bpftrace
:
# bpftrace -e 'profile:hz:333 { @[kstack] = count(); } i:s:10 { exit() }' > /tmp/raw # wc -l /tmp/raw 3374 /tmp/raw
333 ( kstack
@
, kstack
— ).
FlameGraph :
$ git clone https://github.com/brendangregg/FlameGraph.git $ cd FlameGraph $ ./stackcollapse-bpftrace.pl /tmp/raw > /tmp/ready $ ./flamegraph.pl /tmp/ready > /tmp/kstack.svg
bpftrace
, ( , gist SVG):
, CPU dd
, /dev/zero
/dev/null
. , .
. bpftrace
. FlameGraph
.
, bpftrace
. , .
BCC:
BCC , BPF — 100 . — , , Netflix, Facebook . . , .
libbcc
libbpf
, -C , . , . .
BCC - . , bpfcc-tools
. . ,
# funccount-bpfcc 'vfs_*' -d 5 Tracing 67 functions for "b'vfs_*'"... Hit Ctrl-C to end. FUNC COUNT b'vfs_statfs' 1 b'vfs_unlink' 1 b'vfs_lock_file' 2 b'vfs_fallocate' 31 b'vfs_statx_fd' 32 b'vfs_getattr' 80 b'vfs_getattr_nosec' 88 b'vfs_open' 108 b'vfs_statx' 174 b'vfs_writev' 2789 b'vfs_write' 6554 b'vfs_read' 7363 Detaching...
vfs_
. -p
, PID . , mplayer
, :
# funccount-bpfcc 'vfs_*' -d 5 -p 29380 Tracing 67 functions for "b'vfs_*'"... Hit Ctrl-C to end. FUNC COUNT b'vfs_write' 208 b'vfs_read' 629 Detaching...
BCC
BCC. mutex_lock
mutex_unlock
. , .
#! /usr/bin/python3
from bcc import BPF
from ctypes import c_int
from time import sleep, strftime
from sys import argv
b = BPF(text="""
BPF_PERCPU_ARRAY(mutex_stats, u64, 2);
static inline void inc(int key)
{
u64 *x = mutex_stats.lookup(&key);
if (x)
*x += 1;
}
void do_lock(struct pt_regs *ctx) { inc(0); }
void do_unlock(struct pt_regs *ctx) { inc(1); }
""")
b.attach_kprobe(event="mutex_lock", fn_name="do_lock")
b.attach_kprobe(event="mutex_unlock", fn_name="do_unlock")
print("%-8s%10s%10s" % ("TIME", "LOCKED", "UNLOCKED"))
while 2 * 2 == 4:
try:
sleep(1)
except KeyboardInterrupt:
exit()
print("%-8s%10d%10d" % (
strftime("%H:%M:%S"),
b["mutex_stats"].sum(0).value,
b["mutex_stats"].sum(1).value))
b["mutex_stats"].clear()
. , — BPF
:
from bcc import BPF
BPF, . BPF
-C.
BPF_PERCPU_ARRAY(mutex_stats, u64, 2); static inline void inc(int key) { u64 *x = mutex_stats.lookup(&key); if (x) *x += 1; } void do_lock(struct pt_regs *ctx) { inc(0); } void do_unlock(struct pt_regs *ctx) { inc(1); }
C, , BPF
C.
, mutex_stats
u64
, . , PERCPU , , CPU . inc
, mutex_stats
. .
, do_lock
do_unlock
.
— BPF
libllvm
, , . kprobes
:
b.attach_kprobe(event="mutex_lock", fn_name="do_lock")
b.attach_kprobe(event="mutex_unlock", fn_name="do_unlock")
:
print("%-8s%10s%10s" % ("TIME", "LOCKED", "UNLOCKED"))
while 2 * 2 == 4:
try:
sleep(1)
except KeyboardInterrupt:
exit()
print("%-8s%10d%10d" % (
strftime("%H:%M:%S"),
b["mutex_stats"].sum(0).value,
b["mutex_stats"].sum(1).value))
b["mutex_stats"].clear()
mutex_stats
. sum
mutex_stats
, CPU:
sum(index) { result = 0 CPU { result += cpu->mutex_stats[index] } return result }
. :
$ sudo ./bcc-tool-example TIME LOCKED UNLOCKED 18:06:33 11382 12993 18:06:34 11200 12783 18:06:35 18597 22553 18:06:36 20776 25516 18:06:37 59453 68201 18:06:38 49282 58064 18:06:39 25541 27428 18:06:40 22094 25280 18:06:41 5539 7250 18:06:42 5662 7351 18:06:43 5205 6905 18:06:44 6733 8438
- 18:06:35 youtube Firefox, youtube 18:06:40 . , , youtube .
, C, libbpf
CO-RE. libbpf
, libllvm
, , . , BCC libbpf
+ CO-RE BCC, . libbpf-tools. BCC to libbpf conversion guide ( ).
Ply: bpftrace
ply
, Tobias Waldekranz 2015 , bpftrace
. awk- BPF, ,
ply 'tracepoint:tcp/tcp_receive_reset { printf("saddr:%v port:%v->%v\n", data->saddr, data->sport, data->dport); }'
ply
: libc
(). , BPF . , , ply
ply script language -> BPF
.
, ply
, , — ply
, , . ply
, , , , C libbpf
— , ., , Building BPF applications with libbpf-bootstrap.
:
bpftrace
, BCC Linux:
The bpftrace One-Liner Tutorial.
bpftrace
, . , , .
bpftrace Reference Guide. ,
bpftrace
, . ,bpftrace
.
BCC Tutorial.
bpftrace
(libbpf
), BCC, BCC Reference Guide, , .
Brendan Gregg, «BPF Performance Tools». BPF Linux — BCC. BPF .
Brendan Gregg, «Systems Performance: Enterprise and the Cloud, 2nd Edition (2020)». «Systems Performance». : BPF, Solaris, . «BPF Performance Tools» «?», «?», ( BPF ).