Profile Voluntary Context Switches


While running the benchmarks in Interval-Based-Reclamation, I noticed a suspicious increase in voluntary context switches with epoch-based reclamation. I profiled the program and traced the context switches back to mutexes in jemalloc.

Stat process with time

I wanted to measure the peak memory usage of three memory reclamation approaches: NIL (no reclamation), HPBR (hazard-pointer based), and EBR (epoch based). I used time to read the system counters and get the memory consumption.

LD_PRELOAD=/usr/lib/libjemalloc.so /usr/bin/time -v ./bin/main <args>

Note that Interval-Based-Reclamation does not use memory pools or arenas, so jemalloc is needed to reduce page faults.
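The counters printed by /usr/bin/time -v can also be read programmatically, which is handy when a benchmark is run many times. The following is a minimal sketch, assuming a Linux system where Python's resource module reports ru_maxrss in kilobytes; the helper name run_and_measure is made up for illustration and is not part of the benchmark.

```python
import resource
import subprocess
import sys


def run_and_measure(cmd):
    """Run cmd to completion and report resource counters for child processes."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    subprocess.run(cmd, check=True)
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return {
        # ru_maxrss is a high-water mark over all waited-for children,
        # so it is reported as-is rather than as a difference.
        "max_rss_kb": after.ru_maxrss,
        "minor_faults": after.ru_minflt - before.ru_minflt,
        "voluntary_cs": after.ru_nvcsw - before.ru_nvcsw,
        "involuntary_cs": after.ru_nivcsw - before.ru_nivcsw,
    }


if __name__ == "__main__":
    # Stand-in workload: a child interpreter that sleeps briefly.
    stats = run_and_measure([sys.executable, "-c", "import time; time.sleep(0.1)"])
    for name, value in stats.items():
        print(f"{name}: {value}")
```

To measure the real benchmark, the stand-in command would be replaced by the benchmark binary and its arguments, with LD_PRELOAD passed through the env parameter of subprocess.run.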

Besides the maximum resident set size, /usr/bin/time also reports context switch information. The results are recorded in the following table.

| Approach | ops/sec | Memory Peak (KB) | Minor (reclaiming a frame) page faults | Voluntary context switches | Involuntary context switches |
| --- | --- | --- | --- | --- | --- |
| NIL | 31,118,761 | 10,660,544 | 238,047 | 2,997 | 3,284 |
| HPBR | 7,091,661 | 554,064 | 76,372 | 1,035 | 3,186 |
| EBR | 26,629,611 | 782,760 | 243,314 | 211,896 | 3,169 |

EBR has far more voluntary context switches than no reclamation (about 70x) and HPBR (about 200x).
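The multipliers can be checked by dividing the voluntary context switch counts from the table. A short calculation using only those three numbers:

```python
# Voluntary context switch counts copied from the table.
voluntary = {"NIL": 2_997, "HPBR": 1_035, "EBR": 211_896}

# Ratio of EBR to each of the other two approaches.
ratios = {name: voluntary["EBR"] / voluntary[name] for name in ("NIL", "HPBR")}

for name, ratio in ratios.items():
    print(f"EBR / {name}: {ratio:.1f}x")
# EBR / NIL: 70.7x
# EBR / HPBR: 204.7x
```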

Source of voluntary context switches

A voluntary context switch occurs when a process gives up the CPU on its own, for example while it waits for an I/O operation to complete or after it triggers a fault.

When running Interval-Based-Reclamation, each thread is pinned 1:1 to a CPU core and performs no I/O. Faults (e.g. page faults) are also almost eliminated by using jemalloc. Therefore, I suspect that these voluntary context switches come from system calls.
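Since the threads are pinned, it can help to see which threads accumulate the switches while the benchmark is running. On Linux, each thread exposes its counters in /proc/<pid>/task/<tid>/status under the fields voluntary_ctxt_switches and nonvoluntary_ctxt_switches. The following is a minimal sketch of reading them; the function names are hypothetical and it was not part of the original investigation.

```python
import os


def parse_ctxt_switches(status_text):
    """Extract the two context switch counters from a /proc status file."""
    wanted = ("voluntary_ctxt_switches", "nonvoluntary_ctxt_switches")
    counts = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in wanted:
            counts[key] = int(value.strip())
    return counts


def per_thread_ctxt_switches(pid):
    """Map each thread id of pid to its context switch counters (Linux only)."""
    task_dir = f"/proc/{pid}/task"
    result = {}
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/status") as f:
                result[int(tid)] = parse_ctxt_switches(f.read())
        except (FileNotFoundError, ProcessLookupError):
            continue  # the thread exited while we were iterating
    return result


if __name__ == "__main__":
    # Inspect the current process as a stand-in for the benchmark's PID.
    if os.path.isdir("/proc/self/task"):
        for tid, counts in sorted(per_thread_ctxt_switches(os.getpid()).items()):
            print(tid, counts)
```

Sampling these counters twice, a few seconds apart, gives a per-thread rate without attaching a tracer.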

Count system calls with perf

To confirm the hypothesis that the EBR implementation somehow triggers many syscalls, I used perf to count all system calls.

perf stat -e 'syscalls:sys_enter_*' -p $PID

The result shows that futex is called continuously. [Image: futex]
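Counting every sys_enter_* tracepoint produces a long list in which most entries are zero, so it can be convenient to rank the output. The sketch below assumes perf stat's default human-readable layout, where each line holds a count (possibly with thousands separators) followed by the event name. Both that layout assumption and the function name rank_syscalls are mine, not taken from the original run.

```python
import re
import sys

# Assumed line layout: "<count, maybe with commas>  syscalls:sys_enter_<name>".
LINE_RE = re.compile(r"^\s*([\d,]+)\s+syscalls:sys_enter_(\w+)")


def rank_syscalls(perf_output):
    """Return (syscall, count) pairs with non-zero counts, busiest first."""
    counts = []
    for line in perf_output.splitlines():
        match = LINE_RE.match(line)
        if match:
            count = int(match.group(1).replace(",", ""))
            if count > 0:
                counts.append((match.group(2), count))
    return sorted(counts, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Expects the path of a file holding saved perf stat output.
    with open(sys.argv[1]) as f:
        for name, count in rank_syscalls(f.read()):
            print(f"{count:>12,}  {name}")
```

perf stat can write its report to a file with the -o option, and that file can then be passed to the script as its only argument.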

Trace system calls

strace can be used to show system calls as well.

strace --trace=futex -p $PID

The trace shows futex being called from several different program counters (PCs). [Image: futex]

Stacktrace the system call with gdb

To find out where the futex calls come from, use gdb to set a catchpoint on the futex syscall.

(gdb) catch syscall futex

One of the stack traces: [Image: gdb]

Summary

The observed voluntary context switches come from mutexes inside jemalloc. Why and how this happens is still unknown. Answering that requires a deeper understanding of jemalloc's implementation and is beyond the scope of this post.
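The general mechanism linking mutexes to voluntary context switches can be illustrated in isolation: a thread that has to wait for a lock held by another thread sleeps in the kernel, and that sleep is counted as a voluntary context switch. The following is a minimal sketch of that effect using Python's threading.Lock. It does not reproduce jemalloc's behaviour or explain why its mutexes are contended, and the function name is hypothetical.

```python
import resource
import threading
import time

# RUSAGE_THREAD is Linux-specific; elsewhere fall back to process-wide
# counters, which also include switches made by the main thread.
WHO = getattr(resource, "RUSAGE_THREAD", resource.RUSAGE_SELF)


def nvcsw_while_blocked(hold_seconds=0.05):
    """Return the voluntary context switches a thread incurs while waiting
    for a lock that another thread is holding."""
    lock = threading.Lock()
    result = {}

    def worker():
        before = resource.getrusage(WHO).ru_nvcsw
        with lock:  # blocks here until the main thread releases the lock
            pass
        result["delta"] = resource.getrusage(WHO).ru_nvcsw - before

    lock.acquire()  # the main thread takes the lock first
    thread = threading.Thread(target=worker)
    thread.start()
    time.sleep(hold_seconds)  # keep the lock held so the worker must wait
    lock.release()
    thread.join()
    return result["delta"]


if __name__ == "__main__":
    print("voluntary context switches while blocked:", nvcsw_while_blocked())
```

On Linux the reported number is typically at least one, though the exact value depends on the interpreter and the scheduler.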
