Skip to content

Commit a3c3c66

Browse files
LeviYeoReumIngo Molnar
authored and
Ingo Molnar
committed
perf/core: Fix child_total_time_enabled accounting bug at task exit
The perf events code fails to account for total_time_enabled of inactive events. Here is a failure case for accounting total_time_enabled for CPU PMU events: sudo ./perf stat -vvv -e armv8_pmuv3_0/event=0x08/ -e armv8_pmuv3_1/event=0x08/ -- stress-ng --pthread=2 -t 2s ... armv8_pmuv3_0/event=0x08/: 1138698008 2289429840 2174835740 armv8_pmuv3_1/event=0x08/: 1826791390 1950025700 847648440 ` ` ` ` ` > total_time_running with child ` > total_time_enabled with child > count with child Performance counter stats for 'stress-ng --pthread=2 -t 2s': 1,138,698,008 armv8_pmuv3_0/event=0x08/ (94.99%) 1,826,791,390 armv8_pmuv3_1/event=0x08/ (43.47%) The two events above are opened on two different CPU PMUs, for example, each event is opened for a cluster in an Arm big.LITTLE system, they will never run on the same CPU. In theory, the total enabled time should be same for both events, as two events are opened and closed together. As the result show, the two events' total enabled time including child event is different (2289429840 vs 1950025700). This is because child events are not accounted properly if a event is INACTIVE state when the task exits: perf_event_exit_event() `> perf_remove_from_context() `> __perf_remove_from_context() `> perf_child_detach() -> Accumulate child_total_time_enabled `> list_del_event() -> Update child event's time The problem is the time accumulation happens prior to child event's time updating. Thus, it misses to account the last period's time when the event exits. The perf core layer follows the rule that timekeeping is tied to state change. To address the issue, make __perf_remove_from_context() handle the task exit case by passing 'DETACH_EXIT' to it and invoke perf_event_state() for state alongside with accounting the time. Then, perf_child_detach() populates the time into the parent's time metrics. After this patch, the bug is fixed: sudo ./perf stat -vvv -e armv8_pmuv3_0/event=0x08/ -e armv8_pmuv3_1/event=0x08/ -- stress-ng --pthread=2 -t 10s ... armv8_pmuv3_0/event=0x08/: 15396770398 32157963940 21898169000 armv8_pmuv3_1/event=0x08/: 22428964974 32157963940 10259794940 Performance counter stats for 'stress-ng --pthread=2 -t 10s': 15,396,770,398 armv8_pmuv3_0/event=0x08/ (68.10%) 22,428,964,974 armv8_pmuv3_1/event=0x08/ (31.90%) [ mingo: Clarified the changelog. ] Fixes: ef54c1a ("perf: Rework perf_event_exit_event()") Suggested-by: Peter Zijlstra <[email protected]> Signed-off-by: Yeoreum Yun <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Tested-by: Leo Yan <[email protected]> Link: https://lore.kernel.org/r/[email protected]
1 parent 4e82c87 commit a3c3c66

File tree

1 file changed

+9
-9
lines changed

1 file changed

+9
-9
lines changed

kernel/events/core.c

+9-9
Original file line numberDiff line numberDiff line change
@@ -2451,6 +2451,7 @@ ctx_time_update_event(struct perf_event_context *ctx, struct perf_event *event)
24512451
#define DETACH_GROUP 0x01UL
24522452
#define DETACH_CHILD 0x02UL
24532453
#define DETACH_DEAD 0x04UL
2454+
#define DETACH_EXIT 0x08UL
24542455

24552456
/*
24562457
* Cross CPU call to remove a performance event
@@ -2465,6 +2466,7 @@ __perf_remove_from_context(struct perf_event *event,
24652466
void *info)
24662467
{
24672468
struct perf_event_pmu_context *pmu_ctx = event->pmu_ctx;
2469+
enum perf_event_state state = PERF_EVENT_STATE_OFF;
24682470
unsigned long flags = (unsigned long)info;
24692471

24702472
ctx_time_update(cpuctx, ctx);
@@ -2473,16 +2475,19 @@ __perf_remove_from_context(struct perf_event *event,
24732475
* Ensure event_sched_out() switches to OFF, at the very least
24742476
* this avoids raising perf_pending_task() at this time.
24752477
*/
2476-
if (flags & DETACH_DEAD)
2478+
if (flags & DETACH_EXIT)
2479+
state = PERF_EVENT_STATE_EXIT;
2480+
if (flags & DETACH_DEAD) {
24772481
event->pending_disable = 1;
2482+
state = PERF_EVENT_STATE_DEAD;
2483+
}
24782484
event_sched_out(event, ctx);
2485+
perf_event_set_state(event, min(event->state, state));
24792486
if (flags & DETACH_GROUP)
24802487
perf_group_detach(event);
24812488
if (flags & DETACH_CHILD)
24822489
perf_child_detach(event);
24832490
list_del_event(event, ctx);
2484-
if (flags & DETACH_DEAD)
2485-
event->state = PERF_EVENT_STATE_DEAD;
24862491

24872492
if (!pmu_ctx->nr_events) {
24882493
pmu_ctx->rotate_necessary = 0;
@@ -13731,12 +13736,7 @@ perf_event_exit_event(struct perf_event *event, struct perf_event_context *ctx)
1373113736
mutex_lock(&parent_event->child_mutex);
1373213737
}
1373313738

13734-
perf_remove_from_context(event, detach_flags);
13735-
13736-
raw_spin_lock_irq(&ctx->lock);
13737-
if (event->state > PERF_EVENT_STATE_EXIT)
13738-
perf_event_set_state(event, PERF_EVENT_STATE_EXIT);
13739-
raw_spin_unlock_irq(&ctx->lock);
13739+
perf_remove_from_context(event, detach_flags | DETACH_EXIT);
1374013740

1374113741
/*
1374213742
* Child events can be freed.

0 commit comments

Comments
 (0)