Commit c49d14a

Walter Erquinigo committed
[trace][intel pt] Simple detection of infinite decoding loops
The low-level decoder might fall into an infinite decoding loop for various reasons, the simplest being an infinite direct loop reached due to wrong handling of self-modified code in the kernel, e.g. it might reach

```
0x0A: pause
0x0C: jump to 0x0A
```

In this case, all the code is sequential and requires no packets to be decoded. The low-level decoder would produce an output like the following

```
0x0A: pause
0x0C: jump to 0x0A
0x0A: pause
0x0C: jump to 0x0A
0x0A: pause
0x0C: jump to 0x0A
... infinite amount of times
```

These cases require stopping the decoder to avoid infinite work and signaling this at least as a trace error.

- Add a check that breaks decoding of a single PSB once 500k instructions have been decoded since the last packet was processed.
- Add a check that looks for infinite loops after a certain number of instructions have been decoded since the last packet was processed.
- Add some `settings` properties for tweaking the thresholds of the checks above. This is also nice because it does the basic work needed for future settings.
- Add an AnomalyDetector class that inspects the DecodedThread and the libipt decoder in search of anomalies. These anomalies are then signaled as fatal errors in the trace.
- Add an ErrorStats class that keeps track of all the errors in a DecodedThread, with a special counter for fatal errors.
- Add an entry for decoded thread errors in the `dump info` command.

Some notes are added in the code and in the documentation of the settings, so please read them.

Besides that, I haven't been able to create a test case in LLVM style, but I've found an anomaly in thread #12 of the trace 72533820-3eb8-4465-b8e4-4e6bf0ccca99 at Meta. We have to figure out how to artificially create traces with this kind of anomaly in LLVM style.

With this change, that anomalous thread now shows:

```
(lldb) thread trace dump instructions 12 -e -i 23101
thread #12: tid = 8
    ...missing instructions
    23101: (error) anomalous trace: possible infinite loop detected of size 2
  vmlinux-5.12.0-0_fbk8_clang_6656_gc85768aa64da`panic_smp_self_stop + 5 [inlined] rep_nop at processor.h:13:2
    23100: 0xffffffff81342785    pause
  vmlinux-5.12.0-0_fbk8_clang_6656_gc85768aa64da`panic_smp_self_stop + 7 at panic.c:87:2
    23099: 0xffffffff81342787    jmp    0xffffffff81342785    ; <+5> [inlined] rep_nop at processor.h:13:2
  vmlinux-5.12.0-0_fbk8_clang_6656_gc85768aa64da`panic_smp_self_stop + 5 [inlined] rep_nop at processor.h:13:2
    23098: 0xffffffff81342785    pause
  vmlinux-5.12.0-0_fbk8_clang_6656_gc85768aa64da`panic_smp_self_stop + 7 at panic.c:87:2
    23097: 0xffffffff81342787    jmp    0xffffffff81342785    ; <+5> [inlined] rep_nop at processor.h:13:2
  vmlinux-5.12.0-0_fbk8_clang_6656_gc85768aa64da`panic_smp_self_stop + 5 [inlined] rep_nop at processor.h:13:2
    23096: 0xffffffff81342785    pause
  vmlinux-5.12.0-0_fbk8_clang_6656_gc85768aa64da`panic_smp_self_stop + 7 at panic.c:87:2
    23095: 0xffffffff81342787    jmp    0xffffffff81342785    ; <+5> [inlined] rep_nop at processor.h:13:2
```

Before this change, decoding this thread never finished because the decoder was stuck in an infinite loop.

Besides that, the dump info command shows

```
(lldb) thread trace dump info 12
  Errors:
    Number of individual errors: 32
      Number of fatal errors: 1
      Number of other errors: 31
```

and in json format

```
(lldb) thread trace dump info 12 -j
  "errors": {
    "totalCount": 32,
    "libiptErrors": {},
    "fatalErrors": 1,
    "otherErrors": 31
  }
```

Differential Revision: https://reviews.llvm.org/D136557
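
As an illustration of the loop check described in the commit message, here is a minimal, self-contained sketch of how a repeating block of instruction addresses at the end of a decoded sequence could be recognized. It is not the AnomalyDetector added by this commit (that code is not part of the files shown here); `FindTrailingLoopSize`, `max_loop_size` and `min_repeats` are hypothetical names chosen for this example, and the real thresholds come from the new `settings` properties.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not the AnomalyDetector from this commit.
// Returns the size of the shortest block of addresses that repeats back to
// back at least `min_repeats` times at the end of `addrs`, considering blocks
// of up to `max_loop_size` addresses. Returns 0 when no such block exists.
std::size_t FindTrailingLoopSize(const std::vector<std::uint64_t> &addrs,
                                 std::size_t max_loop_size,
                                 std::size_t min_repeats) {
  assert(min_repeats >= 2 && "a loop needs at least two repetitions");
  const std::size_t n = addrs.size();
  for (std::size_t size = 1; size <= max_loop_size; ++size) {
    // Not enough decoded instructions to hold `min_repeats` copies.
    if (size * min_repeats > n)
      break;
    // The trailing window is periodic with period `size` if every address in
    // it, except those in the first copy, equals the one `size` slots earlier.
    bool periodic = true;
    for (std::size_t i = n - size * (min_repeats - 1); i < n; ++i) {
      if (addrs[i] != addrs[i - size]) {
        periodic = false;
        break;
      }
    }
    if (periodic)
      return size;
  }
  return 0;
}
```

For the `pause` / `jump` example above, a sequence ending in `0x0A, 0x0C, 0x0A, 0x0C, 0x0A, 0x0C` with `min_repeats = 3` yields a loop size of 2, which matches the "possible infinite loop detected of size 2" message. A real detector would presumably run such a check only after the instruction threshold since the last packet has been crossed, so that ordinary short loops driven by packets are not flagged.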
1 parent c34de60 commit c49d14a

11 files changed, +453 -38 lines changed

lldb/include/lldb/Core/PluginManager.h

+6 -1
@@ -342,7 +342,8 @@ class PluginManager {
       llvm::StringRef name, llvm::StringRef description,
       TraceCreateInstanceFromBundle create_callback_from_bundle,
       TraceCreateInstanceForLiveProcess create_callback_for_live_process,
-      llvm::StringRef schema);
+      llvm::StringRef schema,
+      DebuggerInitializeCallback debugger_init_callback);
 
   static bool
   UnregisterPlugin(TraceCreateInstanceFromBundle create_callback);
@@ -487,6 +488,10 @@ class PluginManager {
       Debugger &debugger, const lldb::OptionValuePropertiesSP &properties_sp,
       ConstString description, bool is_global_property);
 
+  static bool CreateSettingForTracePlugin(
+      Debugger &debugger, const lldb::OptionValuePropertiesSP &properties_sp,
+      ConstString description, bool is_global_property);
+
   static lldb::OptionValuePropertiesSP
   GetSettingForObjectFilePlugin(Debugger &debugger, ConstString setting_name);

lldb/source/Core/PluginManager.cpp

+14 -4
@@ -1051,9 +1051,10 @@ struct TraceInstance
       llvm::StringRef name, llvm::StringRef description,
       CallbackType create_callback_from_bundle,
       TraceCreateInstanceForLiveProcess create_callback_for_live_process,
-      llvm::StringRef schema)
+      llvm::StringRef schema, DebuggerInitializeCallback debugger_init_callback)
       : PluginInstance<TraceCreateInstanceFromBundle>(
-            name, description, create_callback_from_bundle),
+            name, description, create_callback_from_bundle,
+            debugger_init_callback),
         schema(schema),
         create_callback_for_live_process(create_callback_for_live_process) {}
 
@@ -1072,10 +1073,10 @@ bool PluginManager::RegisterPlugin(
     llvm::StringRef name, llvm::StringRef description,
     TraceCreateInstanceFromBundle create_callback_from_bundle,
     TraceCreateInstanceForLiveProcess create_callback_for_live_process,
-    llvm::StringRef schema) {
+    llvm::StringRef schema, DebuggerInitializeCallback debugger_init_callback) {
   return GetTracePluginInstances().RegisterPlugin(
       name, description, create_callback_from_bundle,
-      create_callback_for_live_process, schema);
+      create_callback_for_live_process, schema, debugger_init_callback);
 }
 
 bool PluginManager::UnregisterPlugin(
@@ -1506,6 +1507,7 @@ CreateSettingForPlugin(Debugger &debugger, ConstString plugin_type_name,
 static const char *kDynamicLoaderPluginName("dynamic-loader");
 static const char *kPlatformPluginName("platform");
 static const char *kProcessPluginName("process");
+static const char *kTracePluginName("trace");
 static const char *kObjectFilePluginName("object-file");
 static const char *kSymbolFilePluginName("symbol-file");
 static const char *kJITLoaderPluginName("jit-loader");
@@ -1559,6 +1561,14 @@ bool PluginManager::CreateSettingForProcessPlugin(
                                 properties_sp, description, is_global_property);
 }
 
+bool PluginManager::CreateSettingForTracePlugin(
+    Debugger &debugger, const lldb::OptionValuePropertiesSP &properties_sp,
+    ConstString description, bool is_global_property) {
+  return CreateSettingForPlugin(debugger, ConstString(kTracePluginName),
+                                ConstString("Settings for trace plug-ins"),
+                                properties_sp, description, is_global_property);
+}
+
 lldb::OptionValuePropertiesSP
 PluginManager::GetSettingForObjectFilePlugin(Debugger &debugger,
                                              ConstString setting_name) {

lldb/source/Plugins/Trace/intel-pt/CMakeLists.txt

+12 -1
@@ -13,6 +13,14 @@ lldb_tablegen(TraceIntelPTCommandOptions.inc -gen-lldb-option-defs
   SOURCE TraceIntelPTOptions.td
   TARGET TraceIntelPTOptionsGen)
 
+lldb_tablegen(TraceIntelPTProperties.inc -gen-lldb-property-defs
+  SOURCE TraceIntelPTProperties.td
+  TARGET TraceIntelPTPropertiesGen)
+
+lldb_tablegen(TraceIntelPTPropertiesEnum.inc -gen-lldb-property-enum-defs
+  SOURCE TraceIntelPTProperties.td
+  TARGET TraceIntelPTPropertiesEnumGen)
+
 add_lldb_library(lldbPluginTraceIntelPT PLUGIN
   CommandObjectTraceStartIntelPT.cpp
   DecodedThread.cpp
@@ -38,4 +46,7 @@ add_lldb_library(lldbPluginTraceIntelPT PLUGIN
   )
 
 
-add_dependencies(lldbPluginTraceIntelPT TraceIntelPTOptionsGen)
+add_dependencies(lldbPluginTraceIntelPT
+  TraceIntelPTOptionsGen
+  TraceIntelPTPropertiesGen
+  TraceIntelPTPropertiesEnumGen)

lldb/source/Plugins/Trace/intel-pt/DecodedThread.cpp

+31 -6
@@ -170,34 +170,36 @@ DecodedThread::GetNanosecondsRangeByIndex(uint64_t item_index) {
   return prev(next_it)->second;
 }
 
+uint64_t DecodedThread::GetTotalInstructionCount() const {
+  return m_insn_count;
+}
+
 void DecodedThread::AppendEvent(lldb::TraceEvent event) {
   CreateNewTraceItem(lldb::eTraceItemKindEvent).event = event;
   m_events_stats.RecordEvent(event);
 }
 
 void DecodedThread::AppendInstruction(const pt_insn &insn) {
   CreateNewTraceItem(lldb::eTraceItemKindInstruction).load_address = insn.ip;
+  m_insn_count++;
 }
 
 void DecodedThread::AppendError(const IntelPTError &error) {
   CreateNewTraceItem(lldb::eTraceItemKindError).error =
       ConstString(error.message()).AsCString();
+  m_error_stats.RecordError(/*fatal=*/false);
 }
 
-void DecodedThread::AppendCustomError(StringRef err) {
+void DecodedThread::AppendCustomError(StringRef err, bool fatal) {
   CreateNewTraceItem(lldb::eTraceItemKindError).error =
       ConstString(err).AsCString();
+  m_error_stats.RecordError(fatal);
 }
 
 lldb::TraceEvent DecodedThread::GetEventByIndex(int item_index) const {
   return m_item_data[item_index].event;
 }
 
-void DecodedThread::LibiptErrorsStats::RecordError(int libipt_error_code) {
-  libipt_errors_counts[pt_errstr(pt_errcode(libipt_error_code))]++;
-  total_count++;
-}
-
 const DecodedThread::EventsStats &DecodedThread::GetEventsStats() const {
   return m_events_stats;
 }
@@ -207,6 +209,29 @@ void DecodedThread::EventsStats::RecordEvent(lldb::TraceEvent event) {
   total_count++;
 }
 
+uint64_t DecodedThread::ErrorStats::GetTotalCount() const {
+  uint64_t total = 0;
+  for (const auto &[kind, count] : libipt_errors)
+    total += count;
+
+  return total + other_errors + fatal_errors;
+}
+
+void DecodedThread::ErrorStats::RecordError(bool fatal) {
+  if (fatal)
+    fatal_errors++;
+  else
+    other_errors++;
+}
+
+void DecodedThread::ErrorStats::RecordError(int libipt_error_code) {
+  libipt_errors[pt_errstr(pt_errcode(libipt_error_code))]++;
+}
+
+const DecodedThread::ErrorStats &DecodedThread::GetErrorStats() const {
+  return m_error_stats;
+}
+
 lldb::TraceItemKind
 DecodedThread::GetItemKindByIndex(uint64_t item_index) const {
   return static_cast<lldb::TraceItemKind>(m_item_kinds[item_index]);
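
To make the accounting in `ErrorStats` easier to follow, here is a simplified, self-contained stand-in that compiles without LLDB or libipt. It is a sketch under assumptions: `ErrorStatsSketch` and `RecordLibiptError` are hypothetical names, `std::map<std::string, uint64_t>` replaces `llvm::DenseMap<const char *, uint64_t>`, and the libipt error is passed by name rather than translated from an error code with `pt_errstr`.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Simplified stand-in for the ErrorStats struct in the diff above.
struct ErrorStatsSketch {
  // The three counters are mutually exclusive: each recorded error lands in
  // exactly one of them.
  std::uint64_t other_errors = 0;
  std::uint64_t fatal_errors = 0;
  std::map<std::string, std::uint64_t> libipt_errors; // error name -> count

  // Records a custom (non-libipt) error as either fatal or non-fatal.
  void RecordError(bool fatal) {
    if (fatal)
      ++fatal_errors;
    else
      ++other_errors;
  }

  // Records a libipt error under its name (hypothetical replacement for the
  // RecordError(int libipt_error_code) overload).
  void RecordLibiptError(const std::string &name) { ++libipt_errors[name]; }

  // The total is derived on demand instead of being stored, so it cannot
  // drift out of sync with the individual counters.
  std::uint64_t GetTotalCount() const {
    std::uint64_t total = other_errors + fatal_errors;
    for (const auto &entry : libipt_errors)
      total += entry.second;
    return total;
  }
};
```

Recording 31 non-fatal errors and 1 fatal error gives a total of 32, matching the `dump info` output quoted in the commit message. The sketch uses a distinct method name for libipt errors only to keep the example unambiguous; the commit itself uses two `RecordError` overloads.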

lldb/source/Plugins/Trace/intel-pt/DecodedThread.h

+49 -13
@@ -61,15 +61,6 @@ class DecodedThread : public std::enable_shared_from_this<DecodedThread> {
 public:
   using TSC = uint64_t;
 
-  // Struct holding counts for libipts errors;
-  struct LibiptErrorsStats {
-    // libipt error -> count
-    llvm::DenseMap<const char *, int> libipt_errors_counts;
-    size_t total_count = 0;
-
-    void RecordError(int libipt_error_code);
-  };
-
   /// A structure that represents a maximal range of trace items associated to
   /// the same TSC value.
   struct TSCRange {
@@ -125,16 +116,38 @@ class DecodedThread : public std::enable_shared_from_this<DecodedThread> {
     bool InRange(uint64_t item_index) const;
   };
 
-  // Struct holding counts for events;
+  // Struct holding counts for events
   struct EventsStats {
     /// A count for each individual event kind. We use an unordered map instead
     /// of a DenseMap because DenseMap can't understand enums.
-    std::unordered_map<lldb::TraceEvent, size_t> events_counts;
-    size_t total_count = 0;
+    ///
+    /// Note: We can't use DenseMap because lldb::TraceEvent is not
+    /// automatically handled correctly by DenseMap. We'd need to implement a
+    /// custom DenseMapInfo struct for TraceEvent and that's a bit too much for
+    /// such a simple structure.
+    std::unordered_map<lldb::TraceEvent, uint64_t> events_counts;
+    uint64_t total_count = 0;
 
     void RecordEvent(lldb::TraceEvent event);
   };
 
+  // Struct holding counts for errors
+  struct ErrorStats {
+    /// The following counters are mutually exclusive
+    /// \{
+    uint64_t other_errors = 0;
+    uint64_t fatal_errors = 0;
+    // libipt error -> count
+    llvm::DenseMap<const char *, uint64_t> libipt_errors;
+    /// \}
+
+    uint64_t GetTotalCount() const;
+
+    void RecordError(int libipt_error_code);
+
+    void RecordError(bool fatal);
+  };
+
   DecodedThread(
       lldb::ThreadSP thread_sp,
       const llvm::Optional<LinuxPerfZeroTscConversion> &tsc_conversion);
@@ -194,12 +207,22 @@ class DecodedThread : public std::enable_shared_from_this<DecodedThread> {
   ///   The load address of the instruction at the given index.
   lldb::addr_t GetInstructionLoadAddress(uint64_t item_index) const;
 
+  /// \return
+  ///   The number of instructions in this trace (not trace items).
+  uint64_t GetTotalInstructionCount() const;
+
   /// Return an object with statistics of the trace events that happened.
   ///
   /// \return
   ///   The stats object of all the events.
   const EventsStats &GetEventsStats() const;
 
+  /// Return an object with statistics of the trace errors that happened.
+  ///
+  /// \return
+  ///   The stats object of all the events.
+  const ErrorStats &GetErrorStats() const;
+
   /// The approximate size in bytes used by this instance,
   /// including all the already decoded instructions.
   size_t CalculateApproximateMemoryUsage() const;
@@ -221,7 +244,14 @@ class DecodedThread : public std::enable_shared_from_this<DecodedThread> {
   void AppendError(const IntelPTError &error);
 
   /// Append a custom decoding.
-  void AppendCustomError(llvm::StringRef error);
+  ///
+  /// \param[in] error
+  ///   The error message.
+  ///
+  /// \param[in] fatal
+  ///   If \b true, then the whole decoded thread should be discarded because a
+  ///   fatal anomaly has been found.
+  void AppendCustomError(llvm::StringRef error, bool fatal = false);
 
   /// Append an event.
   void AppendEvent(lldb::TraceEvent);
@@ -289,10 +319,16 @@ class DecodedThread : public std::enable_shared_from_this<DecodedThread> {
   /// TSC -> nanos conversion utility.
   llvm::Optional<LinuxPerfZeroTscConversion> m_tsc_conversion;
 
+  /// Statistics of all tracing errors.
+  ErrorStats m_error_stats;
+
   /// Statistics of all tracing events.
   EventsStats m_events_stats;
   /// Total amount of time spent decoding.
   std::chrono::milliseconds m_total_decoding_time{0};
+
+  /// Total number of instructions in the trace.
+  uint64_t m_insn_count = 0;
 };
 
 using DecodedThreadSP = std::shared_ptr<DecodedThread>;
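
The header changes above distinguish trace items from instructions and add a defaulted `fatal` parameter to `AppendCustomError`. The following is a rough, self-contained model of those two points, not LLDB code: `DecodedThreadSketch`, `Item`, `GetItemsCount`, `GetFatalErrorCount` and `GetOtherErrorCount` are hypothetical names introduced only for this illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified model of DecodedThread.
class DecodedThreadSketch {
public:
  enum class ItemKind { Instruction, Event, Error };

  void AppendInstruction(std::uint64_t load_address) {
    m_items.push_back(Item{ItemKind::Instruction, load_address, ""});
    ++m_insn_count; // only instructions bump this counter
  }

  void AppendEvent() { m_items.push_back(Item{ItemKind::Event, 0, ""}); }

  // `fatal` defaults to false, so call sites written before the parameter
  // existed keep compiling and keep recording non-fatal errors.
  void AppendCustomError(const std::string &message, bool fatal = false) {
    m_items.push_back(Item{ItemKind::Error, 0, message});
    if (fatal)
      ++m_fatal_errors;
    else
      ++m_other_errors;
  }

  std::size_t GetItemsCount() const { return m_items.size(); }
  std::uint64_t GetTotalInstructionCount() const { return m_insn_count; }
  std::uint64_t GetFatalErrorCount() const { return m_fatal_errors; }
  std::uint64_t GetOtherErrorCount() const { return m_other_errors; }

private:
  struct Item {
    ItemKind kind;
    std::uint64_t load_address; // meaningful for instructions only
    std::string error;          // meaningful for errors only
  };

  std::vector<Item> m_items;
  std::uint64_t m_insn_count = 0;
  std::uint64_t m_fatal_errors = 0;
  std::uint64_t m_other_errors = 0;
};
```

Appending two instructions, one event and two errors in this model produces five trace items but an instruction count of two. That is the difference the `GetTotalInstructionCount` documentation points out, and it is why an instruction-based threshold cannot simply reuse the item count.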
