Skip to content

std: print a backtrace on stackoverflow #133170

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from
Closed
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 10 additions & 0 deletions library/std/src/sys/pal/unix/stack_overflow.rs
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,9 @@
pub use self::imp::{cleanup, init};
use self::imp::{drop_handler, make_handler};

#[cfg(any(all(target_os = "linux", target_env = "gnu"), target_os = "macos",))]
mod backtrace;

pub struct Handler {
data: *mut libc::c_void,
}
Expand Down Expand Up @@ -104,6 +107,13 @@ mod imp {
"\nthread '{}' has overflowed its stack\n",
thread::current().name().unwrap_or("<unknown>")
);

#[cfg(any(all(target_os = "linux", target_env = "gnu"), target_os = "macos",))]
{
rtprintpanic!("backtrace:\n\n");
super::backtrace::print();
}

rtabort!("stack overflow");
} else {
// Unregister ourselves by reverting back to the default behavior.
Expand Down
63 changes: 63 additions & 0 deletions library/std/src/sys/pal/unix/stack_overflow/backtrace.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,63 @@
use crate::ffi::{CStr, c_void};
use crate::mem::MaybeUninit;
use crate::ptr;

/// Prints the current backtrace, even in a signal handlers.
///
/// Since `backtrace` requires locking and memory allocation, it cannot be used
/// from inside a signal handler. Instead, this uses `libunwind` and `dladdr`,
/// even though both of them are not guaranteed to be async-signal-safe, strictly
/// speaking. However, at least LLVM's libunwind (used by macOS) has a [test] for
/// unwinding in signal handlers, and `dladdr` is used by `backtrace_symbols_fd`
/// in glibc, which it [documents] as async-signal-safe.
///
/// In practice, this hack works well enough on GNU/Linux and macOS (and perhaps
/// some other platforms). Realistically, the worst thing that can happen is that
/// the stack overflow occurred inside the dynamic loaded while it holds some sort
/// of lock, which could result in a deadlock if that happens in just the right
/// moment. That's unlikely enough and not the *worst* thing to happen considering
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A deadlock may cause a service to hang for an extended period of time until a health check considers the service dead and forcefully restarts it rather than getting restarted immediately upon crashing.

Copy link
Member

@bjorn3 bjorn3 Nov 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And it could prevent a crash reporter from doing it's job.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I should have worded this differently. What I meant was just... the whole stack-overflow module is horribly unsound. Leaving aside the fact that signal handling itself isn't really specified in Rust anyway right now, this module contains awful crimes like mprotecting part of the current stack and using the very much non-AS-safe TLS system in signal handlers. Unfortunately, these hacks are required to ensure that stack allocation itself can't cause UB and to make any attempt at stack overflow messages at all. So while normally, I'm all for sound code, in this module, "does this work well enough" is what counts, not "is this sound", because it very much is not.

The examples you point out are totally realistic. In fact, anything can occur, this is unsound after all. But because of the unlikelihood of actual misbehaviour, I assume that the crimes I committed here are acceptable. If not, there may be ways to work around the misbehaviour (for instance by doing sigalarm and setting a deadline for the backtrace generation to prevent deadlocks). But I'd rather not commit further crimes until we discover any such misbehaviour.

/// that a stack overflow is already an unrecoverable error and most likely
/// indicates a bug.
///
/// [test]: https://github.com/llvm/llvm-project/blob/a6385a3fc8a88f092d07672210a1e773481c2919/libunwind/test/signal_unwind.pass.cpp
/// [documents]: https://www.gnu.org/software/libc/manual/html_node/Backtraces.html#index-backtrace_005fsymbols_005ffd
pub fn print() {
extern "C" fn frame(
ctx: *mut unwind::_Unwind_Context,
arg: *mut c_void,
) -> unwind::_Unwind_Reason_Code {
let count = unsafe { &mut *(arg as *mut usize) };
let depth = *count;
*count += 1;
if depth > 128 {
return unwind::_URC_NO_REASON;
}

let ip = unsafe { unwind::_Unwind_GetIP(ctx) };
let mut info = MaybeUninit::uninit();
let r = unsafe { libc::dladdr(ip.cast(), info.as_mut_ptr()) };
if r != 0 {
let info = unsafe { info.assume_init() };
if !info.dli_sname.is_null() {
let name = unsafe { CStr::from_ptr(info.dli_sname) };
if let Ok(name) = name.to_str() {
rtprintpanic!("{depth}: {}\n", rustc_demangle::demangle(name));
return unwind::_URC_NO_REASON;
}
}
}

rtprintpanic!("{depth}: {ip:p}\n");
unwind::_URC_NO_REASON
}

let mut count = 0usize;
unsafe { unwind::_Unwind_Backtrace(frame, ptr::from_mut(&mut count).cast()) };
if count > 128 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Count will never be > 129 so the number of omitted frames is going to be incorrect anyways. It's going to be a huge number anyways since we are handling a stack overflow.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Count will never be > 129

Yes, it will be?!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh sorry I though the frame function would stop searching after hitting the depth limit but I checked the docs and _URC_NO_REASON will cause it to continue unwinding. I think it should stop at this point since it's possible in certain cases to get "infinite" backtraces with incorrect unwind info (we've had LLVM bugs that caused this before). We definitely want to stop after hitting a certain depth limit, and the total number of frames isn't actually important since stack overflow are often caused by infinite recursion.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, fair enough... I'll change it so it just prints that some frames were omitted.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's going to be a huge number anyways since we are handling a stack overflow.

Most of the time, yep, but I recently got a stack overflow from allocating a very large array in a single stack frame. A backtrace would have been handy there!

rtprintpanic!(
"[... omitted {} frame{} ...]\n",
count - 128,
if count - 128 > 1 { "s" } else { "" }
);
}
}
Loading