[Docs] Updated Profiling VTR Section in Developer Guide #2605
Rewrote the existing Profiling VTR section, specifically the part that covers the GNU `gprof` tool. Added a subsection explaining how to use the Linux `perf` tool to profile VPR and visualize its output.
Expectations:
Remaining Issues: "Some checks haven’t completed yet": This appears to be the issue Alex mentioned in #2598 (comment).
Moreover, here is the exact troubleshooting reference from GitHub.
Potential Solutions:
- **Option 2** (Recommended): Record and offline analysis

Use `perf record` to record the profile data and the call graph. (Note: The argument `lbr` for `--call-graph` only works on Intel platforms. If you encounter issues with call graph recording, please refer to the [`perf record` manual](https://perf.wiki.kernel.org/index.php/Latest_Manual_Page_of_perf-record.1) for more information.)
If you are on a non-Intel platform, what should you do? Just leave out `--call-graph lbr`? Also describe what leaving it out does -- I believe perf still works but becomes more resource-intensive.
> If you are on a non-Intel platform what should you do?

For non-Intel platforms, the argument can be set to `fp`, which uses the frame pointer to produce the call graph (the result can sometimes be inaccurate), or to `dwarf`, which uses the debugging information generated by the compiler (building call graphs from it during profiling is resource-intensive).
Q: Would the following changes work? I was worried that it might be too long to read.
Edited:
Use `perf record` to record the profile data and the call graph. (Note: By default, `perf` uses the frame pointer to generate call graphs, which might produce inaccurate results if the program is highly optimized by the compiler. On Intel platforms, it is recommended to specify `lbr` for `--call-graph`, as it is less affected by compiler optimizations, does not require specific compiler options, and uses fewer resources, e.g., less disk storage for storing profiling results. On other platforms, use `--call-graph dwarf` if available. This requires the compiler to produce debugging information in DWARF format and is resource-intensive. For more information on call graph recording, please refer to the `perf record` manual and StackOverflow discussion.)
sudo perf record --call-graph lbr -p <vpr pid> # `lbr` is for Intel platforms; use `--call-graph dwarf` on other platforms
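
As a rough sketch of the platform rule described in the note above, the `--call-graph` mode could be picked from the CPU vendor string. The helper name `choose_callgraph_mode` is hypothetical (it is not part of `perf` or VTR), reading `/proc/cpuinfo` assumes Linux, and `lbr` may still be unavailable on some Intel CPUs, so treat the result as a starting point only.

```shell
#!/bin/sh
# Hypothetical helper: map a CPU vendor string to a --call-graph mode.
# Intel -> lbr (as recommended in the note), anything else -> dwarf.
choose_callgraph_mode() {
    case "$1" in
        GenuineIntel) echo "lbr" ;;
        *) echo "dwarf" ;;
    esac
}

# Read the vendor from /proc/cpuinfo; empty if the file or field is missing.
vendor=$(awk -F': ' '/^vendor_id/ {print $2; exit}' /proc/cpuinfo 2>/dev/null)
mode=$(choose_callgraph_mode "${vendor:-unknown}")

# Print the command instead of running it: it needs sudo and a live VPR pid.
echo "sudo perf record --call-graph $mode -p <vpr pid>"
```

Printing the command rather than executing it keeps the sketch safe to run anywhere; substitute the real VPR process ID before using the output.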
Thanks! Looks good -- just a couple of suggestions.
Close #2545.