
List:       fedora-devel-list
Subject:    Re: F37 proposal: Add -fno-omit-frame-pointer to default compilation flags (System-Wide Change propo
From:       Jeremy Linton <jeremy.linton () arm ! com>
Date:       2022-11-29 18:57:22
Message-ID: 360a80fe-71f8-dc4f-4552-12f77b0bcd0f () arm ! com

Hi,


On 6/16/22 15:53, Ben Cotton wrote:
> https://fedoraproject.org/wiki/Changes/fno-omit-frame-pointer
> 
> This document represents a proposed Change. As part of the Changes
> process, proposals are publicly announced in order to receive
> community feedback. This proposal will only be implemented if approved
> by the Fedora Engineering Steering Committee.

Given how this just went in the FESCo meeting, I might toss out an 
alternative I've not seen anyone else suggest.

Why not turn this on just for rawhide and leave it off for the main 
distro releases?

That is sorta what happens with the kernel already (extra debug 
options). It's not perfect, but rawhide is largely intended to be the dev 
target anyway, so this solves both the "how do I do detailed 
testing/profiling" as well as maintaining peak performance for everyone 
else.


> 
> == Summary ==
> 
> Fedora will add -fno-omit-frame-pointer to the default C/C++
> compilation flags, which will improve the effectiveness of profiling
> and debugging tools.
> 
> == Owner ==
> * Name: [[User:daandemeyer| Daan De Meyer]], [[User:Dcavalca| Davide
> Cavalca]], [[ Andrii Nakryiko]]
> * Email: daandemeyer@fb.com, dcavalca@fb.com, andriin@fb.com
> 
> 
> == Detailed Description ==
> 
> Credits to Mirek Klimos, whose internal note on stacktrace unwinding
> formed the basis for this change proposal (myreggg@gmail.com).
> 
> Any performance or efficiency work relies on accurate profiling data.
> Sampling profilers probe the target program's call stack at regular
> intervals and store the stack traces. If we collect enough of them, we
> can closely approximate the real cost of a library or function with
> minimal runtime overhead.
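
As a rough illustration of the "collect enough samples" point above, here is a minimal Python sketch. The sample data and the aggregate() helper are made up for illustration and are not taken from perf or any other profiler. Each sample is one captured stack, ordered from the outermost caller to the function on-CPU.

```python
from collections import Counter

def aggregate(samples):
    """Return (self_share, inclusive_share) as fractions of all samples."""
    self_cost = Counter()
    inclusive_cost = Counter()
    for stack in samples:              # each stack is ordered root -> leaf
        self_cost[stack[-1]] += 1      # the leaf is what the CPU was executing
        for func in set(stack):        # count a function once per sample, even if recursive
            inclusive_cost[func] += 1
    total = len(samples)
    self_share = {f: n / total for f, n in self_cost.items()}
    inclusive_share = {f: n / total for f, n in inclusive_cost.items()}
    return self_share, inclusive_share

# Hypothetical samples, purely illustrative.
samples = [
    ["_start", "main", "parse"],
    ["_start", "main", "parse", "memcpy"],
    ["_start", "main", "render"],
    ["_start", "main", "parse", "memcpy"],
]
self_share, inclusive_share = aggregate(samples)
print(self_share["memcpy"], inclusive_share["parse"])
```

The inclusive numbers are only as good as the stacks: if unwinding stops early, the callers that were cut off never get charged.
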
> 
> A stack trace captures what's running on a thread. It should start with
> clone - if the thread was created via the clone syscall - or with _start -
> if it's the main thread of the process. The last function in the stack
> trace is the code that the CPU is currently executing. If a stack starts
> with [unknown] or any other symbol, it means it's not complete.
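
The completeness rule in the paragraph above can be written down as a small check. This is a sketch only: the set of root symbols is taken straight from the proposal text, and real traces may show other spellings depending on the libc and kernel, so treat ROOT_SYMBOLS as an assumption.

```python
# Root symbols named in the proposal; actual names may differ per libc (assumption).
ROOT_SYMBOLS = {"clone", "_start"}

def is_complete(stack):
    """stack is a list of symbol names ordered root -> leaf."""
    return bool(stack) and stack[0] in ROOT_SYMBOLS

print(is_complete(["_start", "main", "parse"]))      # complete: main thread
print(is_complete(["[unknown]", "parse", "memcpy"]))  # truncated: unwinding gave up
```
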
> 
> === Unwinding ===
> 
> How does the profiler get the list of function names? There are two parts to it:
> 
> # Unwinding the stack - getting a list of virtual addresses pointing
> to the executable code
> # Symbolization - translating virtual addresses into human-readable
> information, like function name, inlined functions at the address, or
> file name and line number.
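
For the second step, symbolization, the core idea is a range lookup from address to symbol. A minimal sketch follows, assuming a made-up symbol table of (start, size, name) entries; real tools read this from ELF symbol tables and debug info, which is not shown here.

```python
import bisect

# Hypothetical symbol table, sorted by start address: (start, size, name).
SYMBOLS = [
    (0x1000, 0x20, "_start"),
    (0x1040, 0x80, "main"),
    (0x1100, 0x60, "parse"),
]
STARTS = [start for start, _, _ in SYMBOLS]

def symbolize(address):
    """Map a virtual address to the name of the function that contains it."""
    i = bisect.bisect_right(STARTS, address) - 1   # last symbol starting at or before address
    if i < 0:
        return "[unknown]"
    start, size, name = SYMBOLS[i]
    return name if address < start + size else "[unknown]"

print([symbolize(a) for a in (0x1120, 0x1060, 0x1010, 0x1030)])
```
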
> 
> Unwinding is what we're interested in for the purpose of this
> proposal. The important things are:
> 
> * Data on stack is split into frames, each frame belonging to one function.
> * Right before each function call, the return address is put on the
> stack. This is the instruction address in the caller to which we will
> eventually return — and that's what we care about.
> * One register, called the "frame pointer" or "base pointer" register
> (RBP), is traditionally used to point to the beginning of the current
> frame. Every function should back up RBP onto the stack and set it
> properly at the very beginning.
> 
> The "frame pointer" part is achieved by adding push %rbp, mov
> %rsp,%rbp to the beginning of every function and by adding pop %rbp
> before returning. Using this knowledge, stack unwinding boils down to
> traversing a linked list:
> 
> https://i.imgur.com/P6pFdPD.png
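
The linked-list traversal described above can be simulated in a few lines of Python. This is a toy model, not how perf or the kernel implements it: memory is a dict of 8-byte slots, every address is invented, and it assumes the x86-64 layout where the saved RBP sits at [rbp] with the return address at [rbp + 8], and that the outermost frame's saved RBP is 0.

```python
def unwind(memory, rbp, rip):
    """Return code addresses from the current instruction outward to the root."""
    addresses = [rip]                       # where the CPU is executing right now
    while rbp != 0:                         # assumption: 0 marks the outermost frame
        addresses.append(memory[rbp + 8])   # return address sits just above the saved RBP
        rbp = memory[rbp]                   # follow the link to the caller's frame
    return addresses

# Made-up stack contents: _start called main, main called parse.
memory = {
    0x6fd0: 0x7000,   # parse's frame: saved RBP of main
    0x6fd8: 0x1060,   # return address into main
    0x7000: 0x0,      # main's frame: saved RBP is 0, end of chain
    0x7008: 0x1010,   # return address into _start
}
trace = unwind(memory, rbp=0x6fd0, rip=0x1120)
print([hex(a) for a in trace])
```

Each step costs two memory reads, which is why this is cheap enough to do on every sample. If any function on the path uses RBP for something else, the chain breaks at that point.
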
> 
> === Where's the catch? ===
> 
> The frame pointer register is not necessary to run a compiled binary.
> It makes it easy to unwind the stack, and some debugging tools rely on
> frame pointers, but the compiler knows how much data it put on the
> stack, so it can generate code that doesn't need the RBP. Not using
> the frame pointer register can make a program more efficient:
> 
> * We don't need to back up the value of the register onto the stack,
> which saves 3 instructions per function.
> * We can treat the RBP as a general-purpose register and use it for
> something else.
> 
> Whether the compiler sets up the frame pointer or not is controlled by
> the -fomit-frame-pointer flag and the default is "omit", meaning we
> can't use this method of stack unwinding by default.
> 
> To make it possible to rely on the frame pointer being available,
> we'll add -fno-omit-frame-pointer to the default C/C++ compilation
> flags. This will instruct the compiler to make sure the frame pointer
> is always available. This will in turn allow profiling tools to
> provide accurate performance data which can drive performance
> improvements in core libraries and executables.
> 
> == Feedback ==
> 
> === Potential performance impact ===
> 
> * Meta builds all its libraries and executables with
> -fno-omit-frame-pointer by default. Internal benchmarks did not show
> significant impact on performance when omitting the frame pointer for
> two of our most performance intensive applications.
> * Firefox recently landed a change to preserve the frame pointer in
> all jitted code
> (https://bugzilla.mozilla.org/show_bug.cgi?id=1426134). No significant
> decrease in performance was observed.
> * Kernel 4.8 frame pointer benchmarks by Suse showed 5%-10%
> regressions in some benchmarks
> (https://lore.kernel.org/all/20170602104048.jkkzssljsompjdwy@suse.de/T/#u)
> 
> Should individual libraries or executables notice a significant
> performance degradation caused by including the frame pointer
> everywhere, these packages can opt-out on an individual basis as
> described in https://docs.fedoraproject.org/en-US/packaging-guidelines/#_compiler_flags.
> 
> === Alternatives to frame pointers ===
> 
> There are a few alternative ways to unwind stacks instead of using the
> frame pointer:
> 
> * [https://dwarfstd.org DWARF] data - The compiler can emit extra
> information that allows us to find the beginning of the frame without
> the frame pointer, which means we can walk the stack exactly as
> before. The problem is that we need to unwind the stack in kernel
> space which isn't implemented in the kernel. Given that the kernel
> implemented its own format (ORC) instead of using DWARF, it's
> unlikely that we'll see a DWARF unwinder in the kernel any time soon.
> The perf tool allows you to use the DWARF data with
> --call-graph=dwarf, but this means that it copies the full stack on
> every event and unwinds in user space. This has very high overhead.
> * [https://www.kernel.org/doc/html/v5.3/x86/orc-unwinder.html ORC]
> (undwarf) - problems with unwinding in the kernel led to the creation of
> another format with the same purpose as DWARF, just much simpler. This
> can only be used to unwind kernel stack traces; it doesn't help us
> with userspace stacks. More information on ORC can be found
> [https://lwn.net/Articles/728339 here].
> * [https://lwn.net/Articles/680985 LBR] - New Intel CPUs have a
> feature that gives you source and target addresses for the last 16 (or
> 32, in newer CPUs) branches with no overhead. It can be configured to
> record only function calls and to be used as a stack, which means it
> can be used to get the stack trace. Sadly, you only get the last X
> calls, and not the full stack trace, so the data can be very
> incomplete. On top of that, many Fedora users might still be using
> CPUs without LBR support which means we wouldn't be able to assume
> working profilers on a Fedora system by default.
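
To make the LBR limitation concrete, here is a deliberately simplified Python model. It treats the LBR as a fixed-size buffer of the most recent calls and ignores returns and everything else the hardware does; the 40-deep call chain and the function names are invented.

```python
from collections import deque

def lbr_call_stack(call_chain, depth=16):
    """Keep only the most recent `depth` calls, like a fixed-size branch record."""
    lbr = deque(maxlen=depth)      # the oldest entries fall off once it is full
    for func in call_chain:        # call_chain is ordered root -> leaf
        lbr.append(func)
    return list(lbr)

chain = [f"f{i}" for i in range(40)]   # hypothetical 40-deep call chain
visible = lbr_call_stack(chain)
print(len(visible), visible[0], visible[-1])
```

With 40 frames and 16 entries, the 24 outermost callers are simply gone, so the stack no longer starts at clone or _start.
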
> 
> To summarize, if we want complete stacks with reasonably low overhead
> (which we do, there's no other way to get accurate profiling data from
> running services), frame pointers are currently the best option.
> 
> == Benefit to Fedora ==
> 
> Implementing this change will provide profiling tools with easy access
> to stacktraces of installed libraries and executables which will lead
> to more accurate profiling data in general. This in turn can be used
> to implement optimizations to core libraries and executables which
> will improve the overall performance of Fedora itself and the wider
> Linux ecosystem.
> 
> Various debugging tools can also make use of the frame pointer to
> access the current stacktrace, although tools like gdb can already do
> this to some degree via embedded dwarf debugging info.
> 
> == Scope ==
> * Proposal owners: Put up a PR to change the rpm macros to build
> packages with -fno-omit-frame-pointer by default.
> 
> * Other developers: Review and merge the PR implementing the Change.
> 
> * Release engineering: [https://pagure.io/releng/issues #Releng issue
> number]. A mass rebuild is required.
> 
> * Policies and guidelines: N/A (not needed for this Change)
> 
> * Trademark approval: N/A (not needed for this Change)
> 
> * Alignment with Objectives: N/A
> 
> == Upgrade/compatibility impact ==
> 
> This should not impact upgrades in any way.
> 
> == How To Test ==
> 
> # Build the package with the updated rpm macros
> # Profile the binary with `perf record -g <binary>`
> # Inspect the perf data with `perf report -g 'graph,0.5,caller'`
> # When expanding hot functions in the perf report, perf should show
> the full call graph of the hot function (at least for all functions
> that are part of the binary compiled with -fno-omit-frame-pointer)
> 
> == User Experience ==
> 
> Fedora users will be more likely to have a streamlined experience when
> trying to debug/profile system executables/libraries. Tools such as
> perf will work out of the box instead of requiring users to provide
> extra options (e.g. --call-graph=dwarf/LBR) or requiring users to
> recompile all relevant packages with -fno-omit-frame-pointer.
> 
> == Dependencies ==
> 
> The rpm macros for Fedora need to be adjusted to include
> -fno-omit-frame-pointer in the default C/C++ compilation flags.
> 
> == Contingency Plan ==
> 
> * Contingency mechanism: The new version can be released without every
> package being rebuilt with -fno-omit-frame-pointer. Profiling will only
> work perfectly once all packages have been rebuilt but there will be
> no regression in behavior if not all packages have been rebuilt by the
> time of the release. If the Change is found to introduce unacceptable
> regressions, the PR implementing it can be reverted and affected
> packages can be rebuilt.
> * Contingency deadline: Final freeze
> * Blocks release? No
> 
> == Documentation ==
> 
> * Original proposal for in-kernel DWARF unwinder (rejected):
> https://lkml.org/lkml/2017/5/5/571
> 
> == Release Notes ==
> 
> Packages are now compiled with frame pointers included by default.
> This will enable a variety of profiling and debugging tools to show
> more information out of the box.
> 
> 
_______________________________________________
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-leave@lists.fedoraproject.org
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org
Do not reply to spam, report it: https://pagure.io/fedora-infrastructure/new_issue
