
List:       debian-devel
Subject:    Re: Packaging of static libraries
From:       Alexander Cherepanov <ch3root () openwall ! com>
Date:       2016-04-18 21:37:09
Message-ID: 57155385.3050104 () openwall ! com

On 2016-04-13 15:29, Ian Jackson wrote:
> Adam Borowski writes ("Re: Packaging of static libraries"):
> > On Tue, Apr 12, 2016 at 02:52:33PM +0100, Ian Jackson wrote:
> > > I'm afraid that LTO is probably too dangerous to be used as a
> > > substitute for static linking.  See my comments in the recent LTO
> > > thread here, where I referred to the problem of undefined behaviour,
> > > and pointed at John Regehr's blog.
> > 
> > LTO is no different from just concatenating all source files and making
> > functions static.  If your code blows after this, it is your fault not
> > LTO's.  LTO just allows interprocedural optimizations to work between
> > functions that were originally in different source files.
> 
> This narrative of `fault' has two very serious problems.
> 
> 
> Firstly, it is hopelessly impractical.  As I have already observed
> here:
> 
> Recently we have seen spectacular advances in compiler optimisation.
> Spectacular in that large swathes of existing previously-working code
> have been discovered, by diligent compilers, to be contrary to the
> published C standard, and `optimised' into non-working machine code.
> 
> In fact, it turns out that there is practically no existing C code
> which is correct according to said standards (including C compilers
> themselves).

There is practically no existing code in any language which is correct 
even if you exclude problems with standards. Not sure we can draw many 
useful conclusions from such general statements.

To get something more specific, the authors of the paper [1] report that 
their tool STACK [2] detected UB in 40% of wheezy packages with C/C++ code.

[1] https://pdos.csail.mit.edu/papers/stack:sosp13.pdf
[2] https://css.csail.mit.edu/stack/

> Real existing code does not conform to the rules now being enforced by
> compilers.  Indeed often it can be very hard to write new code which
> does conform to the rules, even if you know what the rules are and
> take great care.

I have the impression that many complaints about problems with UB stem 
from attempts to write some tricky code. Sometimes tricky (or outright 
non-conforming) code is required, e.g., to work around limits of a legacy 
API. But in many cases it's just clever code trying to get a bit more 
speed or to save a bit of memory. Clever enough to get into the area 
where some advanced rules apply but not clever enough to obey these rules.

Arguing for safety over speed is somewhat strange then. Why write the 
tricky code in the first place?

> Two examples showing how C has been turned into a puzzle language:
> 
> http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=tools/libxl/libxl_event.c;h=02b39e6da8c65c033c99a22db4784de8d7aeeb7a;hb=HEAD#l458
>  http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=tools/libxl/libxl_internal.h;h=005fe538c6b5529447185797cc23d898c219e897;hb=HEAD#l294
> 

Why not separate the free list from active watch_slots? Why not have an 
array of flags indicating which slot is which?

If those approaches are deemed unattractive, explicitly stating an 
assumption of flat memory by casting to uintptr_t before the comparison 
doesn't seem very laborious.
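
A minimal sketch of such a cast, not taken from the Xen code: the helper 
name points_into is hypothetical, uintptr_t is an optional type in the 
standard, and the result of the pointer-to-integer conversion is 
implementation-defined, so the comparison is only meaningful on platforms 
with a flat address space.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns true if p points into the n-byte array
 * that starts at base. Relational comparison of pointers into different
 * objects is undefined, so both pointers are converted to uintptr_t
 * first. ASSUMPTION: flat memory, i.e. the integer values are ordered
 * the same way as the addresses they came from. */
static bool points_into(const void *p, const void *base, size_t n)
{
    uintptr_t addr = (uintptr_t)p;
    uintptr_t lo = (uintptr_t)base;

    return addr >= lo && addr - lo < n;
}
```

The cast makes the flat-memory assumption visible at the point where it 
is relied upon, instead of leaving it implicit in a pointer comparison.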

> http://lists.xenproject.org/archives/html/xen-devel/2015-10/msg03340.html
> http://lists.xenproject.org/archives/html/xen-devel/2015-11/threads.html#00112

Yeah, there is a bunch of misconceptions there.

1. Type-punning via unions is a time-honored tradition described in all 
versions of the C standard. The referenced email even links to DR 283 so 
it's not clear to me why there is confusion.

2. The compiler is not free to assume that padding will not be read. It 
could be read as chars (even if you ignore type-punning). You mentioned 
it yourself in other emails. Not that it gives you much.

3. While writing to / reading from dst->p0 you have to consider not only 
the type of p0 but the type of dst too. This is a very practical 
concern. For example, see 
https://twitter.com/johnregehr/status/706868554222723073 .

4. uint8_t is not guaranteed to be one of the character types and, 
hence, is not free to alias everything. See, e.g., 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66110#c13 . Not an 
immediate concern but something to keep in mind if you strive for strict 
standard conformance.
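
To illustrate point 1, here is a small sketch unrelated to the Xen 
macros. The function names are made up for this example, and the code 
assumes that float and uint32_t have the same size (checked at compile 
time with the C11 _Static_assert). The second function obtains the same 
bits with memcpy, which copies the object representation byte by byte.

```c
#include <stdint.h>
#include <string.h>

/* ASSUMPTION: float is 32 bits wide on the target platform. */
_Static_assert(sizeof(float) == sizeof(uint32_t),
               "float and uint32_t must have the same size");

/* Type punning through a union: store one member, read another. */
static uint32_t float_bits_union(float f)
{
    union { float f; uint32_t u; } pun;

    pun.f = f;
    return pun.u;
}

/* The same bits obtained by copying the object representation. */
static uint32_t float_bits_memcpy(float f)
{
    uint32_t u;

    memcpy(&u, &f, sizeof u);
    return u;
}
```

Both functions should return the same value for any argument; with IEEE 
754 single precision, 1.0f maps to 0x3f800000.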

I'm not familiar with Xen but why overlay data for 32- and 64-bit cases 
instead of having different structs for them? Why use macros instead of 
functions?

> The second problem is that it is based on the idea that the C
> specification is by definition right and proper.

Whether the C standard is right and proper or not, it's the only 
(somewhat) widely accepted middle ground for now.

> There are two ways to evaluate the C specification's rightness and
> properness.
> 
> The first is to ask what the nominal remit of the C standards
> bodies is.  Well, it is and was to standardise existing practice.
> Existing practice was to use C as a kind of portable assembler; the
> programmer was traditionally entitled to do the kind of things which
> are nowadays forbidden.  So the C committee has failed at its
> task. [1]

The task of the committee was to balance several principles. Why many 
(especially in the Free Software world) consider being a high-level 
assembler a much more important principle than the others is not clear 
to me.

> The second is to ask what is most useful.  And there again the C
> committee have clearly failed.

Apparently others disagree.

> We in Debian are in a good position to defend our users from the
> fallout from this problem.  We could change our default compiler
> options to favour safety, and provide more traditional semantics.

Debian (and other distros) have somewhat unusual stakes in the UB debate 
due to their porting needs. A lone developer can choose to support only 
one platform and is free then to complain that C doesn't provide the full 
freedom of assembler for this platform. But Debian often takes such 
programs and builds them for many other architectures.

As an example consider shifts by a value greater than or equal to the 
width of the left operand. They are UB in C and work differently on 
different CPUs. Will it benefit Debian to declare them 
implementation-defined in C? Probably not. Another example is unaligned 
accesses.
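
Both cases can be written so that the result does not depend on the CPU. 
The sketch below uses hypothetical helper names (shl32, load_u32) and 
assumes that int is 32 bits wide, as on current Debian architectures. 
The choice of 0 for oversized shift counts is just one possible 
convention; x86, for instance, masks the count instead.

```c
#include <stdint.h>
#include <string.h>

/* Left shift with a defined result for every count: counts of 32 or
 * more yield 0 instead of invoking undefined behaviour. */
static uint32_t shl32(uint32_t x, unsigned n)
{
    return n < 32 ? x << n : 0;
}

/* Load a 32-bit value (in host byte order) from an address that may
 * be unaligned. memcpy has no alignment requirement; casting p to
 * uint32_t * and dereferencing it would. */
static uint32_t load_u32(const unsigned char *p)
{
    uint32_t v;

    memcpy(&v, p, sizeof v);
    return v;
}
```

Compilers typically turn the memcpy into a plain load on architectures 
that permit unaligned accesses, so the portable form need not cost 
anything there.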

It looks like Debian (and the Free Software community in general) should 
strongly favor portability of standard C over the ability to serve 
as a high-level assembler.

> We would have influence upstream (for example to further advance the
> set of available safety options) if we cared to use it.  But sadly it
> seems that the notion that our most basic and widely-used programming
> language should be one that's fit for programming in is not yet fully
> accepted.
> 
> At the very least we should fiercely resist any further broadening of
> the scope of the C UB problem.

Then the first thing to do is to stop upgrading gcc. Doesn't seem like a 
very practical approach.

The next thing is to add options like -fwrapv or -fno-strict-overflow, 
-fno-delete-null-pointer-checks, -fno-strict-aliasing but is there a 
chance for consensus about it? Doubtful, but who knows...
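
For context on what -fwrapv changes: a check such as "a + b < a" relies 
on signed addition wrapping around, which is undefined in standard C, so 
without that option a compiler may assume the overflow never happens. 
The sketch below (add_overflows is a made-up name) detects the overflow 
without performing it and therefore does not depend on any of these 
options.

```c
#include <limits.h>
#include <stdbool.h>

/* Returns true if a + b would overflow int. Only in-range
 * subtractions are evaluated, so no undefined behaviour occurs
 * regardless of the compiler flags in use. */
static bool add_overflows(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;   /* INT_MAX - b cannot overflow here */
    return a < INT_MIN - b;       /* b <= 0, so INT_MIN - b is in range */
}
```

A caller would test add_overflows(a, b) first and compute a + b only 
when it returns false.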

Perhaps less controversial is fixing UB (and other bugs) in the existing 
code. Several years ago this was hopeless but recently some tools have 
emerged that make it possible to tackle the problem. First of all, 
sanitizers -- ASan, UBSan, MSan, TSan, ... While running everything in 
valgrind is not very convenient, building everything with ASan seems 
quite feasible. Recent activity related to Debian:

http://balintreczey.hu/blog/progress-report-on-hardened1-linux-amd64-a-potential-debian-port-with-pie-asan-ubsan-and-more/
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=812782
https://github.com/Mohit7/Debian-ASan

The last project contains a list of several hundred packages that fail 
to build or run with ASan. Unlike UBSan problems, which may or may not 
lead to a bug in an executable now or in the future, ASan problems are 
quite real right now.

After the problems found with ASan and UBSan are dealt with, other tools 
could be used to find further problems:

- STACK (mentioned above);

- tis-interpreter -- https://github.com/TrustInSoft/tis-interpreter -- a 
recently released "interpreter for finding subtle bugs in programs 
written in standard C";

- libcrunch -- https://github.com/stephenrkell/libcrunch -- a tool "for 
fast dynamic type checking".

The tools are there; is there the will to fix things?

Perhaps some mixed approach is possible. E.g., disable some 
optimizations by default and re-enable them when tests with ASan etc. 
pass. Or vice versa -- disable some optimizations when tests fail to 
pass with ASan enabled.

-- 
Alexander Cherepanov

