From kde-core-devel Thu May 31 20:27:19 2001
From: Waldo Bastian
Date: Thu, 31 May 2001 20:27:19 +0000
To: kde-core-devel
Subject: Fwd: ELF prelinking (was: Linking speed for C++)
X-MARC-Message: https://marc.info/?l=kde-core-devel&m=99134101312638

FYI

----------  Forwarded Message  ----------
Subject: ELF prelinking (was: Linking speed for C++)
Date: Thu, 31 May 2001 11:24:01 -0400
From: Jakub Jelinek
To: gcc@gcc.gnu.org, binutils@sources.redhat.com, libc-alpha@sources.redhat.com
Cc: bastian@suse.com

Hi!

This is just a heads up. I've spent the last two weeks working on ELF prelinking. So far, I have a version which works more or less on ia32 (no ports to other arches done yet). It still needs some more work: switching to an interface so that one binary can prelink both 32bit and 64bit ELF, and some prelink cache management (at the moment it prelinks one library at a time; the desired mode of operation is that it is given a list of library directories and a list of binary directories, prelinks all suitable libraries found in the first set of directories, and afterwards prelinks all binaries against those libraries).

Today I've finally managed to get a prelinked konqueror working (konqueror is IMHO a typical example of an application which has zillions of shared libraries and spends an awful lot of time in the dynamic linker). Here are the results:

non-prelinked konqueror with more or less vanilla libraries (the only change was removing DT_RPATHs, so that it uses all libraries from the current directory):

    time DISPLAY= LD_LIBRARY_PATH=. ./konqueror
    konq: cannot connect to X server

    real    0m0.510s
    user    0m0.510s
    sys     0m0.000s

    time DISPLAY= LD_LIBRARY_PATH=. LD_BIND_NOW=1 ./konqueror
    konq: cannot connect to X server

    real    0m0.680s
    user    0m0.670s
    sys     0m0.010s

prelinked konqueror:

    time DISPLAY= LD_LIBRARY_PATH=. ./konqueror
    konq: cannot connect to X server

    real    0m0.011s
    user    0m0.000s
    sys     0m0.010s

    time DISPLAY= LD_LIBRARY_PATH=. ./konqueror
    konq: cannot connect to X server

    real    0m0.011s
    user    0m0.000s
    sys     0m0.010s

(If prelinking succeeds, i.e. no dependent library has been changed after prelinking and the dynamic linker successfully mapped them at the VMAs they were assigned (i.e. l_addr is 0 for all link_map's in the global searchlist), there is no difference between lazy binding and non-lazy binding: all PLT slots are already resolved. So in addition to the lazy non-prelinked -> lazy prelinked startup difference, there is some additional time saving, as no PLT slots need to be resolved afterwards.)

Each of those two konq binaries had its .interp section hacked up so that it picks the dynamic linker from the current directory, so conditions are equal. The time results were taken after a few invocations and represent values which have been seen on average (though I have not bothered to compute the actual mean value). All measurements have been done with DISPLAY=, so that konqueror bails out quickly after reaching main. I did not want to measure time spent after the dynamic linker has done its work, since prelinking cannot help there.

Here are some statistics from the dynamic linker:

non-prelinked (the number of relocations is slightly less than actually relocated, since relocations in the initial ld-linux.so.2 resolving are not accounted for (and ld-linux.so.2 is not resolved at all in the prelinked case if it gets mapped to the expected place by the kernel)):

    LD_DEBUG=statistics DISPLAY= LD_LIBRARY_PATH=. ./konqueror
    18109:
    18109:  runtime linker statistics:
    18109:    total startup time in dynamic loader: 326413721 clock cycles
    18109:              time needed for relocation: 324016759 clock cycles (99.2)
    18109:                   number of relocations: 52556
    18109:             time needed to load objects: 2128053 clock cycles (.6)
    konq: cannot connect to X server

    LD_DEBUG=statistics DISPLAY= LD_LIBRARY_PATH=. LD_BIND_NOW=1 ./konqueror
    18053:
    18053:  runtime linker statistics:
    18053:    total startup time in dynamic loader: 450621135 clock cycles
    18053:              time needed for relocation: 448334619 clock cycles (99.4)
    18053:                   number of relocations: 69929
    18053:             time needed to load objects: 2077294 clock cycles (.4)
    konq: cannot connect to X server

prelinked (the number of relocations is not computed for conflicts, but there are exactly 1224 conflicts, so 1224 relocations have been done):

    LD_DEBUG=statistics DISPLAY= LD_LIBRARY_PATH=. ./konqueror
    18045:
    18045:  runtime linker statistics:
    18045:    total startup time in dynamic loader: 3434978 clock cycles
    18045:              time needed for relocation: 1192856 clock cycles (34.7)
    18045:                   number of relocations: 0
    18045:             time needed to load objects: 2039037 clock cycles (59.3)
    konq: cannot connect to X server

    LD_DEBUG=statistics DISPLAY= LD_LIBRARY_PATH=. LD_BIND_NOW=1 ./konqueror
    18059:
    18059:  runtime linker statistics:
    18059:    total startup time in dynamic loader: 3444051 clock cycles
    18059:              time needed for relocation: 1219291 clock cycles (35.4)
    18059:                   number of relocations: 0
    18059:             time needed to load objects: 2021662 clock cycles (58.7)
    konq: cannot connect to X server

Note that with the DISPLAY variable set, konqueror starts up in both cases, so prelinking must work (I don't claim there are no bugs, but of course I will spend a lot of time testing etc.).

From these numbers, it looks to me like prelinking is a thing worth doing.

The prelinking program uses libelf, not bfd; it would be very hard to do this in bfd.

Prelinking is done partly by glibc, partly by the prelinking program. glibc has a special mode in which it prints all symbol lookups and also conflicts (symbol lookups which differ between the originating library's local scope and the global scope). The program then uses this information, adjusts the library's VMA and all things dependent on it (I did not bother with the contents of debugging sections yet), on REL architectures converts REL to RELA (I have no other idea how to make sure

    extern char buffer[];
    char *x = &buffer[10];

works right in prelinked DSOs on REL architectures), and prelinks.

For binaries, it also writes .gnu.liblist and .gnu.conflict sections. The former is basically a copy of the global searchlist at prelink time, with SONAME, checksum and timestamp recorded for each library; the latter is a collection of ElfW(Rela) entries against the .dynsym[0] symbol (i.e. 0), which the dynamic linker replays if the liblist matches.

For prelinking, I need at least some minimal help from the static linker though. The minimal requirement is to reserve a few entries at the end of the .dynamic section (or after .dynamic and before the next section, usually .sbss or .bss): for shared libraries I need at least 3 ElfW(Dyn) slots (DT_CHECKSUM, DT_GNU_TIMESTAMP, DT_RELCOUNT resp. DT_RELACOUNT), for binaries I need at least 5 ElfW(Dyn) slots (DT_GNU_CONFLICT{,SZ}, DT_GNU_LIBLIST{,SZ}, DT_REL{,A}COUNT). Do you think ld could do this? Wasting 40 resp. 80 bytes (the latter for elf64) when not prelinking does not look like a killer.

For binaries, I have bigger problems, as I cannot insert or expand sections in the read-only segment at will.

Perhaps it would be a good idea to keep some space in the program in between some sections, and only if the conflict or liblist section was large (and thus the expected prelink saving would be huge as well) would the prelinker create a new PT_LOAD segment (it would be bad if the time necessary for the kernel to map one more PT_LOAD segment was bigger than the time saved by prelinking of tiny binaries).

The actions needed to prelink a (not yet prelinked) binary:
- grow .dynstr for SONAMEs which are not present in DT_NEEDED/DT_FILTER/DT_AUXILIARY tags and are brought in indirectly
- grow .rel.* sections (with the exception of .rel.plt) on REL architectures to form a new .gnu.reloc section (size grows to 150%)
- add .gnu.liblist section
- add .gnu.conflict section (a typical binary linked against just libc has at most 10 conflicts)

and in addition to this add the 5 .dynamic entries.

Alternatively, if the static linker created relocation records even for non-statically linked binaries (in a non-SHF_ALLOCed section, as they wouldn't be used by the dynamic linker), the prelinker could insert/grow the above sections at will even for binaries.

Looking for comments on this... I'll post the source once I clean it up some more for people to comment on.

	Jakub

-------------------------------------------------------

-- 
bastian@kde.org | SuSE Labs KDE Developer | bastian@suse.com