Reading Waldo Bastian's text on C++ shared libraries gave me some ideas.
Eventually I spent a few nights trying them out.

The proposed scheme modifies the object files before linking
in a way that reduces the number of expensive relocations.
Startup times are reduced by 30 to 50%.

See the attached files for details.

Regards,

- Leon Bottou

[Attachment: objprelink.c.gz]

[Attachment: README]

Waldo Bastian's document demonstrates that the current g++ implementation
generates lots of expensive run-time relocations. This translates into the
slow startup of large C++ applications (KDE, StarOffice, etc.).

The attached program "objprelink.c" is designed to reduce the problem.
Expect startup times 30-50% faster.


1) HOWTO
=========

You must first compile objprelink.c as follows:

    $ gcc -O2 -o objprelink objprelink.c -lbfd -liberty

This program must be run on every object file (.o file) that makes up
the application or shared library.

For the KDE packages, for instance, the simplest way is to make a
regular build first. The following commands then fix all object files
and relink all executables and libraries.

    $ find . -name '*.o' -exec objprelink {} \;
    $ find . -name '*.lo' -exec touch {} \;
    $ make

Another approach is to tweak the Makefiles. That works well for Qt.


2) PRINCIPLE
=============

The name "objprelink" means that the program must be run before
linking shared libraries or executables.

I will explain the idea using Waldo's little programs "testclassN.cpp".

-----------------------------------------------------------------
testclassN.cpp
-----------------------------------------------------------------
#include <qwidget.h>

template <int T> class testclass : public QWidget
{
public:
    virtual void setSizeIncrement(int w, int h)
    {
        QWidget::setSizeIncrement(w+T, h+T);
    }
};

template class testclass<1>;
template class testclass<2>;
....   // as many as we want.
template class testclass<N>;
-----------------------------------------------------------------

Let's first compile this program using the regular method.

    $ g++ -c -I$QTDIR/include testclass1.cpp
    $ g++ -shared -o testclass1.so testclass1.o -L$QTDIR/lib -lqt

The resulting object file "testclass1.o" contains several sections.
One section contains the virtual table for the class testclass<1>.
Here are the relocations for this section:

----------------------------------------------------------------
BEFORE (vtable relocs for testclass<1>)
----------------------------------------------------------------
RELOCATION RECORDS FOR [.gnu.linkonce.d.__vt_t9testclass1i1]:
OFFSET   TYPE      VALUE
00000004 R_386_32  __tft9testclass1i1
00000008 R_386_32  _._t9testclass1i1
0000000c R_386_32  event__7QWidgetP6QEvent
00000010 R_386_32  eventFilter__7QObjectP7QObjectP6QEvent
00000014 R_386_32  metaObject__C7QWidget
00000018 R_386_32  className__C7QWidget
0000001c R_386_32  setName__7QWidgetPCc
....
----------------------------------------------------------------

Each of these relocations requires an expensive symbol lookup at run time.
There will be a relocation to the function QWidget::className(..) in the
vtable of every class that inherits from QWidget. The same will happen for
the 70+ virtual functions defined by QWidget.

The "objprelink" program adds one indirection into the vtables.
It inserts a stub section for each function appearing in vtables
and moves the expensive relocation there:

----------------------------------------------------------------
AFTER (stub for QWidget::className)
----------------------------------------------------------------
DISASSEMBLY OF [.gnu.linkonce.t.stub.className__C7QWidget]:
00000000 <.gnu.linkonce.t.stub.className__C7QWidget>:
   0:   b8 00 00 00 00    mov    $0x0,%eax
            1: R_386_32   className__C7QWidget
   5:   ff e0             jmp    *%eax
----------------------------------------------------------------

The trick is that there is only one such section per function.
This section is shared by all the QWidget subclasses defined in this
library. The vtable relocs are then modified to point to the stub
sections. These relocs become R_386_RELATIVE in the shared object
and do not require a symbol lookup.

----------------------------------------------------------------
AFTER (vtable relocs for testclass<1>)
----------------------------------------------------------------
RELOCATION RECORDS FOR [.gnu.linkonce.d.__vt_t9testclass1i1]:
OFFSET   TYPE      VALUE
00000004 R_386_32  .gnu.linkonce.t.stub.__tft9testclass1i1
00000008 R_386_32  .gnu.linkonce.t.stub._._t9testclass1i1
0000000c R_386_32  .gnu.linkonce.t.stub.event__7QWidgetP6QEvent
00000010 R_386_32  .gnu.linkonce.t.stub.eventFilter__7QObjectP7QObjectP6QEvent
00000014 R_386_32  .gnu.linkonce.t.stub.metaObject__C7QWidget
00000018 R_386_32  .gnu.linkonce.t.stub.className__C7QWidget
0000001c R_386_32  .gnu.linkonce.t.stub.setName__7QWidgetPCc
....
----------------------------------------------------------------

One important point is that "objprelink" does not change the symbol table.
Undefined symbols remain undefined. Defined symbols remain defined.
It only changes the relocation records, without modifying the linking
semantics. This is not like option -Bdynamic.
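
The indirection described in this section can be modelled in plain C.
This is only a sketch under stated assumptions, not what objprelink
emits: the names imported_className, stub_className, method_fn,
vtable_testclass1, vtable_testclass2 and call_slot are hypothetical,
and everything lives in one translation unit, so no real dynamic
relocation takes place. It only shows who refers to whom: the vtables
refer to a local stub, and the stub alone refers to the imported
function.

```c
/* Hypothetical stand-in for an imported function such as
 * QWidget::className(). In a real shared library this symbol would be
 * undefined and resolved by the dynamic loader. */
const char *imported_className(void)
{
    return "QWidget";
}

/* One stub per imported function. In this model it is the only place
 * that names the imported symbol, so it would carry the single
 * expensive relocation. */
const char *stub_className(void)
{
    return imported_className();
}

typedef const char *(*method_fn)(void);

/* Vtables of two subclasses. Both slots point at the same local stub,
 * which is the kind of reference that needs no symbol lookup. */
const method_fn vtable_testclass1[] = { stub_className };
const method_fn vtable_testclass2[] = { stub_className };

/* Virtual call through a vtable slot: one extra jump through the stub. */
const char *call_slot(const method_fn *vtable, int slot)
{
    return vtable[slot]();
}
```

Calling call_slot(vtable_testclass1, 0) and call_slot(vtable_testclass2, 0)
reaches the same function through the same stub. Adding more subclasses
adds more vtable entries but no further references to the imported symbol.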


3) RESULTS
===========

The following table compares the numbers of relocations in shared
libraries generated from regular object files (before the slash) and
from fixed object files (after the slash). Figures are provided for
some testclassN programs and also for the Qt library.

------------------------------------------------------------------------------
                  R_386_32    R_386_GLOB_DAT  R_386_JUMP_SLOT  R_386_RELATIVE
------------------------------------------------------------------------------
testclass1.so      106/105         9/9             8/8             3/108
testclass2.so      212/110        13/13            8/8             3/213
testclass5.so      530/125        25/25            8/8             3/528
testclass10.so    1060/150        45/45            8/8             3/1053
testclass20.so    2120/200        85/85            8/8             3/2103
testclass50.so    5300/350       205/205           8/8             3/5253
------------------------------------------------------------------------------
libqt.so         16915/4563     2690/2690       5039/5039       4933/21669
------------------------------------------------------------------------------

In short, objprelink transforms a large number of expensive R_386_32
relocations into comparatively cheap R_386_RELATIVE relocations.
This is a gain because it reduces the number of symbol lookups
during dynamic loading.

The following table gives the execution time of an empty main function
dynamically linked with the above shared libraries. Times are in
milliseconds, averaged over one hundred runs.

----------------------------------------------------------------
libqt.so          regular   regular   prelink   prelink
testclass*.so     regular   prelink   regular   prelink
----------------------------------------------------------------
testclass1.so        60        61        41        40
testclass2.so        63        62        40        40
testclass5.so        62        63        41        40
testclass10.so       64        63        43        40
testclass20.so       67        64        45        42
testclass50.so       74        68        54        45
----------------------------------------------------------------

This shows an improvement of at least 30% when everything gets
prelinked (60 ms down to 40 ms for testclass1.so, 74 ms down to 45 ms
for testclass50.so). I made a few additional measurements using
LD_DEBUG=statistics. These indicate even larger improvements.
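
The testclassN.so rows of the relocation table follow a simple linear
pattern in the number N of instantiated classes. The small C sketch
below records that pattern. The formulas are fitted to the six rows
of the table and are not derived from objprelink itself; the function
names are hypothetical. One plausible reading, which is an assumption,
is that about 100 stubs for inherited functions are shared by all
classes while 5 relocations per class remain class-specific.

```c
/* R_386_32 relocations in testclassN.so built from regular objects:
 * every class brings its own 106 symbol-lookup relocations. */
int r386_32_before(int n)
{
    return 106 * n;
}

/* R_386_32 relocations after objprelink: a fixed pool of 100
 * (fitted constant) plus 5 per class. */
int r386_32_after(int n)
{
    return 100 + 5 * n;
}

/* R_386_RELATIVE relocations after objprelink: the 3 that existed
 * before, plus 105 cheap relocations per class. */
int r386_relative_after(int n)
{
    return 3 + 105 * n;
}
```

For example, r386_32_after(50) gives 350 and r386_relative_after(50)
gives 5253, matching the testclass50.so row. The total number of
relocations goes up, but the number that need a symbol lookup grows
by 5 per class instead of 106.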

I am progressively recompiling the C++ libraries on my system.
Last night I recompiled "libqt.so.2.3.1" and installed it.
Then I recompiled "libqtcups.so" and observed a dramatic speedup
in the startup time of "qtcups". These tests provide extensive
coverage of the virtual table modifications.

My initial plan was to use an R_386_PLT32 relocation in the stub
sections. This would buy me lazy symbol binding and even faster
startup times. This is trickier than it looks, because one should not
jump into the PLT without the proper GOT pointer in %ebx. I have not
been able to achieve this with an acceptable overhead. Any ideas?