Proceedings of the Institute for System Programming of the RAS
Proceedings of the Institute for System Programming of the RAS, 2016, Volume 28, Issue 1, Pages 63–80
DOI: https://doi.org/10.15514/ISPRAS-2016-28(1)-4
(Mi tisp4)
 

Dynamic loader optimization for ARM

E. A. Kudryashov, D. M. Melnik, A. V. Monakov

Institute for System Programming of the Russian Academy of Sciences, 25, Alexander Solzhenitsyn st., Moscow, 109004, Russia
Abstract: The paper discusses an optimization of external calls in position-independent code: the callee address is loaded from the Global Offset Table (GOT) directly at the call site, avoiding the Procedure Linkage Table (PLT). Normally the Linux toolchain creates a PLT both in the main executable (which consists of position-dependent code and has to rely on the PLT mechanism to make external calls) and in shared libraries, where the PLT serves to implement lazy binding of dynamic symbols but is not otherwise required. However, calls via the PLT incur overhead from an extra jump instruction and poorer instruction cache locality. On some architectures, the binary interface of PLT calls also constrains compiler optimization at the call site. This overhead can be avoided by loading the callee address from the GOT at the call site and performing an indirect call, although doing so prevents lazy symbol resolution and may increase code size. We implement this code generation variant in the GCC compiler for the x86 and ARM architectures. On ARM, loading the callee address from the GOT at the call site normally requires a complex sequence with three load instructions. To improve on that, we propose new relocation types that make it possible to build the PC-relative address of a given GOT slot with a movw/movt instruction pair, and we implement these relocation types in GCC and Binutils (assembler and linker) for both ARM and Thumb-2 modes. Our evaluation shows that the proposed optimization improves performance on both x86 (by up to 12% for Clang/LLVM built with multiple shared libraries, on big translation units) and ARM (by up to 7% for SQLite, averaged over several tests), even though code size on ARM grows by 13-15%.
Keywords: program optimizations, dynamic loader, global offset table, procedure linkage table, relocations, ARM.
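The two call paths contrasted in the abstract can be illustrated with a small C model. This is only a sketch under stated assumptions, not the authors' implementation and not real loader code: function pointers stand in for GOT slots, and every name in it (got_lazy, got_eager, lazy_stub, plt_entry, loader_bind_now, call_via_plt, call_via_got, resolver_calls) is hypothetical. It shows why the PLT path costs an extra hop per call and why the direct GOT path needs its slot bound before the first call.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: function pointers stand in for GOT slots.
 * All names below are hypothetical and chosen for illustration. */
typedef int (*fn_t)(int);

static int resolver_calls = 0;   /* counts simulated lazy symbol lookups */

/* Stands in for a function defined in another shared object. */
static int callee_impl(int x) { return x + 1; }

static int lazy_stub(int x);

/* Slot used by the PLT path: it initially points at the resolver stub. */
static fn_t got_lazy = lazy_stub;

/* Slot used by the direct path: it must be filled at load time. */
static fn_t got_eager = NULL;

/* Simulated lazy resolver: patch the slot once, then forward the call. */
static int lazy_stub(int x) {
    resolver_calls++;
    got_lazy = callee_impl;
    return got_lazy(x);
}

/* Simulated PLT entry: an extra hop that jumps through the GOT slot. */
static int plt_entry(int x) { return got_lazy(x); }

/* PLT-style call site: a direct call to the PLT entry. */
int call_via_plt(int x) { return plt_entry(x); }

/* Simulated eager binding performed by the loader before code runs. */
void loader_bind_now(void) { got_eager = callee_impl; }

/* No-PLT call site: load the address from the GOT, then call indirectly.
 * No resolver stub is involved, so the slot must already be bound. */
int call_via_got(int x) {
    fn_t target = got_eager;
    assert(target != NULL);
    return target(x);
}
```

In this model the first call_via_plt triggers one simulated lookup and later calls reuse the patched slot, while call_via_got only works after loader_bind_now has run. As a separate pointer, GCC 6 and later accept the -fno-plt option, which requests GOT-based indirect calls of this kind; the abstract does not say whether that option corresponds to the authors' implementation.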
Document Type: Article
Language: Russian
Citation: E. A. Kudryashov, D. M. Melnik, A. V. Monakov, “Dynamic loader optimization for ARM”, Proceedings of ISP RAS, 28:1 (2016), 63–80
Citation in format AMSBIB
\Bibitem{KudMelMon16}
\by E.~A.~Kudryashov, D.~M.~Melnik, A.~V.~Monakov
\paper Dynamic loader optimization for ARM
\jour Proceedings of ISP RAS
\yr 2016
\vol 28
\issue 1
\pages 63--80
\mathnet{http://mi.mathnet.ru/tisp4}
\crossref{https://doi.org/10.15514/ISPRAS-2016-28(1)-4}
\elib{https://elibrary.ru/item.asp?id=26166307}
Linking options:
  • https://www.mathnet.ru/eng/tisp4
  • https://www.mathnet.ru/eng/tisp/v28/i1/p63