Skip to content

Research at St Andrews

Milepost GCC: Machine Learning Enabled Self-tuning Compiler

Research output: Contribution to journalArticle

Author(s)

Grigori Fursin, Yuriy Kashnikov, Abdul Memon, Zbigniew Chamski, Olivier Temam, Mircea Namolaru, Elad Yom-Tov, Bilha Mendelson, Ayal Zaks, Eric Courtois, Francois Bodin, Phil Barnard, Elton Ashton, Edwin Bonilla, John Donald Thomson, Christopher Williams, Michael O'Boyle

School/Research organisations

Abstract

Tuning compiler optimizations for rapidly evolving hardware makes porting and extending an optimizing compiler for each new platform extremely challenging. Iterative optimization is a popular approach to adapting programs to a new architecture automatically using feedback-directed compilation. However, the large number of evaluations required for each program has prevented iterative compilation from widespread take-up in production compilers. Machine learning has been proposed to tune optimizations across programs systematically but is currently limited to a few transformations, long training phases and critically lacks publicly released, stable tools. Our approach is to develop a modular, extensible, self-tuning optimization infrastructure to automatically learn the best optimizations across multiple programs and architectures based on the correlation between program features, run-time behavior and optimizations. In this paper we describe Milepost GCC, the first publicly-available open-source machine learning-based compiler. It consists of an Interactive Compilation Interface (ICI) and plugins to extract program features and exchange optimization data with the cTuning.org open public repository. It automatically adapts the internal optimization heuristic at function-level granularity to improve execution time, code size and compilation time of a new program on a given architecture. Part of the MILEPOST technology together with low-level ICI-inspired plugin framework is now included in the mainline GCC. We developed machine learning plugins based on probabilistic and transductive approaches to predict good combinations of optimizations. Our preliminary experimental results show that it is possible to automatically reduce the execution time of individual MiBench programs, some by more than a factor of 2, while also improving compilation time and code size. On average we are able to reduce the execution time of the MiBench benchmark suite by 11% for the ARC reconfigurable processor. We also present a realistic multi-objective optimization scenario for Berkeley DB library using Milepost GCC and improve execution time by approximately 17%, while reducing compilation time and code size by 12% and 7% respectively on Intel Xeon processor.
Close

Details

Original languageEnglish
Pages (from-to)296-327
Number of pages32
JournalInternational Journal of Parallel Programming
Volume39
Issue number3
DOIs
Publication statusPublished - 2011

Discover related content
Find related publications, people, projects and more using interactive charts.

View graph of relations

Related by author

  1. Lattice-based scheduling for multi-FPGA systems

    Yu, T., Feng, B., Stillwell, M., Guo, L., Ma, Y. & Thomson, J. D., 10 Dec 2018, Proceedings of the International Conference on Field-Programmable Technology 2018, Naha, Okinawa, Japan. IEEE Press

    Research output: Chapter in Book/Report/Conference proceedingConference contribution

  2. Large-scale hierarchical k-means for heterogeneous many-core supercomputers

    Li, L., Yu, T., Zhao, W., Fu, H., Wang, C., Tan, L., Yang, G. & Thomson, J., 11 Nov 2018, Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC '18). Piscataway: IEEE Press, 11 p.

    Research output: Chapter in Book/Report/Conference proceedingConference contribution

  3. Predicting and optimizing image compression

    Murashko, O., Thomson, J. D. & Leather, H., 1 Oct 2016, Proceedings of the 24th ACM International Conference on Multimedia. ACM, p. 665-669

    Research output: Chapter in Book/Report/Conference proceedingConference contribution

  4. Automatic OpenCL device characterization: guiding optimized kernel design

    Thoman, P., Kofler, K., Studt, H., Thomson, J. D. & Fahringer, T., 2011, Euro-Par 2011 Parallel Processing: 17th International Conference, Euro-Par 2011, Bordeaux, France, August 29 - September 2, 2011, Proceedings, Part II. Berlin, Heidelberg: Springer-Verlag, p. 438-452 15 p. (Lecture Notes in Computer Science; vol. 6853/2011).

    Research output: Chapter in Book/Report/Conference proceedingConference contribution

  5. Workload characterization supporting the development of domain-specific compiler optimizations using decision trees for data mining

    Fenacci, D., Franke, B. & Thomson, J., 2010, Proceedings of the 13th International Workshop on Software 38; Compilers for Embedded Systems. New York, NY, USA: ACM, p. 5:1-5:10 (SCOPES '10).

    Research output: Chapter in Book/Report/Conference proceedingConference contribution

Related by journal

  1. The Missing Link! A new skeleton for evolutionary multi-agent systems in Erlang

    Stypka, J., Turek, W., Byrski, A., Kisiel-Dorohinicki, M., Barwell, A. D., Brown, C. M., Hammond, K. & Janjic, V., Feb 2018, In : International Journal of Parallel Programming. 46, 1, p. 4-22 19 p.

    Research output: Contribution to journalArticle

  2. High-level heterogeneous and hierarchical parallel systems (HLPGPU 2014)

    Brown, C. M., 28 Oct 2015, In : International Journal of Parallel Programming. 43, 5, p. 892-893 2 p.

    Research output: Contribution to journalEditorial

ID: 17727307