Research at St Andrews

Automatic OpenCL device characterization: guiding optimized kernel design

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Peter Thoman, Klaus Kofler, Heiko Studt, John Donald Thomson, Thomas Fahringer


The OpenCL standard allows targeting a large variety of CPU, GPU and accelerator architectures using a single unified programming interface and language. While the standard guarantees portability of functionality for complying applications and platforms, performance portability on such a diverse set of hardware is limited. Devices may vary significantly in memory architecture as well as type, number and complexity of computational units. To characterize and compare the OpenCL performance of existing and future devices we propose a suite of microbenchmarks, uCLbench.
We present measurements for eight hardware architectures – four GPUs, three CPUs and one accelerator – and illustrate how the results accurately reflect unique characteristics of the respective platform. In addition to measuring quantities traditionally benchmarked on CPUs like arithmetic throughput or the bandwidth and latency of various address spaces, the suite also includes code designed to determine parameters unique to OpenCL like the dynamic branching penalties prevalent on GPUs. We demonstrate how our results can be used to guide algorithm design and optimization for any given platform on an example kernel that represents the key computation of a linear multigrid solver. Guided manual optimization of this kernel results in an average improvement of 61% across the eight platforms tested.
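The dynamic branching penalty mentioned above can be illustrated with a toy cost model. This is a minimal sketch under stated assumptions, not code from uCLbench: it assumes work-items execute in lockstep groups of `simd_width`, that a group serially executes every distinct branch taken by any of its work-items, and that each branch has unit cost. The function names `simt_cost` and `divergence_penalty` are hypothetical.

```python
def simt_cost(branch_ids, simd_width, branch_cost=1):
    """Cost of running all work-items under a simplified lockstep model.

    Assumption: each group of `simd_width` consecutive work-items executes
    every distinct branch taken within the group, one after another.
    """
    total = 0
    for start in range(0, len(branch_ids), simd_width):
        group = branch_ids[start:start + simd_width]
        total += len(set(group)) * branch_cost
    return total


def divergence_penalty(num_items, num_branches, simd_width, coherent=False):
    """Slowdown of a branching workload relative to a uniform one.

    coherent=False: work-item i takes branch i % num_branches, so
                    neighbouring work-items diverge.
    coherent=True:  all work-items in a lockstep group take the same branch.
    """
    if coherent:
        pattern = [(i // simd_width) % num_branches for i in range(num_items)]
    else:
        pattern = [i % num_branches for i in range(num_items)]
    uniform = [0] * num_items  # every work-item takes the same branch
    return simt_cost(pattern, simd_width) / simt_cost(uniform, simd_width)


if __name__ == "__main__":
    # Wide lockstep execution (GPU-like) versus scalar execution (width 1).
    for width in (1, 8, 32):
        print(width, divergence_penalty(1024, 4, width))
```

In this model the penalty grows with the number of distinct branches per lockstep group and vanishes when the group width is 1 or when branching is coherent within a group. Real devices differ in many ways, which is why measuring the penalty per device, as the suite does, is more reliable than assuming a model.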


Original language: English
Title of host publication: Euro-Par 2011 Parallel Processing
Subtitle of host publication: 17th International Conference, Euro-Par 2011, Bordeaux, France, August 29 - September 2, 2011, Proceedings, Part II
Place of Publication: Berlin, Heidelberg
Number of pages: 15
ISBN (Print): 978-3-642-23396-8
Publication status: Published - 2011

Publication series

Name: Lecture Notes in Computer Science

Related by author

  1. Lattice-based scheduling for multi-FPGA systems

    Yu, T., Feng, B., Stillwell, M., Guo, L., Ma, Y. & Thomson, J. D., 10 Dec 2018, Proceedings of the International Conference on Field-Programmable Technology 2018, Naha, Okinawa, Japan. IEEE Press

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

  2. Large-scale hierarchical k-means for heterogeneous many-core supercomputers

    Li, L., Yu, T., Zhao, W., Fu, H., Wang, C., Tan, L., Yang, G. & Thomson, J., 11 Nov 2018, Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC '18). Piscataway: IEEE Press, 11 p.

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

  3. Predicting and optimizing image compression

    Murashko, O., Thomson, J. D. & Leather, H., 1 Oct 2016, Proceedings of the 24th ACM International Conference on Multimedia. ACM, p. 665-669

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

  4. Milepost GCC: Machine Learning Enabled Self-tuning Compiler

    Fursin, G., Kashnikov, Y., Memon, A., Chamski, Z., Temam, O., Namolaru, M., Yom-Tov, E., Mendelson, B., Zaks, A., Courtois, E., Bodin, F., Barnard, P., Ashton, E., Bonilla, E., Thomson, J. D., Williams, C. & O'Boyle, M., 2011, In: International Journal of Parallel Programming. 39, 3, p. 296-327 32 p.

    Research output: Contribution to journal › Article

  5. Workload characterization supporting the development of domain-specific compiler optimizations using decision trees for data mining

    Fenacci, D., Franke, B. & Thomson, J., 2010, Proceedings of the 13th International Workshop on Software & Compilers for Embedded Systems. New York, NY, USA: ACM, p. 5:1-5:10 (SCOPES '10).

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

ID: 17727399