Research at St Andrews

Large-scale hierarchical k-means for heterogeneous many-core supercomputers

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Standard

Large-scale hierarchical k-means for heterogeneous many-core supercomputers. / Li, Lideng; Yu, Teng; Zhao, Wenlai; Fu, Haohuan; Wang, Chenyu; Tan, Li; Yang, Guangwen; Thomson, John.

Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC '18). Piscataway : IEEE Press, 2018.

Harvard

Li, L, Yu, T, Zhao, W, Fu, H, Wang, C, Tan, L, Yang, G & Thomson, J 2018, Large-scale hierarchical k-means for heterogeneous many-core supercomputers. in Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC '18). IEEE Press, Piscataway, The International Conference for High Performance Computing, Networking, Storage, and Analysis, Dallas, United States, 11/11/18.

APA

Li, L., Yu, T., Zhao, W., Fu, H., Wang, C., Tan, L., ... Thomson, J. (2018). Large-scale hierarchical k-means for heterogeneous many-core supercomputers. In Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC '18). Piscataway: IEEE Press.

Vancouver

Li L, Yu T, Zhao W, Fu H, Wang C, Tan L et al. Large-scale hierarchical k-means for heterogeneous many-core supercomputers. In Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC '18). Piscataway: IEEE Press. 2018

Author

Li, Lideng ; Yu, Teng ; Zhao, Wenlai ; Fu, Haohuan ; Wang, Chenyu ; Tan, Li ; Yang, Guangwen ; Thomson, John. / Large-scale hierarchical k-means for heterogeneous many-core supercomputers. Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC '18). Piscataway : IEEE Press, 2018.

BibTeX

@inproceedings{dd552c75e08e471f9367a8f909aa25fb,
title = "Large-scale hierarchical k-means for heterogeneous many-core supercomputers",
abstract = "This paper presents a novel design and implementation of the k-means clustering algorithm targeting the Sunway TaihuLight supercomputer. We introduce a multi-level parallel partition approach that not only partitions by dataflow and centroid, but also by dimension. Our multi-level (nkd) approach unlocks the potential of the hierarchical parallelism in the SW26010 heterogeneous many-core processor and the system architecture of the supercomputer. Our design is able to process large-scale clustering problems with up to 196,608 dimensions and over 160,000 targeting centroids, while maintaining high performance and high scalability, significantly improving the capability of k-means over previous approaches. The evaluation shows our implementation achieves performance of less than 18 seconds per iteration for a large-scale clustering case with 196,608 data dimensions and 2,000 centroids by applying 4,096 nodes (1,064,496 cores) in parallel, making k-means a more feasible solution for complex scenarios.",
keywords = "Supercomputer, Multi/many-core Processors, Clustering, Parallel computing",
author = "Lideng Li and Teng Yu and Wenlai Zhao and Haohuan Fu and Chenyu Wang and Li Tan and Guangwen Yang and John Thomson",
note = "Funding: J. Thomson and T. Yu are supported by the EPSRC grants “Discovery” EP/P020631/1, “ABC: Adaptive Brokerage for the Cloud” EP/R010528/1, and EU Horizon 2020 grant Team-Play: “Time, Energy and security Analysis for Multi/Many-core heterogenous PLAtforms” (ICT-779882, https://teamplay-h2020.eu)",
year = "2018",
month = "11",
day = "11",
language = "English",
booktitle = "Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC '18)",
publisher = "IEEE Press",

}

RIS (suitable for import to EndNote)

TY - GEN

T1 - Large-scale hierarchical k-means for heterogeneous many-core supercomputers

AU - Li, Lideng

AU - Yu, Teng

AU - Zhao, Wenlai

AU - Fu, Haohuan

AU - Wang, Chenyu

AU - Tan, Li

AU - Yang, Guangwen

AU - Thomson, John

N1 - Funding: J. Thomson and T. Yu are supported by the EPSRC grants “Discovery” EP/P020631/1, “ABC: Adaptive Brokerage for the Cloud” EP/R010528/1, and EU Horizon 2020 grant Team-Play: “Time, Energy and security Analysis for Multi/Many-core heterogenous PLAtforms” (ICT-779882, https://teamplay-h2020.eu)

PY - 2018/11/11

Y1 - 2018/11/11

N2 - This paper presents a novel design and implementation of the k-means clustering algorithm targeting the Sunway TaihuLight supercomputer. We introduce a multi-level parallel partition approach that not only partitions by dataflow and centroid, but also by dimension. Our multi-level (nkd) approach unlocks the potential of the hierarchical parallelism in the SW26010 heterogeneous many-core processor and the system architecture of the supercomputer. Our design is able to process large-scale clustering problems with up to 196,608 dimensions and over 160,000 targeting centroids, while maintaining high performance and high scalability, significantly improving the capability of k-means over previous approaches. The evaluation shows our implementation achieves performance of less than 18 seconds per iteration for a large-scale clustering case with 196,608 data dimensions and 2,000 centroids by applying 4,096 nodes (1,064,496 cores) in parallel, making k-means a more feasible solution for complex scenarios.

KW - Supercomputer

KW - Multi/many-core Processors

KW - Clustering

KW - Parallel computing

M3 - Conference contribution

BT - Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC '18)

PB - IEEE Press

CY - Piscataway

ER -
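
The three-way (nkd) partition named in the abstract, by dataflow (samples), by centroid, and by dimension, can be illustrated with a small serial sketch of the assignment step. This is not the authors' implementation: the function names `chunks` and `assign_nkd` and the parameters `n_parts`, `k_parts`, `d_parts` are hypothetical, and plain nested loops stand in for the parallel units that a real system would map each block onto. The sketch only shows that splitting the squared-distance computation across dimension blocks and the nearest-centroid search across centroid blocks yields the same assignment as an unpartitioned search.

```python
def chunks(n_items, n_parts):
    """Split range(n_items) into n_parts contiguous, near-equal index ranges."""
    base, extra = divmod(n_items, n_parts)
    out, start = [], 0
    for p in range(n_parts):
        size = base + (1 if p < extra else 0)
        out.append(range(start, start + size))
        start += size
    return out


def assign_nkd(samples, centroids, n_parts, k_parts, d_parts):
    """Label each sample with its nearest centroid using an n/k/d partition.

    Hypothetical illustration: every block below would be handled by a
    separate parallel unit; here the blocks are simply visited in turn.
    """
    n, k, d = len(samples), len(centroids), len(samples[0])
    labels = [None] * n
    for n_block in chunks(n, n_parts):            # level 1: split the samples
        for i in n_block:
            best = (float("inf"), -1)             # (squared distance, index)
            for k_block in chunks(k, k_parts):    # level 2: split the centroids
                for j in k_block:
                    # level 3: split the dimensions; each block contributes a
                    # partial sum, and the partial sums are then combined
                    dist = sum(
                        sum((samples[i][t] - centroids[j][t]) ** 2
                            for t in d_block)
                        for d_block in chunks(d, d_parts)
                    )
                    best = min(best, (dist, j))   # reduce to the global minimum
            labels[i] = best[1]
    return labels
```

Because the partial sums over dimension blocks add up to the full squared Euclidean distance, the choice of `d_parts` changes only how the work is divided, not the resulting labels.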

Related by author

  1. Lattice-based scheduling for multi-FPGA systems

    Yu, T., Feng, B., Stillwell, M., Guo, L., Ma, Y. & Thomson, J. D., 10 Dec 2018, Proceedings of the International Conference on Field-Programmable Technology 2018, Naha, Okinawa, Japan. IEEE Press

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

  2. Predicting and optimizing image compression

    Murashko, O., Thomson, J. D. & Leather, H., 1 Oct 2016, Proceedings of the 24th ACM International Conference on Multimedia. ACM, p. 665-669

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

  3. Milepost GCC: Machine Learning Enabled Self-tuning Compiler

    Fursin, G., Kashnikov, Y., Memon, A., Chamski, Z., Temam, O., Namolaru, M., Yom-Tov, E., Mendelson, B., Zaks, A., Courtois, E., Bodin, F., Barnard, P., Ashton, E., Bonilla, E., Thomson, J. D., Williams, C. & O'Boyle, M., 2011, In: International Journal of Parallel Programming. 39, 3, p. 296-327 (32 p.)

    Research output: Contribution to journal > Article

  4. Automatic OpenCL device characterization: guiding optimized kernel design

    Thoman, P., Kofler, K., Studt, H., Thomson, J. D. & Fahringer, T., 2011, Euro-Par 2011 Parallel Processing: 17th International Conference, Euro-Par 2011, Bordeaux, France, August 29 - September 2, 2011, Proceedings, Part II. Berlin, Heidelberg: Springer-Verlag, p. 438-452 15 p. (Lecture Notes in Computer Science; vol. 6853/2011).

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

  5. Workload characterization supporting the development of domain-specific compiler optimizations using decision trees for data mining

    Fenacci, D., Franke, B. & Thomson, J., 2010, Proceedings of the 13th International Workshop on Software & Compilers for Embedded Systems. New York, NY, USA: ACM, p. 5:1-5:10 (SCOPES '10).

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

ID: 255501866