
Research at St Andrews

Large-scale hierarchical k-means for heterogeneous many-core supercomputers

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Author(s)

Lideng Li, Teng Yu, Wenlai Zhao, Haohuan Fu, Chenyu Wang, Li Tan, Guangwen Yang, John Thomson


Abstract

This paper presents a novel design and implementation of the k-means clustering algorithm targeting the Sunway TaihuLight supercomputer. We introduce a multi-level parallel partition approach that partitions not only by dataflow and centroid, but also by dimension. This multi-level (nkd) approach unlocks the hierarchical parallelism of the SW26010 heterogeneous many-core processor and of the supercomputer's system architecture.

Our design can process large-scale clustering problems with up to 196,608 dimensions and over 160,000 target centroids while maintaining high performance and scalability, significantly extending the capability of k-means over previous approaches. The evaluation shows that our implementation takes less than 18 seconds per iteration for a large-scale clustering case with 196,608 data dimensions and 2,000 centroids when using 4,096 nodes (1,064,496 cores) in parallel, making k-means a more feasible solution for complex scenarios.
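As a rough illustration of what partitioning by dataflow (n), centroid (k), and dimension (d) can mean for the k-means assignment step, the sketch below blocks the nearest-centroid search along all three axes. It is not the authors' implementation: the function names `split` and `assign_nkd`, the block counts, and the use of serial loops in place of parallel hardware levels are all assumptions made for illustration, and this record does not state how each level maps onto the SW26010 or the wider system.

```python
# Hypothetical sketch of a three-level (n, k, d) partition of the k-means
# assignment step. Names and block counts are illustrative assumptions;
# each loop over blocks stands in for a level of parallel hardware.

def split(length, parts):
    """Return `parts` contiguous (start, stop) ranges covering range(length)."""
    base, extra = divmod(length, parts)
    ranges, start = [], 0
    for i in range(parts):
        stop = start + base + (1 if i < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def assign_nkd(points, centroids, n_parts, k_parts, d_parts):
    """Assign each point to its nearest centroid using n/k/d blocking."""
    n, k, d = len(points), len(centroids), len(points[0])
    d_blocks = split(d, d_parts)
    labels = [0] * n
    for n_lo, n_hi in split(n, n_parts):              # level n: data samples
        for i in range(n_lo, n_hi):
            best_dist, best_j = float("inf"), -1
            for k_lo, k_hi in split(k, k_parts):      # level k: centroids
                for j in range(k_lo, k_hi):
                    # level d: each dimension block yields a partial squared
                    # distance; the partials are summed (a reduction).
                    dist = sum(
                        sum((points[i][t] - centroids[j][t]) ** 2
                            for t in range(d_lo, d_hi))
                        for d_lo, d_hi in d_blocks
                    )
                    if dist < best_dist:
                        best_dist, best_j = dist, j
            labels[i] = best_j
    return labels
```

For example, `assign_nkd([[0, 0, 0, 0], [9, 10, 9, 10]], [[0, 0, 0, 0], [10, 10, 10, 10]], 2, 2, 2)` returns `[0, 1]`. Blocking along d is what lets a single distance computation be spread over several workers when one sample has too many dimensions to handle in one place, at the cost of an extra reduction over the partial sums.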

Details

Original language: English
Title of host publication: Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC '18)
Place of Publication: Piscataway
Publisher: IEEE Press
Chapter: 13
Number of pages: 11
ISBN (Electronic): 9781538683842
Publication status: Published - 11 Nov 2018
Event: The International Conference for High Performance Computing, Networking, Storage, and Analysis - Dallas, United States
Duration: 11 Nov 2018 to 16 Nov 2018
https://sc18.supercomputing.org/

Conference

Conference: The International Conference for High Performance Computing, Networking, Storage, and Analysis
Abbreviated title: SC18
Country: United States
City: Dallas
Period: 11/11/18 to 16/11/18

Research areas

• Supercomputer, Multi/many-core Processors, Clustering, Parallel computing


ID: 255501866
