A high-performance accelerator for floating-point matrix multiplication

Xun Jia, Guiming Wu, Xianghui Xie

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

Matrix multiplication is a widely used routine in science and engineering applications. Accelerating this routine is important because applications with large-scale matrix multiplication are increasingly common, especially in high-performance computing (HPC). However, existing computing platforms, including CPU, GPGPU and FPGA, suffer from unsatisfactory performance or efficiency for this routine. In this paper, we propose a high-performance accelerator for double-precision floating-point matrix multiplication, and build a performance model for design space exploration based on memory access scheduling. The impact of architecture parameters on accelerator performance and efficiency is evaluated and analyzed. Experimental results show that our proposed accelerator with 256 processing elements (PEs) can achieve a maximum performance of 767.99 GFLOPS and an efficiency of 99.99% for large-scale matrix multiplication, which is well suited to the requirements of HPC applications.
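The figures quoted in the abstract can be cross-checked with a little arithmetic. The sketch below assumes that "efficiency" means achieved throughput divided by peak throughput; the abstract does not define the term, so this reading, the helper name `implied_peak_gflops`, and the derived per-PE figure are illustrative assumptions rather than values reported by the authors.

```python
# Figures quoted in the abstract.
NUM_PES = 256             # processing elements
ACHIEVED_GFLOPS = 767.99  # maximum reported performance
EFFICIENCY = 0.9999       # 99.99%


def implied_peak_gflops(achieved: float, efficiency: float) -> float:
    """Peak throughput implied by the ASSUMED definition
    efficiency = achieved / peak (not stated in the abstract)."""
    return achieved / efficiency


peak = implied_peak_gflops(ACHIEVED_GFLOPS, EFFICIENCY)
per_pe = peak / NUM_PES  # hypothetical per-PE share of the implied peak

print(f"implied peak: {peak:.2f} GFLOPS")
print(f"implied per-PE peak: {per_pe:.3f} GFLOPS")
```

Because the abstract's numbers are rounded, the result only indicates that the implied peak is roughly 768 GFLOPS, or about 3 GFLOPS per PE, under the stated assumption.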

Original language: English (US)
Title of host publication: Proceedings - 15th IEEE International Symposium on Parallel and Distributed Processing with Applications and 16th IEEE International Conference on Ubiquitous Computing and Communications, ISPA/IUCC 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 396-402
Number of pages: 7
ISBN (Electronic): 9781538637906
DOIs: https://doi.org/10.1109/ISPA/IUCC.2017.00063
State: Published - May 25 2018
Event: 15th IEEE International Symposium on Parallel and Distributed Processing with Applications and 16th IEEE International Conference on Ubiquitous Computing and Communications, ISPA/IUCC 2017 - Guangzhou, China
Duration: Dec 12 2017 to Dec 15 2017

Other

Other: 15th IEEE International Symposium on Parallel and Distributed Processing with Applications and 16th IEEE International Conference on Ubiquitous Computing and Communications, ISPA/IUCC 2017
Country: China
City: Guangzhou
Period: 12/12/17 to 12/15/17

Fingerprint

Matrix multiplication
Floating point
Accelerator
Particle accelerators
High Performance
Computing
GPGPU
Design Space Exploration
Program processors
Field programmable gate arrays (FPGA)
Performance Model
Engineering Application
Scheduling
Field Programmable Gate Array
Data storage equipment
Processing
Requirements
Experimental Results

Keywords

  • Accelerator
  • Architecture
  • High-performance
  • Linear array
  • Matrix multiplication

ASJC Scopus subject areas

  • Computer Science Applications
  • Hardware and Architecture
  • Information Systems
  • Control and Optimization
  • Computer Networks and Communications

Cite this

Jia, X., Wu, G., & Xie, X. (2018). A high-performance accelerator for floating-point matrix multiplication. In Proceedings - 15th IEEE International Symposium on Parallel and Distributed Processing with Applications and 16th IEEE International Conference on Ubiquitous Computing and Communications, ISPA/IUCC 2017 (pp. 396-402). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ISPA/IUCC.2017.00063

A high-performance accelerator for floating-point matrix multiplication. / Jia, Xun; Wu, Guiming; Xie, Xianghui.

Proceedings - 15th IEEE International Symposium on Parallel and Distributed Processing with Applications and 16th IEEE International Conference on Ubiquitous Computing and Communications, ISPA/IUCC 2017. Institute of Electrical and Electronics Engineers Inc., 2018. p. 396-402.

Jia, X, Wu, G & Xie, X 2018, A high-performance accelerator for floating-point matrix multiplication. in Proceedings - 15th IEEE International Symposium on Parallel and Distributed Processing with Applications and 16th IEEE International Conference on Ubiquitous Computing and Communications, ISPA/IUCC 2017. Institute of Electrical and Electronics Engineers Inc., pp. 396-402, 15th IEEE International Symposium on Parallel and Distributed Processing with Applications and 16th IEEE International Conference on Ubiquitous Computing and Communications, ISPA/IUCC 2017, Guangzhou, China, 12/12/17. https://doi.org/10.1109/ISPA/IUCC.2017.00063
Jia X, Wu G, Xie X. A high-performance accelerator for floating-point matrix multiplication. In Proceedings - 15th IEEE International Symposium on Parallel and Distributed Processing with Applications and 16th IEEE International Conference on Ubiquitous Computing and Communications, ISPA/IUCC 2017. Institute of Electrical and Electronics Engineers Inc. 2018. p. 396-402 https://doi.org/10.1109/ISPA/IUCC.2017.00063
Jia, Xun ; Wu, Guiming ; Xie, Xianghui. / A high-performance accelerator for floating-point matrix multiplication. Proceedings - 15th IEEE International Symposium on Parallel and Distributed Processing with Applications and 16th IEEE International Conference on Ubiquitous Computing and Communications, ISPA/IUCC 2017. Institute of Electrical and Electronics Engineers Inc., 2018. pp. 396-402
@inproceedings{ea6268486fd34be292883574bfe8545c,
title = "A high-performance accelerator for floating-point matrix multiplication",
abstract = "Matrix multiplication is a widely-used routine in science and engineering applications. Accelerating this routine is important, because applications with large-scale matrix multiplication are increasingly common, especially in the area of high-performance computing (HPC). However, existing computing platforms including CPU, GPGPU and FPGA suffer from unsatisfactory performance or efficiency for this routine. In this paper, we propose a high-performance accelerator for double-precision floating-point matrix multiplication, and build a performance model for design space exploration based on a memory access scheduling. Impact of architecture parameters on accelerator performance and efficiency are evaluated and analyzed. Experimental results show that our proposed accelerator with 256 processing elements (PEs) can achieve a maximum performance of 767.99 GFLOPS and an efficiency of 99.99{\%} for large-scale matrix multiplication, which is well suited to the requirement of HPC applications.",
keywords = "Accelerator, Architecture, High-performance, Linear array, Matrix multiplication",
author = "Xun Jia and Guiming Wu and Xianghui Xie",
year = "2018",
month = "5",
day = "25",
doi = "10.1109/ISPA/IUCC.2017.00063",
language = "English (US)",
pages = "396--402",
booktitle = "Proceedings - 15th IEEE International Symposium on Parallel and Distributed Processing with Applications and 16th IEEE International Conference on Ubiquitous Computing and Communications, ISPA/IUCC 2017",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
address = "United States",

}

TY - GEN

T1 - A high-performance accelerator for floating-point matrix multiplication

AU - Jia, Xun

AU - Wu, Guiming

AU - Xie, Xianghui

PY - 2018/5/25

Y1 - 2018/5/25

AB - Matrix multiplication is a widely-used routine in science and engineering applications. Accelerating this routine is important, because applications with large-scale matrix multiplication are increasingly common, especially in the area of high-performance computing (HPC). However, existing computing platforms including CPU, GPGPU and FPGA suffer from unsatisfactory performance or efficiency for this routine. In this paper, we propose a high-performance accelerator for double-precision floating-point matrix multiplication, and build a performance model for design space exploration based on a memory access scheduling. Impact of architecture parameters on accelerator performance and efficiency are evaluated and analyzed. Experimental results show that our proposed accelerator with 256 processing elements (PEs) can achieve a maximum performance of 767.99 GFLOPS and an efficiency of 99.99% for large-scale matrix multiplication, which is well suited to the requirement of HPC applications.

KW - Accelerator

KW - Architecture

KW - High-performance

KW - Linear array

KW - Matrix multiplication

UR - http://www.scopus.com/inward/record.url?scp=85048371209&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85048371209&partnerID=8YFLogxK

U2 - 10.1109/ISPA/IUCC.2017.00063

DO - 10.1109/ISPA/IUCC.2017.00063

M3 - Conference contribution

AN - SCOPUS:85048371209

SP - 396

EP - 402

BT - Proceedings - 15th IEEE International Symposium on Parallel and Distributed Processing with Applications and 16th IEEE International Conference on Ubiquitous Computing and Communications, ISPA/IUCC 2017

PB - Institute of Electrical and Electronics Engineers Inc.

ER -