Abstract
Matrix multiplication is a widely used routine in science and engineering applications. Accelerating this routine is important because applications with large-scale matrix multiplication are increasingly common, especially in high-performance computing (HPC). However, existing computing platforms, including CPUs, GPGPUs, and FPGAs, deliver unsatisfactory performance or efficiency for this routine. In this paper, we propose a high-performance accelerator for double-precision floating-point matrix multiplication and build a performance model for design space exploration based on memory access scheduling. The impact of architecture parameters on accelerator performance and efficiency is evaluated and analyzed. Experimental results show that the proposed accelerator with 256 processing elements (PEs) achieves a maximum performance of 767.99 GFLOPS and an efficiency of 99.99% for large-scale matrix multiplication, making it well suited to the requirements of HPC applications.
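As a rough sanity check on the reported figures, the sketch below derives the peak throughput they imply. It assumes that efficiency means achieved throughput divided by theoretical peak, which the abstract does not state explicitly; the function name `implied_peak_gflops` is hypothetical and used only for illustration.

```python
# Back-of-the-envelope check of the figures reported in the abstract.
# Assumption (not stated in the abstract): efficiency = achieved / peak.


def implied_peak_gflops(achieved_gflops: float, efficiency: float) -> float:
    """Return the peak throughput implied by an achieved rate and an
    efficiency expressed as a fraction in (0, 1]."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return achieved_gflops / efficiency


ACHIEVED_GFLOPS = 767.99  # maximum performance reported in the abstract
EFFICIENCY = 0.9999       # reported as 99.99%
NUM_PES = 256             # number of processing elements reported

peak = implied_peak_gflops(ACHIEVED_GFLOPS, EFFICIENCY)
per_pe = peak / NUM_PES

print(f"implied peak: {peak:.2f} GFLOPS")
print(f"implied per-PE peak: {per_pe:.3f} GFLOPS")
```

Under that assumption the implied peak is roughly 768 GFLOPS, or about 3 GFLOPS per PE. The abstract gives neither the clock frequency nor the number of operations per cycle, so no finer breakdown is attempted here.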
Original language | English (US) |
---|---|
Title of host publication | Proceedings - 15th IEEE International Symposium on Parallel and Distributed Processing with Applications and 16th IEEE International Conference on Ubiquitous Computing and Communications, ISPA/IUCC 2017 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 396-402 |
Number of pages | 7 |
ISBN (Electronic) | 9781538637906 |
State | Published - May 25 2018 |
Event | 15th IEEE International Symposium on Parallel and Distributed Processing with Applications and 16th IEEE International Conference on Ubiquitous Computing and Communications, ISPA/IUCC 2017 - Guangzhou, China; Duration: Dec 12 2017 → Dec 15 2017 |
Keywords
- Accelerator
- Architecture
- High-performance
- Linear array
- Matrix multiplication
ASJC Scopus subject areas
- Computer Science Applications
- Hardware and Architecture
- Information Systems
- Control and Optimization
- Computer Networks and Communications