A Survey on Architectural Evolution, Cross-Stack Optimization, and System-Level Challenges in Accelerating Convolutional Neural Networks with FPGAs

Bingrui Wang

Authors

Bingrui Wang, Faculty of Science and Engineering, University of Nottingham Ningbo China, Ningbo, Zhejiang Province, China

Keywords:

FPGA-based CNN acceleration, cross-stack optimization, dynamic reconfiguration, architectural evolution, tool-chain automation

Abstract

This survey examines FPGA-based acceleration of convolutional neural networks (CNNs) from a full-stack perspective, analyzing optimizations that cross architectural boundaries as well as critical system-level issues for fast training. Motivated by the strength of CNNs in computer vision and the poor energy efficiency of GPUs, FPGA-based CNN accelerators, which build on reconfigurability with power-of-2 grids, have emerged as a reasonable solution. After systematically collecting the leading FPGA-CNN acceleration works of the last five years, we analyze these advances along four dimensions: computational acceleration (CA), memory/bandwidth efficiency improvement (MBE), model compression (MC), and tool-chain automation (TA). We further introduce a new three-dimensional evaluation criterion (effectiveness, generality, and evolutionary continuance) and use it to compare important acceleration approaches (for example, Winograd vs. GEMM methods, structured pruning vs. quantization) in terms of throughput, accuracy loss, and resource usage. We also pinpoint several important but relatively underexplored issues that help explain why otherwise suitable CNN accelerators often do not align directly with critical design-time factors (e.g., the algorithm-hardware gap and dynamic adaptiveness), even before confronting the inherent immaturity of tool chains for deep learning models such as Vision Transformers. Finally, to guide the development of algorithm-friendly FPGA architectures for future co-design, i.e., algorithm-aware FPGA designs, we propose two new theories as recommendations: the decoupling of algorithms from hardware, and algorithmic adaptation at or close to the design breakpoints.
Furthermore, to facilitate subsequent study in the field, we provide our own repository of state-of-the-art FPGA-CNN work, which serves as a structured database for preserving and archiving test-case results, together with an accessible interaction-evolution map visualizing current trends; both are freely available for browsing. This paper aims not only to serve as groundwork for continued study by others but also to point out possible directions such as heterogeneous FPGA-RISC-V co-design and brain-like computing.
