A Physical-Aware Clustering Approach for Fused Dot-Product Implementation
Optimized hardware for executing large dot-product calculations is central to many of today’s integrated circuits. These arithmetic blocks are often implemented with the parallel fused dot-product approach and, to achieve high performance, are realized with a tree-based compression algorithm using commercially available synthesis macros. However, these macros optimize the performance of the gate-level netlist and fail to take into account the consequences of the applied heuristics on the physical implementation (layout) of these large circuits. We propose a physical-aware approach to fused dot-product implementation based on the affinity between the logic gates that make up the gate-level structure. The proposed clustered dot-product (CDP) algorithm enables the place-and-route tools to cluster gates with high affinity, leading to higher placement utilization and lower routing congestion. Dot-product calculations with up to 78 multipliers were implemented with a 65 nm CMOS standard cell library. Based on post-layout results, CDP provides power reduction of up to 63%, up to 60% lower area, and performance improvements as high as 2.5X compared with similar implementations based on commercial macros.
* M.Sc. research supervised by Dr. Adam Teman