Development of a Large Word-Width High-speed Asynchronous Multiply and Accumulate Unit
This paper details the design of the fastest known asynchronous Multiply and Accumulate unit (MAC) architecture published to date. The MAC architecture herein is based on the MAC developed in Smith et al. (J. Syst. Archit. 47/12 (2002) 977-998). However, the MAC developed in Smith et al. (2002) contains conditional rounding, scaling, and saturation (CRSS) logic that is not present in other comparable MACs (Twenty-Sixth Hawaii International Conference on System Sciences, vol. 1, 1993, pp. 379-388; Asian South-Pacific Design Automation Conference, 2000, pp. 15-16; Sixth IEEE International Conference on Proceedings of ICECS, vol. 2, 1999, pp. 629-633). A comparison between the MAC developed in Smith et al. (2002) and other delay-insensitive/self-timed MACs in the literature is therefore not completely fair, since it favors the other MACs. This paper first details the removal of the CRSS logic from the MAC developed in Smith et al. (2002) and describes its subsequent optimal re-pipelining, in order to provide a fairer comparison. This yields a speedup of 1.12. Secondly, this paper details the application of the NULL Cycle Reduction technique (The 10th International Workshop on Logic and Synthesis, 2001, pp. 185-189; Gate and throughput optimizations for NULL convention self-timed digital circuits, Ph.D. Dissertation, School of Electrical Engineering and Computer Science, University of Central Florida, 2001) to the MAC's feedback loop, and the subsequent re-pipelining of the feed-forward partial product generation and summation circuitry to further increase throughput, resulting in an additional speedup of 1.31 (a speedup of 1.46 over the MAC from Smith et al. (2002)). Lastly, the bit-wise completion strategy is utilized in lieu of full-word completion to decrease the area required by 6% and to increase the MAC's throughput by an additional 1%.
S. C. Smith, "Development of a Large Word-Width High-speed Asynchronous Multiply and Accumulate Unit," Integration, the VLSI Journal, Elsevier, Jan 2005.
The definitive version is available at http://dx.doi.org/10.1016/j.vlsi.2004.11.001
Electrical and Computer Engineering
Keywords and Phrases
NULL Convention Logic; NULL Cycle Reduction; Gate-Level Pipelining; Computer arithmetic
Article - Journal
© 2005 Elsevier, All rights reserved.