Volume 14, Issue 4 471289 pp. 337-347
Article
Open Access

Fast Inner Product Computation on Short Buses

R. Lin

Department of Computer Science, SUNY at Geneseo, Geneseo, NY 14454, USA, geneseo.edu

S. Olariu

Corresponding Author

Department of Computer Science, Old Dominion University, Norfolk, VA 23529, USA, odu.edu

First published: 12 April 2001

Abstract

We propose a VLSI inner product processor architecture in which broadcasting takes place only over short buses (containing fewer than 64 switches). The architecture leads to an efficient algorithm for inner product computation. Specifically, computing the inner product of two arrays of N = 2^9 elements, each consisting of m = 64 bits, takes 13 broadcasts, each over fewer than 64 switches, plus 2 carry-save additions (t_csa) and 2 carry-lookahead additions (t_cla). Using VLSI area of the same order, our algorithm runs faster than the best previously known fast inner product algorithm, that of Smith and Torng [“Design of a fast inner product processor,” Proceedings of IEEE 7th Symposium on Computer Arithmetic (1985)], which takes about 28 t_csa + t_cla for the same computation.
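For readers unfamiliar with the cost terms above, the following minimal Python sketch illustrates what a carry-save addition does and why a conventional inner product processor needs many carry-save levels followed by one carry-propagating addition. It is an illustration of the generic carry-save reduction technique only, not of the short-bus architecture proposed here (which the abstract does not detail) and not of Smith and Torng's actual design. The function names and the simple three-at-a-time grouping are assumptions made for the example, and Python's built-in integer addition stands in for the final carry-lookahead adder.

```python
def carry_save_add(a, b, c):
    """Reduce three non-negative integers to a (sum, carry) pair.

    No carry propagates between bit positions, which is why one such
    step costs a constant time (t_csa) regardless of word length.
    """
    partial_sum = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return partial_sum, carry


def partial_products(x, y, m):
    """Shifted copies of x, one for each set bit of the m-bit multiplier y."""
    return [x << i for i in range(m) if (y >> i) & 1]


def inner_product_csa(xs, ys, m=64):
    """Inner product via carry-save reduction (hypothetical helper).

    Returns the result and the number of carry-save levels used by this
    naive grouping; the level count is specific to this sketch.
    """
    operands = [p for x, y in zip(xs, ys) for p in partial_products(x, y, m)]
    levels = 0
    while len(operands) > 2:
        full = len(operands) - len(operands) % 3
        reduced = []
        for i in range(0, full, 3):
            # Each group of three operands becomes two.
            reduced.extend(carry_save_add(*operands[i:i + 3]))
        # Operands that do not fill a group pass through unchanged.
        reduced.extend(operands[full:])
        operands = reduced
        levels += 1
    # Final carry-propagating addition (stand-in for a carry-lookahead adder).
    return sum(operands), levels


if __name__ == "__main__":
    xs = [3, 5, 7]
    ys = [2, 4, 6]
    result, levels = inner_product_csa(xs, ys, m=8)
    print(result, levels)
```

Each level of the loop shrinks the operand count by roughly a factor of 3/2, so the number of levels grows logarithmically with the total number of partial products (N times m in the worst case). That growth is the quantity the proposed architecture aims to avoid by replacing most carry-save levels with a fixed number of short-bus broadcasts.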
