Efficient Nonrecursive Bit-Parallel Karatsuba Multiplier for a Special Class of Trinomials
Abstract
Recently, we present a novel Mastrovito form of nonrecursive Karatsuba multiplier for all trinomials. Specifically, we found that related Mastrovito matrix is very simple for equally spaced trinomial (EST) combined with classic Karatsuba algorithm (KA), which leads to a highly efficient Karatsuba multiplier. In this paper, we consider a new special class of irreducible trinomial, namely, xm + xm/3 + 1. Based on a three-term KA and shifted polynomial basis (SPB), a novel bit-parallel multiplier is derived with better space and time complexity. As a main contribution, the proposed multiplier costs about 2/3 circuit gates of the fastest multipliers, while its time delay matches our former result. To the best of our knowledge, this is the first time that the space complexity bound is reached without increasing the gate delay.
1. Introduction
Efficient hardware implementation of the finite field arithmetic, especially for GF(2m), is frequently desired in coding theory and public-key cryptosystems [1, 2]. Among these arithmetic operations in GF(2m), multiplication is of the most importance, as other complicated field operations such as exponentiation and inversion can be carried out by iterative multiplications. Thus, it is necessary to design efficient multiplier.
The field elements are usually represented by a certain basis such as polynomial basis (PB), normal basis (NB), and dual basis (DB). In PB representation, the multiplication consists of multiplying two polynomials and reducing the result modulo an irreducible polynomial. The choice of such an irreducible polynomial is critical to perform the reduction operation efficiently. Irreducible trinomial is one of the most common considerations [3, 4]. During recent years, many bit-parallel multipliers using PB representation are proposed for GF(2m) defined by irreducible trinomials, some of which can be found in [3, 5–8]. The efficiency of the architecture is always evaluated by space and time complexity. The former one is expressed in terms of the number of logic gates (XOR and AND) and the latter one is expressed in terms of the sum of XOR and AND gates delay of the critical path. Among these multipliers, the fastest bit-parallel multipliers nowadays are proposed by Fan and Hasan [9] and Hariri and Reyhani-Masoleh [10]. If GF(2m) is defined by f(x) = xm + xk + 1, 1 < k ≤ m/2, the corresponding multiplier requires m2 AND and m2 − 1 XOR gates with time delay TA + ⌈log2(2m − k)⌉TX (for good fields, the time delay is TA + ⌈log2m⌉TX), where TA and TX are the circuit delay of one AND gate and one XOR gate, respectively. Except for these multipliers for general trinomials, there are also several proposals for special types of irreducible trinomials [11–13]. These multipliers usually utilize the special form of the trinomial to obtain efficient implementation.
The Karatsuba algorithm (KA) works recursively by breaking down one big multiplication into two or more submultiplications. It is a typical divide-and-conquer algorithm. Please note that the classic KA starts with a way to multiply two 2-term polynomials using three scalar multiplications. Some other variations are also investigated. More details can be found in [14–16]. The KA can be adopted to design subquadratic complexity multiplier [14, 17] or hybrid multiplier [18, 19]. Specially, there is another type of hybrid multiplier, namely, nonrecursive Karatsuba multiplier, which only applies KA once in the polynomial multiplication [8, 20]. These multipliers regularly require 3/4 circuits gates compared to the fastest bit-parallel multipliers, while its time delay increased by a small number of TX. For example, Elia et al. [8] costs at least two more TX.
Recently, we proposed a novel nonrecursive Karatsuba multiplier that is based on Mastrovito approach [21]. It is shown that our multiplier only requires one more TX compared with the fastest multipliers [9, 10]. However, it costs a few more logic gates than Elia′s result. Except for the nonrecursive Karatsuba multiplier for general trinomials, Shen and Jin [13] proposed a new Karatsuba multiplier that fully exploited equally spaced trinomial and the classic KA to simplify the modular reduction. Consequently, the space complexity of their scheme matches Elia′s result. Meanwhile, the time complexity is TA + (1 + ⌈log2(m − 1)⌉)TX, which is roughly equal to the fastest results. Furthermore, we observe that the special case m = 2k of our multiplier coincides with their scheme. (Here, the trinomial xm + xk + 1 (m = 2k) is an equally spaced trinomial.)
In this paper, we explore another special case of our former scheme to obtain even more efficient nonrecursive Karatsuba multipliers. Our main idea is analogous to Shen and Jin [13], where a special type of trinomials and a KA variation are utilized to simplify the structure of corresponding Mastrovito matrix. More explicitly, we consider the irreducible trinomial xm + xm/3 + 1 and a three-term Karatsuba algorithm. It is demonstrated that the corresponding Mastrovito matrix can be simplified further under this condition. The shifted polynomial basis (SPB) [4] is also utilized to reduce the critical path delay further. Consequently, we proposed a bit-parallel multiplier that costs approximately 2/3 circuit gates of the fastest bit-parallel multipliers. On the other hand, the time complexity is TA + ⌈log2(8m/3)⌉TX, which almost matches the best known results.
The rest of this paper is organized as follows: In Section 2, we briefly review the Mastrovito approach based on SPB representation and some relevant notions. Then we introduce a three-term KA formula and investigate the structure of related Mastrovito matrix. A new bit-parallel multiplier architecture is then proposed in Section 3. Section 4 presents a comparison between the proposed multiplier and some others. Finally, some conclusions are drawn.
2. Preliminary
In this section, we briefly review some related notations and algorithms used throughout this paper. Consider the finite field GF(2m) generated with an irreducible trinomial xm + xm/3 + 1. Let x be a root of xm + xm/3 + 1 and the set M = {xm−1, xm−2, …, x, 1} constitute a polynomial basis (PB). Therefore, every element of GF(2m) can be represented as a polynomial over of degree less than m. The shifted polynomial basis (SPB) is a variation of the polynomial basis, which is obtained by multiplying the set M by certain exponentiation of x.
Definition 1 (see [4].)Let v be an integer and the ordered set M = {xm−1, …, x, 1} be a polynomial basis of GF(2m) over . The ordered set x−vM≔{xi−v∣0 ≤ i ≤ m − 1} is called the shifted polynomial basis with respect to M.
Moreover, one can check that the irreducible trinomial in the form of xm + xm/3 + 1 exists when m = 3 × 7i where i is a nonnegative integer [1]. Although the number of this type of irreducible trinomials is not that abundant, there still exist some trinomials in the range of interest for practical application.
- (i)
Z(i, :) represents the ith row vector in matrix Z;
- (ii)
Z(:, j) represents the jth column vector in matrix Z;
- (iii)
Z(i, j) represents the entry with position (i, j) in matrix Z.
3. Mastrovito Multiplier Using a Three-Term Karatsuba Algorithm
In general, the Mastrovito multiplication utilizing the KA will increase the time complexity. Our former result shows that a Mastrovito multiplier using classic KA costs one more TX than the fastest ones. However, some literature sources [13] indicated that this result would be further improved for some special cases, for example, the EST xm + xm/2 + 1. In the following, we will show that for the trinomial xm + xm/3 + 1, applying the three-term Karatsuba-like formula will also simplify the reduction operation and lead to fast implementation.
3.1. Computation of S1modf(x)
Proposition 2. The Mastrovito matrix MA can be constructed as
Proof. The proof is analogous with the proof of observation 3.1 in [21]. Note that the product matrix A contains 2m − 1 nonzero rows (the last row A(2m, :) is a zero vector), each of which corresponds to the polynomial degree from −2k to 2m − 2k − 2. It is easy to check that the first k rows and the last m − k − 1 rows correspond to the degrees that are out of the range [−k, m − k − 1]. Thus, we need to reduce these rows.
According to the reduction rule in (6), we have to reduce {−2k, −2k + 1, …, −k − 1} by adding them to the row {−k, …, −1} and {m − 2k, …, m − k − 1} and reduce the rows {m − k, …, 2m − 2k − 2} by adding them to the row {0, …, m − k − 2} and {−k, …, m − 2k − 2}. Obviously, the first k row here is [A0,L, 0k×k, 0k×k] and the last m − k − 1 rows constitute
- (i)
Perform row-vector products:
(18) -
in parallel. The symbol “∗” represents only row-vector product related to Ai,L (or Ai,H) and bi, i = 0,1, 2. For example, A0,H∗b0 represents computing the inner product [A0,H(i, 1) · b0, …, A0,H(i, k) · bk−1], for i = 1,2, …, k in parallel.
- (ii)
Sum up all the 2m entries of each row using binary XOR tree. Specially, consider some products of each row are zero; we compute the following summations:
(19) -
using binary XOR tree firstly and then add these results together.
Remarks 3. It is easy to see that the row-vector products (18) contain all the possible row-vector products in (17). In addition, A0,H, A1,L, A1,H, and A2,L are all triangular matrices; one can easily check that each row of both [A0,H, A1,L] and [A1,H, A2,L] consists of at most k nonzero entries. After the computation of (18) and (19), certain number of XOR gates is required to obtain the final result. Table 1 summarizes the space and time complexity of S1 for all the steps.
3.2. Computation of S2modf(x)
4. Theoretic Comparison
Table 3 gives a comparison of different implementation methods of bit-parallel multipliers in the fields generated by trinomials xm + xm/3 + 1. From Table 3, we can see that our multiplier requires about 2/3 circuit gates compared with the previous architectures without using divide-and-conquer algorithm. On the other hand, the time complexity of the proposed multiplier is TA + ⌈log2(8m/3)⌉TX, which is very close to the fastest result. In fact, we have checked this type of trinomials with degree m = 3 · 7i, i = 1,2, …, 1000, and found that there are 585 such trinomials reaching the bound TA + (1 + ⌈log2m⌉TX) (others require only one more TX).
Multiplier | #AND | #XOR | Time delay |
---|---|---|---|
Sunar and Koç [3] | m2 | m2 − 1 | TA + (2 + ⌈log2m⌉)TX |
Wu [5] | m2 | m2 − 1 | TA + (2 + ⌈log2m⌉)TX |
Wu [6] | m2 | m2 − 1 | TA + (2 + ⌈log2m⌉)TX |
Fan and Dai [4] | m2 | m2 − 1 |
|
Elia et al. [8] | TA + (3 + ⌈log2m⌉)TX | ||
Negre [7] | m2 |
|
|
Fan [22] Type-A |
|
|
|
Fan [22] Type-B |
|
|
TA + ⌈log2(2m − 1)⌉TX |
Li et al. [21] |
|
|
|
This paper |
|
|
|
- Note. 2v−1 < m/3 ≤ 2v and W(∗) is the hamming weight of the number.
In Table 4, we give a small example of field GF(2147) defined by x147 + x49 + 1. It shows that, compared with other approaches, our architecture may be the best choice if the space and time complexity are both considered. In addition, compared with the fastest Karatsuba multiplier for general trinomials [21], it is argued that the space and time complexities can be reduced even further if special KA and irreducible polynomial are combined together.
Basis | #AND | #XOR | Time |
---|---|---|---|
PB [3, 5] | 21609 | 21608 | TA + 10TX |
PB [8] | 16280 | 16838 | TA + 11TX |
SPB [4] | 21609 | 21608 | TA + 9TX |
SPB [9] | 21609 | 21608 | TA + 8TX |
SPB [7] | 21609 | 27391 | TA + 8TX |
PB-CRT Type A [22] | 21560 | 21560 | TA + 9TX |
PB-CRT Type B [22] | 21560 | 21658 | TA + 9TX |
SPB [21] | 16280 | 17394 | TA + 9TX |
SPB (this paper) | 14406 | 14748 | TA + 9TX |
5. Conclusion
In this paper, a new Mastrovito multiplier architecture for trinomial of the form xm + xm/3 + 1 is proposed. We show that the space and time complexity of our former Mastrovito-Karatsuba multiplier can be further reduced for special form of trinomial combined with a KA variation. This multiplier can be used in some area-critical occasions because it has low space complexity but maintains a relatively low time delay. To find more polynomials which can use the proposed strategy will be the future work.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work is supported by the Natural Science Foundation of China (nos. 61402393, 61601396) and Shanghai Key Laboratory of Integrated Administration Technologies for Information Security (no. AGK201607).