

http://dx.doi.org/10.12785/ijcds/130151

# Design and Analysis of Efficient Vedic Multiplier for Fast Computing Applications

Aishita Verma<sup>1</sup>, Anum Khan<sup>2</sup> and Subodh Wairya<sup>3</sup>

<sup>1,2,3</sup>Electronics and Communication Engineering Department, Institute of Engineering and Technology, Lucknow, UP, India.

Received 22 Nov. 2021, Revised 20 Dec. 2022, Accepted 6 Feb. 2023, Published 16 Apr. 2023

Abstract: The multiplier is a significant part of every digital signal processing (DSP) application. This work presents high-performance 4x4 and 8x8 Vedic multipliers designed using scalable adder and compressor architectures. Several 8-bit adder designs, namely the CPL, GDI 1, and Scalable full adders, are implemented to establish the superiority of the Scalable adder, which is then used for the compressor implementation. The proposed 4x4 Vedic multiplier architecture comprises a half adder, a chain of 3-2 compressors, and a chain of 4-2 compressors. Under similar conditions, its design metrics are compared with those of eight other multiplier topologies: the existing Binary, Braun, and Array multipliers and five standard Vedic multiplier designs. The proposed 4x4 Vedic multiplier provides the best performance considering the power, delay, PDP, and EDP of the circuits. It is then extended to an 8x8 Vedic multiplier design, whose performance is also analyzed. All circuits are implemented in Cadence Virtuoso at 45-nm technology, and Monte Carlo simulations and process corner analysis of the proposed design are also performed. The overall results show that the proposed design is speed-efficient and consumes less power, making it relevant for fast computing applications such as image processing.

Keywords: Scalable adder, Vedic multiplier, Compressors, Monte Carlo.

# 1. INTRODUCTION

The main concern of designers of digital circuits is to design circuits that consume less power while providing faster operation. Substantial research has been carried out to improve the efficiency of arithmetic units. In DSP applications, adders and multipliers regulate performance. The adder cell is the basic element of a multiplier and largely determines its overall performance. Proper multiplier architectures are necessary for achieving higher accuracy levels. The summation of partial products in multiplication contributes largely to power consumption and delay, and several methods have been proposed in the literature for reducing the delay and power dissipation of this summation stage. The delay can be reduced by using compressors in the summation of partial products. Compressors are basic circuits, built from 1-bit full adders and half adders, that count the number of "ones" in their inputs. Various compressors, including 3-2, 4-2, 5-2, and 7-3 designs, have been presented and discussed in the literature. Because the 4-2 compressor provides regularity when cascaded, it is generally preferred and widely used in multiplier design. The main steps of this work are as follows:

i. The full adder design is selected from the CPL, GDI 1, and Scalable full adders presented in [1], [2], [3] by comparing their 8-bit implementations.

ii. The 4-2 and 5-2 compressors [4] are designed with full adders, and the 3-1-1-2 compressor [5] is designed with the AND, XOR, and MUX modules of the Scalable full adder.

iii. Different multiplier architectures discussed in the literature [6], [7], [8], [9], [10], [5], [11], [12] are designed and simulated, and finally a 4x4 Vedic multiplier design is proposed.

iv. The different multiplier architectures are compared, and finally an 8-bit multiplier is designed.

## A. Adder Topologies

The full adder is the basic unit in the design of a multiplier, and there are many design methodologies for full adders. Speed, average power, and PDP are the foremost parameters for determining the performance of a full adder. This section describes the full adder designs that have been considered.

E-mail address: aishita.verma19@gmail.com, anumkhan0902@gmail.com, swairya@gmail.com





Figure 1. CPL adder

## 1) Complementary Pass Transistor Logic (CPL) Adder

The CPL adder generates both the true and the complemented outputs within the same circuit. It incorporates only NMOS pass transistors, followed by CMOS inverters at the outputs, as presented in [1]. Fig. 1 depicts the schematic of the CPL adder. The CPL adder achieves high speed and full-swing outputs through the use of positive feedback in its design. It is a dual-rail design and requires 32 transistors. The circuit provides good driving capability, using PMOS pull-up transistors to restore the output swing. However, the static inverters and the many internal nodes lead to large power consumption and wiring complexity.

## 2) Gate Diffusion Input (GDI) adder

The GDI logic uses fewer transistors and is a superior alternative to conventional CMOS logic. Because of its reduced transistor count, this topology consumes less power. To achieve full swing at the outputs, modules such as AND and XOR must be designed properly. The GDI full adder reported in [3] is built from XOR, AND, and OR gates, and the Sum and Cout outputs are computed from the XOR gate output. The design also contains an inverter in the critical path, which increases the overall circuit delay. Fig. 2 depicts the GDI FA schematic, which requires 18 transistors. Although the inputs A, B, and Cin have full swing, the output swing is reduced; that is, the output high and low levels deviate from the supply voltage and GND. This lowers power consumption, but slow transitions are observed when cells are cascaded, so scalability is a significant challenge for the GDI FA.



Figure 2. GDI Full Adder [3]



Figure 3. Scalable Full adder [4]

## 3) Scalable Full adder

The scalable full adder design presented in [4] outperforms the other adder topologies considered here. It employs a hybrid design that combines Pass Transistors (PT), Transmission Gates (TG), and conventional CMOS (CC-MOS) logic. Its key implementation modules are an AND-OR module, a TG-based multiplexer (MUX), and an XOR gate module. The circuit is simple, requiring only 22 transistors, and it provides full swing at both the inputs and the outputs. As its name suggests, the design is scalable: when it is cascaded up to 64 bits, full swing is still observed at the output, and no buffers are required. By contrast, the CPL logic discussed above requires buffers to restore the signal to the supply voltage level, and the GDI FA produces a weak output signal when cascaded. The Scalable full adder thus provides high performance both as a 1-bit FA and when extended to 64 bits, and it consumes little power while providing full-swing outputs. Therefore, the Binary, Braun, Array, and Vedic multiplier architectures are designed using the scalable adder. The schematic is depicted in Fig. 3.

## B. Adder applications

To design an efficient Vedic multiplier, an efficient n-bit adder topology as well as efficient compressors are needed. Thus, this section discusses the relevant RCA and compressor architectures.

Figure 4. n bit RCA

#### 1) Ripple Carry Adder (RCA)

Fig. 4 depicts the schematic of the multibit Ripple Carry Adder (RCA), the simplest possible topology for a multibit adder. It is essentially a series of 1-bit full adders through which the initial carry propagates stage by stage. Two n-bit inputs, A and B, denoted A0-An-1 and B0-Bn-1, are applied to the architecture along with the initial input carry, Cin, to obtain the n-bit Sum output, denoted S0-Sn-1; the output carry of each stage is denoted Cout0-Coutn. The output carry of each stage propagates to the next 1-bit full adder in a ripple-like manner, hence the name RCA.
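The ripple behaviour described above can be checked with a small behavioural model. The sketch below is written in Python purely for illustration: it models logic values only (no transistor-level or timing effects), and the helpers `full_adder`, `ripple_carry_add`, `to_bits`, and `from_bits` are hypothetical names, not part of the authors' design flow.

```python
from typing import List, Tuple


def full_adder(a: int, b: int, cin: int) -> Tuple[int, int]:
    """Behavioural 1-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(a_bits: List[int], b_bits: List[int], cin: int = 0) -> Tuple[List[int], int]:
    """n-bit RCA; bit lists are LSB first (index 0 holds A0 / B0)."""
    if len(a_bits) != len(b_bits):
        raise ValueError("operands must have the same width")
    sum_bits = []
    carry = cin
    for a, b in zip(a_bits, b_bits):
        # The carry out of each stage becomes the carry in of the next stage.
        s, carry = full_adder(a, b, carry)
        sum_bits.append(s)
    return sum_bits, carry


def to_bits(x: int, n: int) -> List[int]:
    """Split an unsigned integer into n bits, LSB first."""
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits: List[int]) -> int:
    """Reassemble an LSB-first bit list into an integer."""
    return sum(bit << i for i, bit in enumerate(bits))


if __name__ == "__main__":
    s, cout = ripple_carry_add(to_bits(200, 8), to_bits(100, 8))
    print(from_bits(s), cout)  # 300 = 256 + 44, so this prints "44 1"
```

Because each stage must wait for the carry of the previous one, the worst-case delay of the RCA grows linearly with n, which is why the per-cell delay of the chosen full adder matters so much.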

#### 2) Compressor Architecture

In high-speed multipliers, compressors are used to reduce the delay of the partial product summation [13], thereby increasing the multiplier's speed. The most basic compressor is the 3-2 compressor, which, in terms of its functionality, inputs, and outputs, is simply a full adder [14], [15]. Here, the scalable FA is used as the 3-2 compressor. Higher-order compressors, such as 4-2 and 5-2, are generally preferred in multiplier architectures for better performance, and they are usually constructed with 3-2 compressors as their basic unit. In this paper, 4-2, 5-2, and 3-1-1-2 compressors are selected from the designs proposed in [16], [17], and [11] and implemented using the Scalable FA. Fig. 5(a) depicts the schematic of the 4-2 compressor, Fig. 5(b) the 5-2 compressor, and Fig. 5(c) the 3-1-1-2 compressor.

The 4-2 compressor is constructed from two 1-bit full adders. The inputs to the first full adder (FA) are y1, y2, and y3, and its outputs are Cout and a sum bit. This sum bit is applied to the second FA together with the additional inputs y4 and Cin, and the second FA produces the Sum and Carry outputs. In this manner, three outputs are obtained, as given by equations (1)-(3).

$$Sum = y1 \oplus y2 \oplus y3 \oplus y4 \oplus Cin \tag{1}$$



Figure 5. Compressor Topologies (a)4-2 Compressor (b)5-2 Compressor (c) 3-1-1-2 Compressor

$$Cout = (y1 \oplus y2) \bullet y3 + \overline{(y1 \oplus y2)} \bullet y1 \tag{2}$$

$$Carry = (y1 \oplus y2 \oplus y3 \oplus y4) \bullet Cin + \overline{(y1 \oplus y2 \oplus y3 \oplus y4)} \bullet y4 \tag{3}$$
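Equations (1)-(3) can be checked exhaustively. The Python sketch below is an illustration rather than the authors' netlist: it writes each full adder in the multiplexer form used in Eqs. (2) and (3) and confirms the compressor identity y1 + y2 + y3 + y4 + Cin = Sum + 2(Cout + Carry) for all 32 input combinations. The helper names `mux_full_adder` and `compressor_4_2` are hypothetical.

```python
from itertools import product
from typing import Tuple


def mux_full_adder(a: int, b: int, c: int) -> Tuple[int, int]:
    """Full adder in the MUX form of Eq. (2): carry = (a^b)c + not(a^b)a."""
    p = a ^ b
    s = p ^ c
    carry = (p & c) | ((1 - p) & a)
    return s, carry


def compressor_4_2(y1: int, y2: int, y3: int, y4: int, cin: int) -> Tuple[int, int, int]:
    """4-2 compressor from two cascaded full adders; returns (Sum, Carry, Cout)."""
    s1, cout = mux_full_adder(y1, y2, y3)       # first FA, Eq. (2)
    total, carry = mux_full_adder(y4, s1, cin)  # second FA, Eqs. (1) and (3)
    return total, carry, cout


if __name__ == "__main__":
    for bits in product((0, 1), repeat=5):
        s, carry, cout = compressor_4_2(*bits)
        # Sum has weight 1; Carry and Cout both have weight 2.
        assert sum(bits) == s + 2 * (carry + cout)
    print("4-2 compressor identity holds for all 32 inputs")
```

Note that Cout depends only on y1, y2, and y3, not on Cin, so when compressors are placed side by side the Cout of one can drive the Cin of its neighbour without forming a ripple chain.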

Similarly, the 5-2 compressor is made up of three full adders [18]. The inputs to the first FA are y1, y2, and y3, and its outputs are Cout1 and S1. S1 is applied to the second FA together with the additional inputs y4 and Cin1, and the outputs of this FA are Cout2 and S2. The inputs to the third FA are S2, Cin2, and y5. In this manner, the outputs Cout1, Cout2, Carry, and Sum are obtained, as shown in equations (4)-(10).

$$y1 + y2 + y3 + y4 + y5 + Cin1 + Cin2 = Sum + 2(Carry + Cout1 + Cout2) \tag{4}$$

$$S1 = y1 \oplus y2 \oplus y3 \tag{5}$$

$$S2 = S1 \oplus y4 \oplus Cin1 \tag{6}$$

$$Sum = S2 \oplus y5 \oplus Cin2 \tag{7}$$

$$Cout1 = (y1 \oplus y2) \bullet y3 + \overline{(y1 \oplus y2)} \bullet y1 \tag{8}$$

$$Cout2 = (S1 \oplus y4) \bullet Cin1 + \overline{(y1 \oplus y2 \oplus y3 \oplus y4)} \bullet y4 \tag{9}$$

$$Carry = (S1 \oplus y4 \oplus y5 \oplus Cin1) \bullet Cin2 + (y1 \oplus y2 \oplus y3 \oplus y4 \oplus Cin1) \bullet y5 \tag{10}$$
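The three-FA structure of the 5-2 compressor can be verified in the same way. The Python sketch below is a behavioural illustration with hypothetical function names: the first two full adders use the MUX form of Eq. (2), the third stage is written directly from Eqs. (7) and (10), and the block checks the counting identity y1 + y2 + y3 + y4 + y5 + Cin1 + Cin2 = Sum + 2(Carry + Cout1 + Cout2) over all 128 input combinations.

```python
from itertools import product
from typing import Tuple


def mux_full_adder(a: int, b: int, c: int) -> Tuple[int, int]:
    """Full adder in MUX form: carry = (a^b)c + not(a^b)a."""
    p = a ^ b
    return p ^ c, (p & c) | ((1 - p) & a)


def compressor_5_2(y1: int, y2: int, y3: int, y4: int, y5: int,
                   cin1: int, cin2: int) -> Tuple[int, int, int, int]:
    """5-2 compressor from three cascaded full adders; returns (Sum, Carry, Cout1, Cout2)."""
    s1, cout1 = mux_full_adder(y1, y2, y3)    # first FA: S1 (Eq. 5) and Cout1
    s2, cout2 = mux_full_adder(y4, s1, cin1)  # second FA: S2 (Eq. 6) and Cout2 (Eq. 9)
    p = s2 ^ y5
    total = p ^ cin2                          # Eq. (7)
    carry = (p & cin2) | (s2 & y5)            # Eq. (10): S2 = y1^y2^y3^y4^Cin1
    return total, carry, cout1, cout2


if __name__ == "__main__":
    for bits in product((0, 1), repeat=7):
        s, carry, cout1, cout2 = compressor_5_2(*bits)
        assert sum(bits) == s + 2 * (carry + cout1 + cout2)
    print("5-2 compressor identity holds for all 128 inputs")
```

The second term of Eq. (10) is written as S2·y5 here; this is equivalent to the MUX form, because when S2 and y5 are equal the carry simply equals y5.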

The 3-1-1-2 compressor is designed with the AND, OR, XOR, and MUX modules of the full adder. The design consists of two 4:1 MUXes that share the select lines y3 and y4. Two of the data inputs of the first 4:1 MUX are driven by the output of an XOR gate with inputs y1 and y2, and the other two by the output of a NOT gate. The data inputs of the second 4:1 MUX are obtained from the AND, OR, and NOT gates to produce the Carry output. A three-input AND gate produces the Cout output. The relevant equations are given in (11)-(13).

$$Sum = (y1 \oplus y2) \bullet \overline{y3} \bullet \overline{y4} + \overline{(y1 \oplus y2)} \bullet \overline{y3} \bullet y4 + \overline{(y1 \oplus y2)} \bullet y3 \bullet \overline{y4} + (y1 \oplus y2) \bullet y3 \bullet y4 \tag{11}$$

The remainder of the paper is organized as follows. Section 1 introduces the adder topologies and their applications required in the construction of a multiplier. An overview of efficient 4-bit multiplier topologies is presented in Section 2. The proposed architectures of the 4x4 and 8x8 Vedic multipliers are elaborated in Section 3. The results obtained and their analysis are included in Section 4, and Section 5 concludes the paper.

Figure 6. 4-bit Array Multiplier [10]

## 2. Multiplier topologies

This section presents the designs of the different multipliers considered in this research [7], [8], [9]. In addition, the Vedic multiplier topologies cited in [10], [13], [11], [12], [18] are discussed. Vedic Mathematics comprises sixteen sutras that can be used to perform multiplication. Urdhwa and Triyakbhyam are Sanskrit words that signify "vertically" and "crosswise"; this approach speeds up calculation by producing the partial products and performing their addition [6]. All of these multiplier architectures are designed with the full-swing hybrid scalable adder.

## A. Array Multiplier

This multiplier has a regular structure and a simple design. It multiplies unsigned numbers using full adders arranged horizontally and vertically to accumulate the sum of the partial products, as depicted in Fig. 6.

If the first row of partial products is added with full adders, their Cin is '0'. The delay is the time taken by the signals to propagate through the AND gates and the half and full adders. Because the array is large, it has higher power consumption and delay [19]. As the operand size increases, the array grows with the square of the operand size.

Figure 7. 4-bit Braun Multiplier

## B. Braun Multiplier

The Braun multiplier evaluates all the partial products in parallel [8], which also saves power. As shown in Fig. 7, its basic structure consists of n-1 parallel stages of adders; each row of adders sums a partial product, producing a partial sum. It is essentially a parallel array multiplier, built from full adders and a set of AND gates. Although it is a parallel multiplier, it is structurally more complex than the Array multiplier.

## C. Binary Multiplier

Binary multipliers are required for the operation of devices such as mobile phones, calculators, computers, and processors. This multiplier is simpler than the designs described above. The multiplication method computes a set of partial products and then adds them together with 4-bit adders [9]; it is the same as the long multiplication process, in which binary numbers are multiplied using the add-and-shift method. Fig. 8 depicts the basic architecture of a 4x4 Binary multiplier, which has 16 AND gates and three 4-bit adders. A3-A0 and B3-B0 are the 4-bit operands used to generate the 8-bit product output P7-P0.
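The add-and-shift structure with 16 AND terms and three 4-bit adders can be sketched behaviourally as follows. This Python model assumes the common arrangement in which each adder adds the next partial-product row to the shifted-down result of the previous adder; the exact wiring of Fig. 8 may differ, and the names `add4` and `binary_multiply_4x4` are illustrative only.

```python
from typing import Tuple


def add4(x: int, y: int) -> Tuple[int, int]:
    """Behavioural 4-bit adder: returns (4-bit sum, carry out)."""
    total = x + y
    return total & 0xF, total >> 4


def binary_multiply_4x4(a: int, b: int) -> int:
    """4x4 add-and-shift multiplier: 16 AND terms and three 4-bit additions."""
    # Row i is A3..A0 ANDed with Bi (four AND gates per row).
    rows = [a if (b >> i) & 1 else 0 for i in range(4)]
    product_bits = rows[0] & 1   # P0 comes straight from A0.B0
    upper = rows[0] >> 1         # remaining three bits of row 0, MSB padded with 0
    for i in range(1, 4):        # the three 4-bit adders
        s, c = add4(rows[i], upper)
        product_bits |= (s & 1) << i   # the LSB of each adder output is Pi
        upper = (c << 3) | (s >> 1)    # shift right; the carry becomes the new MSB
    # The last adder's carry and upper sum bits form P7..P4.
    return product_bits | (upper << 4)


if __name__ == "__main__":
    print(binary_multiply_4x4(13, 11))  # prints 143
```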

## D. Vedic Multipliers

Faster multiplication can be performed using the sixteen sutras of Vedic Mathematics, an ancient Indian system of calculation. The most popular sutra for multiplication is Urdhwa Triyakbhyam, whose Sanskrit name signifies vertical and crosswise operations; it speeds up calculation by producing the partial products and performing their addition [14–18]. These multipliers are preferred mainly because of their high speed. The method may be applied to 2x2, 4x4, 8x8, and NxN-bit multiplications. The goal of this subsection is to implement and analyze high-performance 4x4 Vedic multipliers in order to lay the foundation for proposing a new, efficient Vedic multiplier architecture. Several Vedic multiplier architectures are available in the literature, out of which the five most efficient 4x4 topologies are discussed in the following subsections.

Figure 8. 4-bit Binary Multiplier

Figure 9. 4X4 Vedic multiplier 1
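To make the vertical-and-crosswise idea concrete, the Python sketch below forms, for each output column k, the sum of all bit products a_j·b_i with i + j = k, and propagates the column carry into the next column. It is a behavioural illustration only, not a gate-level model of any of the designs that follow; the function names are hypothetical.

```python
from typing import List


def urdhva_multiply(a: int, b: int, n: int = 4) -> int:
    """Unsigned n x n multiplication computed column by column, LSB first."""
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> i) & 1 for i in range(n)]
    result = 0
    carry = 0
    for k in range(2 * n - 1):
        # All vertical and crosswise products a_j * b_i with i + j == k.
        column = sum(a_bits[j] & b_bits[k - j] for j in range(n) if 0 <= k - j < n)
        total = column + carry
        result |= (total & 1) << k   # one product bit per column
        carry = total >> 1           # the rest moves to the next column
    return result | (carry << (2 * n - 1))


def column_heights(n: int = 4) -> List[int]:
    """Number of partial-product bits in each column of an n x n multiplication."""
    return [min(k, 2 * n - 2 - k) + 1 for k in range(2 * n - 1)]


if __name__ == "__main__":
    print(urdhva_multiply(13, 11), column_heights(4))
```

For 4-bit operands the column heights are 1, 2, 3, 4, 3, 2, 1; the tallest column holds four bits, which is one reason 4-2 compressors fit naturally into 4x4 Vedic designs.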

## 1) Vedic multiplier 1

The 3-1-1-2, 4-2, and 5-2 compressor-based 4x4 Vedic multiplier is presented in [10]; this topology is denoted VM1 in the rest of the paper. In this structure, the partial products applied to the compressors are generated by logical AND operations. The schematics of the 3-1-1-2, 4-2, and 5-2 compressors are depicted in Fig. 5, and the operation of the 3-1-1-2 compressor is described by equations (11)-(13). The design of this multiplier is shown in Fig. 9.



Figure 10. 4X4 Vedic multiplier 2

## 2) Vedic multiplier 2

In the 4x4 Vedic multiplier presented in [5], AND gates, full adders (FA), and half adders (HA) are used, as shown in Fig. 10. Efficient CMOS structures for the FA and HA are implemented to minimize the propagation delay. The major latency in a multiplier is caused by the addition of the partial products, which is integral to multiplication. Therefore, the FA and HA units are structured so that the carry propagation delay is minimized, increasing the speed of the addition.

## 3) Vedic multiplier 3

The 2x2 Vedic multiplier, whose block diagram is depicted in Fig. 11, is the basic unit of the 4x4 Vedic multiplier and is used to initiate the design. Its topology consists of four 2-input AND gates and two HA units. To implement the 4x4 Vedic multiplier 3, four 2x2 multipliers and three 4-bit RCAs are required, as depicted in Fig. 12. This design is simple and provides full swing at the product terms.
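The 2x2 block and its use in a 4x4 multiplier can be sketched behaviourally in Python. The 2x2 function uses exactly four AND terms and two half adders, as described above. For the 4x4 level, the sketch assumes one common way of combining the four 2x2 products with three 4-bit additions (a behavioural stand-in for the RCAs); because at most one of the two intermediate carries can be 1, this arrangement merges them with an OR, so it should not be read as the exact wiring of Fig. 12. All function names are hypothetical.

```python
from typing import Tuple


def half_adder(a: int, b: int) -> Tuple[int, int]:
    """Returns (sum, carry)."""
    return a ^ b, a & b


def vedic_2x2(a: int, b: int) -> int:
    """2x2 Vedic multiplier: four AND gates and two half adders."""
    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1
    p0 = a0 & b0                            # vertical product (LSB)
    p1, c1 = half_adder(a1 & b0, a0 & b1)   # crosswise products
    p2, p3 = half_adder(a1 & b1, c1)        # vertical product (MSB) plus carry
    return p0 | (p1 << 1) | (p2 << 2) | (p3 << 3)


def add4(x: int, y: int) -> Tuple[int, int]:
    """Behavioural stand-in for a 4-bit RCA: returns (4-bit sum, carry out)."""
    total = x + y
    return total & 0xF, total >> 4


def vedic_4x4(a: int, b: int) -> int:
    """4x4 multiplier from four 2x2 blocks and three 4-bit additions."""
    al, ah = a & 0b11, a >> 2
    bl, bh = b & 0b11, b >> 2
    q0 = vedic_2x2(al, bl)  # weight 1
    q1 = vedic_2x2(ah, bl)  # weight 4
    q2 = vedic_2x2(al, bh)  # weight 4
    q3 = vedic_2x2(ah, bh)  # weight 16
    s1, c1 = add4(q1, q2)       # RCA 1: the two middle products
    s2, c2 = add4(s1, q0 >> 2)  # RCA 2: add the upper half of q0
    # c1 and c2 both have weight 64 and cannot both be 1 (q1 + q2 + (q0 >> 2) <= 20).
    # The carry out of RCA 3 is always 0 because the product fits in 8 bits.
    s3, _ = add4(q3, ((c1 | c2) << 2) | (s2 >> 2))
    return (q0 & 0b11) | ((s2 & 0b11) << 2) | (s3 << 4)


if __name__ == "__main__":
    print(vedic_2x2(3, 3), vedic_4x4(15, 15))  # prints "9 225"
```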

## 4) Vedic multiplier 4

The architecture of Vedic multiplier 4 is presented in [12]. Sixteen AND gates, ordered from LSB to MSB, generate the single-bit partial products. The product terms S0 through S7 are computed using HAs and FAs, as shown in Fig. 13. In the first stage, the 16 AND gates with inputs A3-A0 and B3-B0 are rearranged so that their outputs provide the inputs to the FAs and HAs. As a result, fewer full adders are required in the second and subsequent propagation stages of the multiplier. However, this design requires many half adders, which increases the effective area of the multiplier.





Figure 11. 2X2 Vedic multiplier









Figure 14. 4X4 Vedic multiplier 5

## 5) Vedic multiplier 5

A modular topology of the 4-bit Vedic multiplier is presented in [18]. The architecture is arranged into $\log 2n/2$ rows and 2n-1 columns, as shown in Fig. 14. It is modular, with each column comprising the same number of components of varying complexity. Every block is built from two primary components, 4-2 compressors and 3-2 compressors, arranged in the proper order [20], [21], [22]. The partial products in a single column are grouped and summed using 4-2 compressors. In the second-row block, the resulting sum bits are grouped four at a time and summed together, and the generated carries are handled by 3-2 compressors. The resulting carry is applied to the next stage so that it can be processed appropriately. This Vedic multiplier design is modular and extensible, but it is very complex, as shown in Fig. 14.

## 3. PROPOSED VEDIC MULTIPLIER

Vedic multiplication is primarily based on the cross multiplication of different sets of bits followed by their addition. The previous sections determined the most effective 1-bit adder topology, which in turn led to an analysis of RCA architectures. The scalable adder, the most efficient 1-bit adder structure, is used as the base unit to implement the RCAs and the compressors of the proposed circuit. Combining the benefits of compressors with the Vedic mathematics technique improves computation speed while reducing area. Figure 15 shows the block diagram of the approach used to design the proposed Vedic multiplier.

## A. Proposed 4x4 Vedic Multiplier

The proposed design uses 4-2 and 3-2 compressors along with several AND gates for partial product generation. All the compressors in the proposed architectures are implemented using the hybrid scalable adder, as depicted in Fig. 5. Compressors are a good alternative to combining several half adders and full adders: they reduce the critical path delay at the partial product addition stage and also lower the power consumption. The HA employed in these designs is a high-performance CMOS implementation using one XOR gate and one AND gate. As described earlier, the partial products are generated by sixteen 2-input AND gates applied to the multiplicand and multiplier bits. The first output, P0, is obtained directly from an AND gate. In stage 1, the partial products from the AND gates are added using a chain of three 4-2 compressors and a single half adder; a full adder is not required where only two inputs need to be added. The resulting partial sums are then added together, and the generated carries are handled by a chain of HAs and four 3-2 compressors. The resulting carry is applied to the next stage so that it can be processed appropriately. This Vedic multiplier topology is modular and extendable, and its schematic is shown in Fig. 16. Compared with the existing designs, the partial products are accumulated at every step of the multiplication, which tends to increase the speed of the multiplier. This topology is also extended to an 8-bit multiplier design.

Figure 15. Block Diagram of Proposed 4x4 Vedic multiplier
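The general idea of compressor-based partial product reduction can be illustrated behaviourally. The Python sketch below generates the n·n AND-gate partial products column by column, applies 4-2 compressors (fed by the Cout of the previous column where one is available) and 3-2 compressors until every column holds at most two bits, and then performs the final carry-propagate addition. It is a generic greedy reduction under these assumptions, not the exact arrangement of compressors and half adders in Fig. 16, and all function names are hypothetical.

```python
from typing import List, Tuple


def full_adder(a: int, b: int, c: int) -> Tuple[int, int]:
    """3-2 compressor: returns (sum, carry)."""
    return a ^ b ^ c, (a & b) | (c & (a ^ b))


def compressor_4_2(y1: int, y2: int, y3: int, y4: int, cin: int) -> Tuple[int, int, int]:
    """4-2 compressor built from two full adders: returns (sum, carry, cout)."""
    s1, cout = full_adder(y1, y2, y3)
    s, carry = full_adder(s1, y4, cin)
    return s, carry, cout


def partial_products(a: int, b: int, n: int) -> List[List[int]]:
    """n*n AND gates; column k collects every a_j & b_i with i + j == k."""
    cols: List[List[int]] = [[] for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            cols[i + j].append(((a >> j) & 1) & ((b >> i) & 1))
    return cols


def reduce_once(cols: List[List[int]]) -> List[List[int]]:
    """One compression stage over all columns, LSB column first."""
    nxt: List[List[int]] = [[] for _ in range(len(cols) + 1)]
    pending: List[int] = []  # Cout bits from column k-1, weight 2**k
    for k, col in enumerate(cols):
        bits = list(col)
        new_pending: List[int] = []
        while len(bits) >= 4:  # 4-2 compressor, Cin taken from a pending Cout
            cin = pending.pop() if pending else 0
            s, carry, cout = compressor_4_2(bits[0], bits[1], bits[2], bits[3], cin)
            bits = bits[4:]
            nxt[k].append(s)
            nxt[k + 1].append(carry)
            new_pending.append(cout)
        if len(bits) == 3:  # 3-2 compressor
            s, carry = full_adder(bits[0], bits[1], bits[2])
            bits = []
            nxt[k].append(s)
            nxt[k + 1].append(carry)
        nxt[k].extend(bits + pending)  # leftover bits and unused Couts pass through
        pending = new_pending
    nxt[len(cols)].extend(pending)
    return nxt


def compressor_multiply(a: int, b: int, n: int = 4) -> Tuple[int, int]:
    """Returns (product, number of compression stages used)."""
    cols = partial_products(a, b, n)
    stages = 0
    while any(len(col) > 2 for col in cols):
        cols = reduce_once(cols)
        stages += 1
    # Final carry-propagate addition of the (at most) two remaining rows.
    value = sum(bit << k for k, col in enumerate(cols) for bit in col)
    return value, stages


if __name__ == "__main__":
    print(compressor_multiply(13, 11))
```

Each compressor preserves the weighted sum of its inputs, so the product is exact regardless of how the bits are grouped; only the number of stages, and hence the delay, depends on the grouping.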

## B. 8x8 Vedic Multiplier design

Several efficient architectures for an 8-bit multiplier are available in the literature [5], [12], [23], [24], [25], [26]. The schematic of the proposed 8x8 Vedic multiplier is depicted in Fig. 17. 4x4 multipliers and 8-bit RCAs are generally required for an 8-bit multiplier design [26], [27], [19], [28]. The proposed circuit requires three 8-bit RCAs, which are implemented using the scalable adder. This structure is relatively simple, as it eliminates the need for half adders or OR gates.

Figure 16. Schematic of Proposed 4x4 Vedic multiplier

Figure 17. 8-bit Vedic multiplier Schematic

The sixteen-bit product output (P15-P0) is obtained as shown in Fig. 17. The performance parameters of the proposed 8x8 Vedic multiplier are compared against those of the highly efficient 8x8 Vedic multipliers presented in [5] and [12], which are denoted 8x8 Vedic Multiplier Design 1 and 8x8 Vedic Multiplier Design 2, respectively. Design 1 uses 3-1-1-2 compressors in its architecture, whereas Design 2 uses carry select adders. For a fair comparison, both structures are implemented in 45-nm technology in Cadence Virtuoso and simulated under the same conditions.

## 4. SIMULATION RESULTS

To examine the performance, all the circuits are implemented in Cadence Virtuoso using 45-nm technology at 27°C. All the implemented circuits are simulated under similar conditions, and the results are compared to establish the best-performing circuit. Monte Carlo analysis and process corner analysis of the best-performing circuit are also performed to validate its robustness.

| VDD<br>(V) | Design      | Power<br>(µW) | Delay<br>(ps) | PDP<br>(aJ) | EDP<br>(10 <sup>-27</sup> Js) |
|------------|-------------|---------------|---------------|-------------|-------------------------------|
| 0.8        | CPL         | 0.784         | 1860          | 1458.2      | 2712.2                        |
| 0.8        | GDI 1       | 0.276         | 926.1         | 255.6       | 236.7                         |
| 0.8        | Scalable FA | 0.438         | 478.1         | 209.4       | 100.1                         |
| 1          | CPL         | 1.273         | 862           | 1097.3      | 945.8                         |
| 1          | GDI 1       | 0.421         | 797.5         | 335.7       | 267.7                         |
| 1          | Scalable FA | 0.552         | 220           | 121.4       | 26.7                          |
| 1.2        | CPL         | 2.199         | 765           | 1682.2      | 1286                          |
| 1.2        | GDI 1       | 1.452         | 489.5         | 710.7       | 347.9                         |
| 1.2        | Scalable FA | 0.784         | 144           | 112.8       | 16.25                         |

TABLE I. Performance parameters of 8 bit RCA

Figure 18. Comparative analysis of 8-bit RCA based on (a) No. of transistors (b) Power (µW) (c) Delay (ps) (d) PDP (aJ)

## A. RCA Analysis

The performance of the 8-bit RCAs built with the CPL, GDI 1, and Scalable adders is evaluated for VDD values in the 0.8 V-1.2 V range. The PDP is computed by multiplying the average power of the circuit by its delay, and the EDP by multiplying the PDP by the delay. Table 1 presents the simulation results of the 8-bit RCAs, and Fig. 18 depicts their transistor count, power, delay, and PDP for supply voltages from 0.8 V to 1.2 V.
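As a quick consistency check of the PDP and EDP definitions, the Python sketch below recomputes the 1-V rows of Table I from the reported power and delay. It is only unit bookkeeping; the dictionary simply transcribes those three rows, and the function names are illustrative.

```python
# 1-V rows of Table I: (average power in µW, delay in ps)
TABLE_I_1V = {
    "CPL": (1.273, 862.0),
    "GDI 1": (0.421, 797.5),
    "Scalable FA": (0.552, 220.0),
}


def pdp_aj(power_uw: float, delay_ps: float) -> float:
    """PDP = P x t; 1 µW x 1 ps = 1e-18 J = 1 aJ, so no scale factor is needed."""
    return power_uw * delay_ps


def edp_1e27_js(power_uw: float, delay_ps: float) -> float:
    """EDP = PDP x t, in units of 1e-27 J.s (1 aJ x 1 ps = 1e-30 J.s)."""
    return pdp_aj(power_uw, delay_ps) * delay_ps * 1e-3


if __name__ == "__main__":
    for name, (p, t) in TABLE_I_1V.items():
        print(f"{name}: PDP = {pdp_aj(p, t):.1f} aJ, EDP = {edp_1e27_js(p, t):.1f} x 1e-27 J.s")
```

The recomputed values agree with the PDP and EDP columns of Table I to within rounding.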

The comparative analysis in Table 1 shows that the 8-bit RCA built with the Scalable FA has the lowest delay, PDP, and EDP at every supply voltage and lower power than the CPL-based RCA; the GDI 1-based RCA consumes less power at 0.8 V and 1 V, but it is much slower. Therefore, the scalable FA is the most suitable for compressor and multiplier design. The scalable full adder retains its superior performance, with full-swing output, when extended to an 8-bit RCA. At a 1-V supply, the Scalable full adder RCA achieves 56.63% lower power, 74.47% lower delay, and 88.9% lower PDP than the CPL-based RCA, and 90% lower EDP than the GDI 1-based RCA. The area of the 8-bit scalable adder, calculated from the drawn layout shown in Fig. 19, is 592.62 µm². Because the 1-bit adder is the basic unit of the 3-2 compressor and of higher-order compressors, the scalable adder is highly suitable for implementing them.

Figure 19. The layout of the 8-bit RCA

Figure 20. The layout of the 3-2 Compressor
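The percentage improvements quoted for the 1-V supply follow from Table I with the usual relative-reduction formula. The Python sketch below reproduces them, taking the CPL-based RCA as the baseline for power, delay, and PDP and the GDI 1-based RCA as the baseline for EDP; the values are transcribed from Table I and the function name is illustrative.

```python
def reduction_pct(baseline: float, improved: float) -> float:
    """Percentage reduction of `improved` relative to `baseline`."""
    return 100.0 * (baseline - improved) / baseline


# 1-V values transcribed from Table I
CPL = {"power": 1.273, "delay": 862.0, "pdp": 1097.3}
GDI1_EDP = 267.7
SCALABLE = {"power": 0.552, "delay": 220.0, "pdp": 121.4, "edp": 26.7}

if __name__ == "__main__":
    for metric in ("power", "delay", "pdp"):
        pct = reduction_pct(CPL[metric], SCALABLE[metric])
        print(f"{metric} vs CPL: {pct:.2f}%")
    print(f"edp vs GDI 1: {reduction_pct(GDI1_EDP, SCALABLE['edp']):.2f}%")
```

The printed values agree with the percentages stated above to within rounding of the tabulated inputs.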

## B. Compressor Topologies

The 4-2, 5-2, and 3-1-1-2 compressors are implemented using the scalable full adder because of its superior performance. The 3-2 compressor is simply a 1-bit full adder; in this work, it is the 1-bit scalable full adder depicted in Fig. 3. Table 2 presents the results of the 3-2, 4-2, 5-2, and 3-1-1-2 compressors designed with the scalable adder.

From the results in Table 2, the 4-2 compressor has the lowest power, PDP, and EDP among the higher-order compressors. The area of the 3-2 compressor, calculated from its layout, is 62.554 µm², which is small. The layout is shown in Fig. 20.

## C. 4x4 Vedic Multiplier

The different Vedic multiplier architectures and the other multiplier designs, namely the Binary, Braun, and Array multipliers, are designed with scalable adders. The multipliers are simulated for VDD values ranging from 0.8 V to 1.2 V and compared.

TABLE II. Performance parameters of implemented compressor topologies.

| Compressor Design with Scalable FA | Power (µW) | Delay (ps) | PDP (aJ) | EDP(10 <sup>-27</sup> Js) |
|------------------------------------|------------|------------|----------|---------------------------|
| 3-2 Compressor                     | 0.137      | 66.68      | 9.13     | 0.60                      |
| 4-2 Compressor                     | 0.264      | 107.1      | 28.6     | 3.03                      |
| 5-2 Compressor                     | 0.571      | 91.08      | 52       | 4.74                      |
| 3-1-1-2 Compressor                 | 0.547      | 116.5      | 63.7     | 7.42                      |

Figure 21. Simulation waveform of 4x4 Vedic multiplier

Figure 22. Delay analysis of implemented 4x4 multiplier topologies

The power consumption of the different multiplier topologies is listed in Table 3. All the Vedic multipliers except VM1 consume far less power than the Array and Braun multipliers. The comparative analysis reveals that the proposed design shows up to 94% lower average power than the multiplier designs in [10-17] at a 1-V supply. The delays of the multipliers are shown in Table 4 and Fig. 22. At a 1-V supply, the proposed design shows up to 67% lower delay, 97% lower PDP, and 98% lower EDP than the other implemented multiplier designs, as supported by the data in Tables 3, 4, 5, and 6, respectively. Fig. 21 depicts the output simulation waveform of the 4x4 Vedic multiplier, where A3-A0 and B3-B0 are the 4-bit operands and P7-P0 is the 8-bit product output.

The main performance advantage of the proposed 4x4 Vedic multiplier is its speed. Figure 22 shows the delay of all the implemented multipliers as the supply voltage varies from 0.8 V to 1.2 V.

The results confirm the strong performance of the proposed Vedic multiplier. To determine its robustness, Monte Carlo simulations of the proposed multiplier are performed with 200 runs to evaluate the variation in delay.



Figure 23. Monte Carlo histogram of delay with 200 samples (proposed design)

The corresponding delay values are given in Table 7, and the Monte Carlo histogram of the delay of the proposed 4x4 Vedic multiplier is depicted in Fig. 23. The proposed design shows little variation in its delay distribution.

Corner analysis of the delay is also performed for all the process corners, namely SS, SF, FS, FF, and the nominal corner. The delay values observed at the corners are tabulated in Table 8.

## D. 8x8 Vedic Multiplier

The performance parameters of the proposed 8x8 Vedic multiplier are compared with those of the implemented 8x8 Vedic multipliers presented in [5] and [12]. The simulation results of the implemented 8-bit Vedic multipliers at a 1-V supply are given in Table 9. Apart from power, delay, PDP, and EDP, the area of the 8x8 multipliers is compared in terms of transistor count (TC). Table 9 shows that the proposed 8x8 Vedic multiplier requires up to 27% fewer transistors and is therefore expected to occupy the smallest area.

The results in Table 9 confirm that the proposed 8x8 Vedic multiplier achieves 13.31% and 31% lower PDP than 8x8 Vedic Multiplier Design 1 [5] and Design 2 [12], respectively. The Monte Carlo simulation is performed for all three 8x8 multipliers with 30 samples, as shown in Fig. 24. The results in Table 10 show that the proposed design exhibits less variation in its delay distribution under process variations.

TABLE III. Power consumption (µW) of implemented 4x4 multiplier topologies.

| Supply<br>Voltage<br>(V) | Array<br>Multiplier | Braun<br>Multiplier | Binary<br>Multiplier | VM 1  | VM 2  | VM 3  | VM 4  | VM 5  | Proposed<br>Vedic<br>Multiplier |
|--------------------------|---------------------|---------------------|----------------------|-------|-------|-------|-------|-------|---------------------------------|
| 0.8                      | 8.850               | 6.277               | 0.948                | 15.11 | 1.023 | 0.987 | 0.807 | 1.106 | 0.880                           |
| 0.9                      | 16.94               | 11.283              | 1.212                | 28.78 | 1.181 | 1.140 | 1.197 | 1.511 | 1.125                           |
| 1.0                      | 27.92               | 18.961              | 1.513                | 39.99 | 1.482 | 1.797 | 1.495 | 1.749 | 1.404                           |
| 1.1                      | 41.59               | 27.288              | 1.857                | 62.41 | 1.520 | 2.234 | 1.524 | 2.027 | 1.856                           |
| 1.2                      | 57.77               | 41.712              | 2.307                | 87.84 | 2.478 | 3.321 | 1.935 | 4.202 | 2.339                           |

TABLE IV. Delay (ps) of implemented 4x4 multiplier topologies.

| Supply<br>Voltage<br>(V) | Array<br>Multiplier | Braun<br>Multiplier | Binary<br>Multiplier | VM 1  | VM 2  | VM 3  | VM 4  | VM 5  | Proposed<br>Vedic<br>Multiplier |
|--------------------------|---------------------|---------------------|----------------------|-------|-------|-------|-------|-------|---------------------------------|
| 0.8                      | 741.9               | 884.5               | 501.4                | 369.4 | 372.2 | 362.8 | 356.5 | 442.7 | 255.3                           |
| 0.9                      | 450.9               | 574.1               | 335.3                | 261.7 | 223.4 | 265.6 | 306.5 | 329.4 | 199.1                           |
| 1.0                      | 275.6               | 426.5               | 248.7                | 195   | 190.8 | 213.7 | 220.8 | 215.4 | 142.8                           |
| 1.1                      | 197                 | 353.2               | 203.1                | 96.7  | 165.4 | 164.6 | 142.4 | 160.6 | 106                             |
| 1.2                      | 142.8               | 112.1               | 150.7                | 67    | 139.2 | 130.6 | 109.7 | 90.92 | 70.35                           |

TABLE V. PDP (aJ) of implemented 4x4 multiplier topologies.

| Supply<br>Voltage<br>(V) | Array<br>Multiplier | Braun<br>Multiplier | Binary<br>Multiplier | VM 1 | VM 2  | VM 3  | VM 4   | VM 5  | Proposed<br>Vedic<br>Multiplier |
|--------------------------|---------------------|---------------------|----------------------|------|-------|-------|--------|-------|---------------------------------|
| 0.8                      | 6565                | 4656                | 475.3                | 5581 | 262.3 | 358   | 287.6  | 489.6 | 224.6                           |
| 0.9                      | 7638                | 6477                | 406.3                | 7531 | 263.8 | 302.7 | 366.88 | 497.7 | 223.9                           |
| 1.0                      | 7694                | 8086                | 372                  | 7670 | 282.7 | 384   | 330    | 376.7 | 200.4                           |
| 1.1                      | 8193                | 9635                | 375.5                | 6035 | 251.4 | 367.7 | 217    | 325.5 | 196.7                           |
| 1.2                      | 8249                | 4674                | 345                  | 5885 | 187.3 | 433.7 | 212.2  | 382   | 164.5                           |

TABLE VI. EDP (10<sup>-27</sup> Js) of implemented 4x4 multiplier topologies.

| Supply<br>Voltage<br>(V) | Array<br>Multiplier | Braun<br>Multiplier | Binary<br>Multiplier | VM 1  | VM 2  | VM 3   | VM 4   | VM 5   | Proposed<br>Vedic<br>Multiplier |
|--------------------------|---------------------|---------------------|----------------------|-------|-------|--------|--------|--------|---------------------------------|
| 0.8                      | 4870                | 4118                | 238.3                | 2061  | 97.62 | 129.88 | 102.52 | 216.74 | 57.34                           |
| 0.9                      | 3443                | 3718                | 136.23               | 1970  | 58.93 | 80.39  | 112.44 | 163.94 | 44.57                           |
| 1.0                      | 2120                | 3448                | 92.51                | 1496  | 53.93 | 82     | 72.88  | 60.49  | 28.61                           |
| 1.1                      | 1614                | 3403                | 76.26                | 583.5 | 41.58 | 60.52  | 30.9   | 52.27  | 20.85                           |
| 1.2                      | 1177                | 523.9               | 51.99                | 394.2 | 26.07 | 56.64  | 23.27  | 34.73  | 11.57                           |

TABLE VII. Monte Carlo simulation results for delay of implemented Vedic multiplier topologies.

| Multiplier circuit        | Min     | Max     | Mean (µ) | Std. Dev. (σ) |
|---------------------------|---------|---------|---------|------------|
| Vedic Multiplier 1        | 37.4ns  | 136.7ps | 3.45 n  | 11.02n     |
| Vedic Multiplier 2        | 30.25ns | 216.4ps | 143.66p | 13.45n     |
| Vedic Multiplier 3        | 129.4ps | 173.6ps | 145.7p  | 6.66p      |
| Vedic Multiplier 4        | 171.6ps | 359.7ps | 227.8p  | 31.08p     |
| Vedic Multiplier 5        | 50.4ns  | 24.68ns | 12.66n  | 27.58n     |
| Proposed Vedic Multiplier | 90.53ps | 131.9ps | 111.8p  | 8.39p      |


TABLE VIII. Corner analysis of delay (s).

| Corners | Vedic<br>Multiplier<br>1 | Vedic<br>Multiplier<br>2 | Vedic<br>Multiplier<br>3 | Vedic<br>Multiplier<br>4 | Vedic<br>Multiplier<br>5 | Proposed<br>Vedic Multiplier |
|---------|--------------------------|--------------------------|--------------------------|--------------------------|--------------------------|------------------------------|
| SS      | 146p                     | 30.24n                   | 349.5p                   | 372.9p                   | 381.7p                   | 204.1p                       |
| SF      | 128.2p                   | 30.29n                   | 213.8p                   | 203p                     | 266.1p                   | 150.1p                       |
| FS      | 95.67p                   | 156.8p                   | 212.9p                   | 265.7p                   | 296p                     | 138.2p                       |
| FF      | 91.94p                   | 30.3n                    | 144.6p                   | 141.2p                   | 202.3p                   | 99.73p                       |
| Nominal | 111.5p                   | 30.27n                   | 213.7p                   | 220.9p                   | 272.5p                   | 142.8p                       |

TABLE IX. Performance parameters of implemented 8x8 multipliers.

| Design                        | TC   | Power (µW) | Delay (ps) | PDP (fJ) | EDP (10<sup>-24</sup> Js) |
|-------------------------------|------|------------|------------|----------|----------------------------|
| 8x8 multiplier Design 1       | 2736 | 9.181      | 515.76     | 4.73     | 2.43                       |
| 8x8 multiplier Design 2       | 2856 | 13.26      | 445.29     | 5.9      | 2.62                       |
| Proposed 8x8 Vedic multiplier | 1976 | 9.023      | 455.24     | 4.10     | 1.87                       |
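
The PDP and EDP columns of Table IX follow from the power and delay columns as PDP = P × t_d and EDP = PDP × t_d. The minimal sketch below (the names `ROWS` and `pdp_edp` are illustrative) recomputes both with explicit SI prefixes. With power in µW and delay in ps, the PDP lands in the femtojoule range (about 4 to 6 fJ) and the EDP in units of 10<sup>-24</sup> Js, matching the tabulated values to within rounding.

```python
# Power (µW) and delay (ps) from Table IX, 1-V supply.
ROWS = {
    "Design 1": (9.181, 515.76),
    "Design 2": (13.26, 445.29),
    "Proposed": (9.023, 455.24),
}

MICRO = 1e-6
PICO = 1e-12


def pdp_edp(power_uw: float, delay_ps: float) -> tuple[float, float]:
    """Return (PDP in joules, EDP in joule-seconds) from power in µW and delay in ps."""
    power_w = power_uw * MICRO
    delay_s = delay_ps * PICO
    pdp = power_w * delay_s  # energy per operation, J
    edp = pdp * delay_s      # energy-delay product, J*s
    return pdp, edp


if __name__ == "__main__":
    for name, (p, d) in ROWS.items():
        pdp, edp = pdp_edp(p, d)
        # Scale to fJ (1e-15 J) and to units of 1e-24 J*s for comparison with Table IX.
        print(f"{name}: PDP = {pdp / 1e-15:.2f} fJ, EDP = {edp / 1e-24:.2f} x 1e-24 Js")
```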

TABLE X. Monte Carlo analysis results of delay of implemented 8x8 multipliers.

| Design                           | Min     | Max     | Mean (µ, ps) | Std. Dev. (σ, ps) |
|----------------------------------|---------|---------|-------------|-----------|
| 8x8 multiplier<br>Design 1       | 860.2ps | 1911ps  | 1181        | 253       |
| 8x8 multiplier<br>Design 2       | 400.5ps | 575.7ps | 462.9       | 39.24     |
| Proposed 8x8<br>Vedic multiplier | 395.3ps | 568.5ps | 453.3       | 40.15     |
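
Tables VII and X summarize each Monte Carlo run by the minimum, maximum, mean and standard deviation of the delay. The sketch below shows that reduction and, using the Table X figures, a scale-free spread measure σ/µ. The names `DelayStats`, `summarize` and `TABLE_X` are illustrative, and the use of the sample standard deviation (`statistics.stdev`) is an assumption; the simulator may use a different estimator.

```python
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class DelayStats:
    min_ps: float
    max_ps: float
    mean_ps: float
    std_ps: float

    @property
    def cv(self) -> float:
        """Coefficient of variation (std / mean), a scale-free spread measure."""
        return self.std_ps / self.mean_ps


def summarize(delays_ps: list[float]) -> DelayStats:
    """Reduce Monte Carlo delay samples (ps) to min, max, mean and std.

    Assumption: sample standard deviation; the simulator may report another estimator.
    """
    if len(delays_ps) < 2:
        raise ValueError("need at least two samples for a standard deviation")
    return DelayStats(
        min_ps=min(delays_ps),
        max_ps=max(delays_ps),
        mean_ps=statistics.fmean(delays_ps),
        std_ps=statistics.stdev(delays_ps),
    )


# Table X values (ps), used to compare delay spread relative to the mean.
TABLE_X = {
    "Design 1": DelayStats(860.2, 1911.0, 1181.0, 253.0),
    "Design 2": DelayStats(400.5, 575.7, 462.9, 39.24),
    "Proposed": DelayStats(395.3, 568.5, 453.3, 40.15),
}

if __name__ == "__main__":
    for name, s in TABLE_X.items():
        print(f"{name}: sigma/mu = {100 * s.cv:.1f}%")
```

On the Table X figures, σ/µ is about 21.4% for Design 1, 8.5% for Design 2 and 8.9% for the proposed design, so the proposed multiplier's relative delay spread is close to that of Design 2 while its mean delay is lower.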



Figure 24. Monte Carlo histogram of delay of the proposed 8x8 multiplier

Table XI lists the delay of the three 8x8 multipliers at the SS, SF, FS, FF and nominal process corners. The proposed multiplier has the lowest SS-corner delay (661.7 ps) of the three designs, and its delay spread across corners (332.4 ps, from 329.3 ps at FF to 661.7 ps at SS) is narrower than that of Design 1 (355.1 ps) and Design 2 (358.2 ps). These results support the robustness of the circuit under process variation.

TABLE XI. Results of corner analysis of delay of implemented 8x8 multipliers.

| Process corner | 8x8 multiplier Design 1 (ps) | 8x8 multiplier Design 2 (ps) | Proposed 8x8 multiplier (ps) |
|----------------|------------------------------|------------------------------|------------------------------|
| SS             | 749.5                        | 669.4                        | 661.7                        |
| SF             | 492.6                        | 411.3                        | 430.8                        |
| FS             | 590.7                        | 505.9                        | 512.1                        |
| FF             | 394.4                        | 311.2                        | 329.3                        |
| Nominal        | 515.8                        | 445.3                        | 455.2                        |

The area of the proposed 8-bit multiplier, calculated from its layout in Fig. 25, is 2204.9 µm².

Figure 25. The layout of the 8-bit Vedic multiplier

## 5. CONCLUSION

Multipliers are essential components for image manipulation in image processing. In this paper, a new compressor-based 4x4 Vedic multiplier and an 8x8 Vedic multiplier have been proposed for use in image processing. The superior performance of the proposed multipliers is established by implementing and analysing eight other multiplier topologies available in the literature. All the circuits in this paper are implemented using Cadence Virtuoso at the 45-nm technology node, at room temperature, over a supply-voltage range of 0.8 V to 1.2 V. First, three efficient 1-bit adders, namely CPL, GDI 1 and the scalable adder, are implemented and extended to an 8-bit RCA structure. The performance analysis identifies the scalable adder as the most efficient adder, and it is then used to implement higher-order compressors such as the 4-2, 5-2 and 3-1-1-2 compressors. Since the 4-2 compressor is the fastest of these higher-order topologies, it is incorporated into the proposed 4x4 Vedic multiplier along with a 3-2 compressor (the 1-bit scalable adder). Compared under similar conditions with the eight other multiplier topologies, the proposed 4x4 Vedic multiplier gives the best power, delay, PDP and EDP at the 1-V supply, and the lowest PDP and EDP over the whole 0.8 V to 1.2 V range. Relative to the implemented multiplier designs, it achieves an average power reduction of 94%, a delay reduction of 67%, a PDP reduction of 97%, and an EDP reduction of 98%.

The proposed 4x4 Vedic multiplier is also extended to an 8x8 Vedic multiplier using four instances of the proposed 4x4 Vedic multiplier and three 8-bit RCAs. Compared with two other implemented 8x8 Vedic multipliers, the proposed 8x8 design achieves 13.3% and 31% lower PDP than Design 1 and Design 2, respectively. To establish the robustness of the proposed Vedic multipliers, Monte Carlo and process-corner analyses are also performed. These favourable results indicate the suitability of the proposed multipliers for fast computing applications such as image processing.
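
The 8x8 multiplier is built from four 4x4 Vedic blocks combined by three 8-bit RCAs. The behavioural sketch below checks that such a composition is arithmetically exact. It is a functional model only, under stated assumptions: the 4x4 blocks are modelled by integer multiplication rather than the compressor netlist, each RCA is a chain of 1-bit full adders (3-2 compressors), and the routing of partial products into the three adders is a common textbook arrangement assumed here, not taken from the paper. All function names are hypothetical.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """1-bit full adder (a 3-2 compressor): returns (sum, carry)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def rca(x: int, y: int, width: int = 8, cin: int = 0) -> tuple[int, int]:
    """Ripple-carry adder of `width` full adders: returns (sum, carry_out)."""
    total, carry = 0, cin
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
    return total, carry


def vedic4x4(a: int, b: int) -> int:
    """Behavioural stand-in for the proposed 4x4 Vedic block (not the netlist)."""
    assert 0 <= a < 16 and 0 <= b < 16
    return a * b


def vedic8x8(a: int, b: int) -> int:
    """8x8 product from four 4x4 blocks and three 8-bit RCAs (assumed wiring)."""
    assert 0 <= a < 256 and 0 <= b < 256
    a_hi, a_lo = a >> 4, a & 0xF
    b_hi, b_lo = b >> 4, b & 0xF

    q0 = vedic4x4(a_lo, b_lo)  # weight 2^0
    q1 = vedic4x4(a_hi, b_lo)  # weight 2^4
    q2 = vedic4x4(a_lo, b_hi)  # weight 2^4
    q3 = vedic4x4(a_hi, b_hi)  # weight 2^8

    # RCA 1: add the two cross partial products.
    s1, c1 = rca(q1, q2)
    # RCA 2: add the upper nibble of q0, which overlaps the cross terms.
    s2, c2 = rca(s1, q0 >> 4)
    # Both carries have weight 2^12; q1 + q2 + (q0 >> 4) <= 464 < 512, so their sum is <= 1.
    carries = c1 + c2
    assert carries <= 1
    # RCA 3: add the upper nibble of s2 and the carry into q3.
    s3, c3 = rca(q3, (s2 >> 4) | (carries << 4))

    # Product bits: [3:0] from q0, [7:4] from s2, [15:8] from s3 (c3 is always 0).
    return (c3 << 16) | (s3 << 8) | ((s2 & 0xF) << 4) | (q0 & 0xF)


if __name__ == "__main__":
    print(vedic8x8(200, 173))  # 34600
    # Exhaustive check over all 65,536 operand pairs.
    assert all(vedic8x8(a, b) == a * b for a in range(256) for b in range(256))
```

The exhaustive loop confirms only the arithmetic of this assumed decomposition; delay, power and area depend on the actual compressor-level circuit and are not modelled.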

#### References

- [1] S. Sathiyapriya and C. Manikandababu, "Design and analysis of approximate multiplier for image processing application," in *Advances in Smart System Technologies*. Springer, 2021, pp. 209–221.
- [2] R. Zimmermann, W. Fichtner *et al.*, "Low-power logic styles: Cmos versus pass-transistor logic," *IEEE journal of solid-state circuits*, vol. 32, no. 7, pp. 1079–1090, 1997.
- [3] M. Shoba and R. Nakkeeran, "Gdi based full adders for energy efficient arithmetic applications," *Engineering Science and Technology, an International Journal*, vol. 19, no. 1, pp. 485–496, 2016. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S2215098615001512
- [4] M. Hasan, M. J. Hossein, M. Hossain, H. U. Zaman, and S. Islam, "Design of a scalable low-power 1-bit hybrid full adder for fast computation," *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. 67, no. 8, pp. 1464–1468, 2020.
- [5] A. Chudasama, T. N. Sasamal, and J. Yadav, "An efficient design of vedic multiplier using ripple carry adder in quantum-dot cellular automata," *Computers & Electrical Engineering*, vol. 65, pp. 527–542, 2018. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0045790617330495
- [6] S. Anjana, C. Pradeep, and P. Samuel, "Synthesize of high speed floating-point multipliers based on vedic mathematics," *Procedia Computer Science*, vol. 46, pp. 1294–1302, 2015, proceedings of the International Conference on Information and Communication Technologies, ICICT 2014, 3-5 December 2014 at Bolgatty Palace Island Resort, Kochi, India. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1877050915000551
- [7] K. Yugandhar, V. G. Raja, M. Tejkumar, and D. Siva, "High performance array multiplier using reversible logic structure," in 2018 International Conference on Current Trends towards Converging Technologies (ICCTCT), 2018, pp. 1–5.
- [8] N. Kandasamy, F. Ahmad, S. Reddy, R. B. M, N. Telagam, and S. Utlapalli, "Performance evolution of 4-b bit mac unit using hybrid gdi and transmission gate based adder and multiplier circuits in 180 and 90nm technology," *Microprocessors and Microsystems*, vol. 59, pp. 15–28, 2018. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0141933117303174
- [9] A. A. Bawaskar, V. Alagdeve, and R. Keote, "High performance redundant binary multiplier," in 2016 International Conference on Communication and Signal Processing (ICCSP), 2016, pp. 1277–1281.
- [10] S. Ms.Dharani, M. Satheesan, M. Asuvanti, R. Kumar, and V. Shanmugam, "Design and analysis of high-speed low-power vedic multiplier with 3-1-1-2 compressor using reversible logic gates," *IOP Conference Series: Materials Science and Engineering*, vol. 1059, p. 012024, 02 2021.
- [11] G. R. Gokhale and P. D. Bahirgonde, "Design of vedic-multiplier using area-efficient carry select adder," in 2015 International Conference on Advances in Computing, Communications and Informatics (ICACCI), 2015, pp. 576–581.
- [12] K. Sivanandam and P. Kumar, "Design and performance analysis of reconfigurable modified vedic multiplier with 3-1-1-2 compressor," *Microprocessors and Microsystems*, vol. 65, pp. 97–106, 2019. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0141933116302526
- [13] T. Kong and S. Li, "Design and analysis of approximate 4–2 compressors for high-accuracy multipliers," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 29, no. 10, pp. 1771–1781, 2021.
- [14] A. Pishvaie, G. Jaberipur, and A. Jahanian, "Redesigned cmos (4; 2) compressor for fast binary multipliers," *Canadian Journal of Electrical and Computer Engineering*, vol. 36, no. 3, pp. 111–115, 2013.
- [15] M. Ha and S. Lee, "Multipliers with approximate 4–2 compressors and error recovery modules," *IEEE Embedded Systems Letters*, vol. 10, no. 1, pp. 6–9, 2018.
- [16] P. J. Edavoor, S. Raveendran, and A. D. Rahulkar, "Approximate multiplier design using novel dual-stage 4:2 compressors," *IEEE Access*, vol. 8, no. 2, pp. 48337–48351, 2020.
- [17] A. G. M. Strollo, E. Napoli, D. De Caro, N. Petra, and G. D. Meo, "Comparison and extension of approximate 4-2 compressors for low-power approximate multipliers," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 67, no. 9, pp. 3021–3034, 2020.
- [18] V. Bianchi and I. De Munari, "A modular vedic multiplier architecture for model-based design and deployment on fpga platforms," *Microprocessors and Microsystems*, vol. 76, p. 103106, 2020. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0141933120302738
- [19] R. Gupta, R. Dhar, K. L. Baishnab, and J. Mehedi, "Design of high performance 8 bit vedic multiplier using compressor," in 2014 International Conference on Advances in Engineering and Technology (ICAET), 2014, pp. 1–5.
- [20] M. Shoba and R. Nakkeeran, "Energy and area efficient hierarchy multiplier architecture based on vedic mathematics and gdi logic," *Engineering Science and Technology, an International Journal*, vol. 20, no. 1, pp. 321–331, 2017. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S2215098616303202
- [21] H. Kaur and N. R. Prakash, "Area-efficient low pdp 8-bit vedic multiplier design using compressors," in 2015 2nd International Conference on Recent Advances in Engineering & Computational Sciences (RAECS), 2015, pp. 1–4.
- [22] R. Marimuthu, Y. E. Rezinold, and P. S. Mallick, "Design and analysis of multiplier using approximate 15-4 compressor," *IEEE Access*, vol. 5, pp. 1027–1036, 2017.
- [23] M. N. Chandrashekara and S. Rohith, "Design of 8 bit vedic multiplier using urdhva tiryagbhyam sutra with modified carry save adder," in 2019 4th International Conference on Recent Trends on Electronics, Information, Communication & Technology (RTEICT), 2019, pp. 116–120.
- [24] M. Jhamb, Garima, and H. Lohani, "Design, implementation and performance comparison of multiplier topologies in power-delay space," *Engineering Science and Technology, an International Journal*, vol. 19, no. 1, pp. 355–363, 2016. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S2215098615001287
- [25] G. C. Ram, Y. R. Lakshmanna, D. S. Rani, and K. B. Sindhuri, "Area efficient modified vedic multiplier," in 2016 International Conference on Circuit, Power and Computing Technologies (ICCPCT), 2016, pp. 1–5.
- [26] B. Holdsworth and R. Woods, "Combinational logic design with MSI circuits," in *Digital Logic Design*, 4th ed., B. Holdsworth and R. Woods, Eds. Oxford: Newnes, 2002, ch. 5, pp. 105–141. [Online]. Available: https://www.sciencedirect.com/science/article/pii/B9780750645829500068
- [27] S. S. Meti, C. Bharath, Y. Praveen Kumar, and B. Kariyappa, "Design and implementation of 8-bit vedic multiplier using mgdi technique," in 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI), 2017, pp. 1923–1927.
- [28] D. Tripathi and S. Wairya, "An energy dissipation and cell optimization of vedic multiplier topologies for nanocomputing applications," *Turkish Journal of Computer and Mathematics Education (TURCOMAT)*, vol. 12, no. 14, pp. 1490–1510, 2021.



**Aishita Verma** Aishita Verma is an M.Tech scholar at IET Lucknow. She completed her B.Tech in Electronics and Communication Engineering at SRMCEM, Lucknow. Her current research interests include VLSI design, digital circuits and fast computing applications.



**Anum Khan** Er. Anum Khan received her Bachelor's degree in Electronics and Telecommunication Engineering from Nagpur University in 2013 and her M.Tech in VLSI Design from Indira Gandhi Delhi Technical University in 2016. She is currently pursuing a Ph.D. at IET, Lucknow, on low-power digital circuits in nanotechnology.



**Subodh Wairya** Dr. Subodh Wairya received his B.Tech in Electronics from HBTI, Kanpur, his M.Tech (Telecommunication) from Jadavpur University, Kolkata, and his Ph.D. in Electronics from MNNIT, Allahabad. He is currently a Professor at IET, Lucknow.