

# Approximate Adder: Lower-Part OR Adder

# K. Varnika



Abstract: High-performance VLSI systems are essential in real-time applications. To improve their performance, approximate computing trades a slight loss of accuracy for gains in power, area, and speed. Such approximate circuits are used in error-tolerant applications, where the output need not be exact. This paper concentrates on approximate adders, as they are major building blocks of DSP systems. It presents an analysis of the Lower-part OR Adder (LOA) for 4-bit addition and compares it with a precise adder, the Ripple Carry Adder (RCA), using Mentor Graphics tools in 90 nm CMOS technology. Our experimental results show that the approximate adder saves 17%-70% in power dissipation, 4%-32% in area, and 19%-84% in delay. Because LOA-2 and LOA-3 offer the best trade-off, either can be used in error-tolerant applications, selected according to the requirements.

Keywords: Approximate Computing, Approximate Adder, Lower-part OR Adder, VLSI.

## I. INTRODUCTION

Almost all applications that use VLSI systems demand power or area efficiency, high performance, or both. Digital signal processing (DSP) is one such application. Among the blocks of a DSP system, the adder is especially important because it is also used in other operations such as subtraction, multiplication, and division. Reducing the power dissipation, area, and delay of the adder therefore reduces them for these operations and for the overall system. In multimedia applications, DSP blocks are among the most important blocks, and their output is an image or a video. Because human vision has perceptual limits, not every detail of an image or video can be noticed. The output can therefore be approximate rather than exact, which is what approximate computing provides. Approximate computing [1] is a technique in which lower power dissipation, smaller area, and higher performance are obtained by trading off accuracy. Using this technique, various approximate adders have been designed, and various approximate computing techniques [2] are used in the literature. In VLSI, the technique reduces the number of gates or transistors compared with the accurate designs, at the cost of accuracy. Approximate adders can be designed using two methodologies.

Revised Manuscript Received on May 30, 2020.

\* Correspondence Author

**K. Varnika\***, Department of ECE, G. Narayanamma Institute of Technology and Sciences, India. Email: varnika.kakularam@gmail.com

© The Authors. Published by Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

The first methodology is to design a single-bit approximate full adder with fewer gates or transistors. This shortens the critical path and gives lower power dissipation, smaller area, and higher speed than an accurate full adder. The approximate full adder is used only in the less significant bit positions. Accurate full adders are used in the more significant positions so that the output of the adder remains acceptable.

A few other adders in the literature based on this methodology are presented in [3], [4]. They also improve power dissipation, delay, and area considerably at the expense of accuracy. The second methodology is to design a block-based approximate multi-bit adder. In this approach, the adder is divided into several smaller, overlapping multi-bit sub-adder blocks built from accurate designs. Because the carry propagation chains in these adders are shorter, they achieve higher speed, lower power dissipation, and smaller area. In this paper, the Lower-part OR Adder (LOA), which is based on the first methodology, is analyzed and compared with a precise adder, the Ripple Carry Adder (RCA).

## II. LITERATURE SURVEY

Over the past decade, considerable research has been carried out on approximate adders. In [5], the Variable Latency Speculative Adder (VLSA) was proposed. It is a fast but inaccurate adder based on the observation that the average length of the longest sequence of propagate signals is approximately log n, where n is the bit-width of the two integers to be added.
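The sketch below makes the log n observation concrete. It counts the longest run of consecutive propagate signals $p_i = a_i \oplus b_i$ for random operands. This is a minimal Python illustration that assumes uniformly random inputs. The function names `longest_propagate_chain` and `average_longest_chain` are hypothetical and do not come from [5].

```python
import random


def longest_propagate_chain(a: int, b: int, n: int) -> int:
    """Longest run of consecutive bit positions (0 <= i < n) with p_i = a_i XOR b_i = 1."""
    longest = current = 0
    for i in range(n):
        if ((a >> i) ^ (b >> i)) & 1:  # propagate signal at position i
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


def average_longest_chain(n: int, trials: int = 2000, seed: int = 1) -> float:
    """Monte Carlo estimate of the mean longest propagate chain for random n-bit operands."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        a, b = rng.getrandbits(n), rng.getrandbits(n)
        total += longest_propagate_chain(a, b, n)
    return total / trials


for n in (8, 16, 32, 64):
    print(f"n = {n:2d}: average longest propagate chain = {average_longest_chain(n):.2f}")
```

The printed averages grow slowly with n, by roughly one position each time n doubles. This is consistent with logarithmic growth.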

In [6], ETA-I (Error-Tolerant Adder I) was proposed. It uses a modified XOR gate to compute the sum bits in the least significant positions, while the most significant bits are obtained with accurate adders. However, it produces relatively large errors when both addends are small. To solve this small-input problem, the same authors proposed an enhanced design called ETA-II [7]. ETA-II divides the adder into k-bit segments, which splits the carry propagation path into several short paths. This speeds up the adder and reduces its dynamic power.
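The segmentation idea behind ETA-II can be sketched as a behavioral model. This is a simplified, hypothetical Python model, not the circuit of [7]. It assumes that the carry entering each k-bit segment is generated from the previous segment's operand bits alone, as if that segment had no carry-in. As a result, no carry crosses more than one segment boundary.

```python
def segmented_speculative_add(a: int, b: int, n: int, k: int) -> int:
    """Hypothetical ETA-II-style model: an n-bit adder split into k-bit segments.

    The carry into each segment is generated only from the previous segment's
    operand bits (assuming that segment had no carry-in), so a carry never
    travels across more than one segment boundary.
    """
    if n <= 0 or k <= 0 or n % k:
        raise ValueError("this sketch assumes n is a positive multiple of k")
    mask = (1 << k) - 1
    result = 0
    carry_in = 0
    carry_out = 0
    for seg in range(n // k):
        a_seg = (a >> (seg * k)) & mask
        b_seg = (b >> (seg * k)) & mask
        seg_sum = a_seg + b_seg + carry_in
        result |= (seg_sum & mask) << (seg * k)
        carry_out = seg_sum >> k         # actual carry-out of this segment
        carry_in = (a_seg + b_seg) >> k  # speculative carry for the next segment
    return result | (carry_out << n)
```

With k = n, the model is an exact adder. With n = 12 and k = 4, however, `segmented_speculative_add(0x0FF, 0x001, 12, 4)` returns 0 instead of 0x100. The carry generated in the lowest segment would have to cross two segment boundaries to reach the top segment.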

An error-tolerant adder IV (ETA-IV), an enhancement of ETA-II, is proposed in [8]. It improves accuracy and delay by splitting the input operands into accurate and inaccurate parts. Shin and Gupta [9] and Phillips et al. [10] have proposed reducing logic complexity through Karnaugh map simplification. Other works that reduce logic complexity at the gate level are presented in [11], [12]. Works that reduce complexity at the algorithm level to meet real-time energy constraints are presented in [13], [14].




An Accuracy-Configurable Approximate (ACA) adder is introduced in [15]. It supports both accurate and inaccurate computation and uses several sub-adders to calculate partial sums, so it occupies a larger area and consumes more power. Transistor-level approximate adders, called approximate mirror adders, are proposed in [3] and applied to low-power digital signal processing.

Another transistor-level design, the approximate XOR/XNOR-based adder, is proposed in [4]. In [16], an accuracy-configurable adder that offers a better trade-off is proposed and validated on a Xilinx Virtex-6 FPGA.

In this work, the LOA is analyzed using Mentor Graphics tools in 90 nm CMOS technology. The LOA follows the first methodology, and its architecture is similar to that of the error-tolerant adder (ETA).

## III. APPROXIMATE ADDER

## A. Lower-Part OR Adder

The Lower-part OR Adder (LOA) is an approximate adder in which an n-bit adder is divided into two sub-adders of widths $n_l$ and $n_h$ (so $n = n_l + n_h$). Here, 'l' denotes the lower (trailing) bits and 'h' denotes the higher (leading) bits. The $n_l$-bit sub-adder consists of approximate full adders, and the $n_h$-bit sub-adder consists of precise full adders. In the approximate part, an OR gate generates each sum bit. A single AND gate, placed only at the $n_l^{th}$ bit position, generates the carry into the precise part. The complete architecture is shown in Figure 1.
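A bit-level behavioral model of the LOA is useful for checking the truth table and error figures. The Python sketch below assumes unsigned n-bit operands and an (n+1)-bit result. It follows the structure described above: OR gates for the $n_l$ lower sum bits, and one AND gate for the carry into the precise upper part. The function name `loa_add` is hypothetical; the paper's own design is written in Verilog HDL.

```python
def loa_add(a: int, b: int, n: int, n_l: int) -> int:
    """Hypothetical behavioral model of an n-bit Lower-part OR Adder.

    Lower n_l bits: sum_i = a_i OR b_i (no carry chain).
    Carry into the upper part: a[n_l-1] AND b[n_l-1] (a single AND gate).
    Upper n_h = n - n_l bits: precise addition; the carry-out is kept,
    so the result is n + 1 bits wide.
    """
    if not 0 <= n_l <= n:
        raise ValueError("n_l must satisfy 0 <= n_l <= n")
    width_mask = (1 << n) - 1
    a, b = a & width_mask, b & width_mask
    lower = (a | b) & ((1 << n_l) - 1)                    # OR gates
    carry = ((a & b) >> (n_l - 1)) & 1 if n_l > 0 else 0  # single AND gate
    upper = (a >> n_l) + (b >> n_l) + carry               # precise part
    return (upper << n_l) | lower
```

For example, `loa_add(0b0101, 0b0011, 4, 2)` returns 7 rather than 8. The carry generated at bit 0 would have to propagate through bit 1, but the OR gates have no carry chain, and the AND gate at bit 1 sees 0 AND 1. With $n_l = 0$, the model reduces to an exact adder.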

Because carries are ignored in the trailing part, the adder is only approximate. It nevertheless remains reasonably accurate, since the approximation is confined to the trailing bits, which carry the least numerical weight. For a single-bit approximate cell, the sum output is wrong in 4 of the 8 input combinations and the carry output is wrong in 2. The error distance, computed with Eq. (1) on the two-bit value $2\,\mathrm{CARRY} + \mathrm{SUM}$, is at most 1, as shown in Table 1 below.

Table 1 Truth table of the precise adder and the approximate adder (LOA) for 1-bit addition

| A | B | C | Precise SUM | Precise CARRY | LOA SUM (A OR B) | LOA CARRY (A AND B) | ED |
|---|---|---|-------------|---------------|------------------|---------------------|----|
| 0 | 0 | 0 | 0           | 0             | 0                | 0                   | 0  |
| 0 | 0 | 1 | 1           | 0             | 0                | 0                   | 1  |
| 0 | 1 | 0 | 1           | 0             | 1                | 0                   | 0  |
| 0 | 1 | 1 | 0           | 1             | 1                | 0                   | 1  |
| 1 | 0 | 0 | 1           | 0             | 1                | 0                   | 0  |
| 1 | 0 | 1 | 0           | 1             | 1                | 0                   | 1  |
| 1 | 1 | 0 | 0           | 1             | 1                | 1                   | 1  |
| 1 | 1 | 1 | 1           | 1             | 1                | 1                   | 0  |
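The error counts for Table 1 can be checked by enumerating all eight input combinations. The short Python sketch below assumes the approximate cell outputs A OR B as its sum and A AND B as its carry, ignoring C. It computes the error distance of Eq. (1) on the two-bit value $2\,\mathrm{CARRY} + \mathrm{SUM}$.

```python
from itertools import product

rows = []
for a, b, c in product((0, 1), repeat=3):  # same row order as Table 1: A, B, C
    exact_sum = a ^ b ^ c
    exact_carry = (a & b) | (a & c) | (b & c)
    loa_sum, loa_carry = a | b, a & b      # approximate cell: C is ignored
    # Error distance (Eq. 1) on the two-bit value 2*CARRY + SUM
    ed = abs((2 * loa_carry + loa_sum) - (2 * exact_carry + exact_sum))
    rows.append((a, b, c, exact_sum, exact_carry, loa_sum, loa_carry, ed))

sum_errors = sum(r[3] != r[5] for r in rows)
carry_errors = sum(r[4] != r[6] for r in rows)
max_ed = max(r[7] for r in rows)

for row in rows:
    print(*row)
print(f"sum errors: {sum_errors}, carry errors: {carry_errors}, max ED: {max_ed}")
```

The enumeration reports 4 sum errors, 2 carry errors, and a maximum error distance of 1. In rows 011 and 101, the sum and carry are both wrong, but the errors partly cancel: the precise value is 2 and the approximate value is 1.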

For an n-bit LOA, as the number of OR gates in the lower part increases, the power dissipation, area, and delay decrease while the imprecision increases. This is discussed in Section IV.



Figure 1 Lower-part OR Adder Architecture

## B. Evaluation Metrics

Approximate adders can be evaluated and compared with the precise adder using many metrics. This work considers the average error and the error distance.

The error distance (ED) is the arithmetic distance between the imprecise result and the precise result. For example, if the precise adder output is '000' and the approximate adder output is '001', the ED is 1, whereas it is 2 if the approximate output is '010'. The ED is given by Eq. (1), where 'a' is the imprecise result and 'b' is the precise result.

$$ED = |a - b| = \left| \sum_{i} a[i]\, 2^{i} - \sum_{j} b[j]\, 2^{j} \right| \tag{1}$$
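Eq. (1) can be transcribed directly into code. In the hypothetical Python helpers below, bit strings are written most-significant bit first, as in the '000'/'001'/'010' example above. A Hamming-distance helper is included only for contrast: ED weights each differing bit by its position, while the Hamming distance merely counts differing bits.

```python
def bits_to_int(bits: str) -> int:
    """Value of a bit string written MSB first, i.e. sum_i bit[i] * 2**i."""
    return sum(int(bit) << i for i, bit in enumerate(reversed(bits)))


def error_distance(imprecise: str, precise: str) -> int:
    """ED = |a - b| from Eq. (1): a is the imprecise result, b the precise one."""
    return abs(bits_to_int(imprecise) - bits_to_int(precise))


def hamming_distance(x: str, y: str) -> int:
    """Number of differing bit positions (shown only for contrast with ED)."""
    if len(x) != len(y):
        raise ValueError("bit strings must have equal length")
    return sum(p != q for p, q in zip(x, y))


print(error_distance("001", "000"))    # 1
print(error_distance("010", "000"))    # 2
print(hamming_distance("010", "000"))  # 1
```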

The error is defined as the difference between the approximate result $\tilde{S}$ and the precise result $S$, as shown in Eq. (2).

$$\varepsilon = \tilde{S} - S \tag{2}$$

The average error is defined as the expectation of the error, as shown in Eqs. (3) and (4).

$$\mu = E[\varepsilon] \tag{3}$$

$$\mu_T = \sum_{i=0}^{n_l} \widehat{\mu_i} 2^i \tag{4}$$

where $\widehat{\mu_i}$ is the average error at bit position $i$ and $\mu_T$ is the total average error of the adder.
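For small adders, Eqs. (2) and (3) can be evaluated exhaustively. The Python sketch below assumes uniformly distributed n-bit operands and the LOA behavior described in Section III-A: OR gates in the $n_l$ lower bits and one AND gate for the carry. Alongside the mean error of Eq. (3), it also reports the mean error distance $E[|\varepsilon|]$. The helper names are hypothetical. Table 2 does not state how its Average Error figures were normalized, so values from this sketch are not directly comparable with them.

```python
from fractions import Fraction


def loa_add(a: int, b: int, n_l: int) -> int:
    """Assumed LOA behavior: OR gates in the n_l low bits, one AND gate for the carry."""
    lower = (a | b) & ((1 << n_l) - 1)
    carry = ((a & b) >> (n_l - 1)) & 1 if n_l > 0 else 0
    return (((a >> n_l) + (b >> n_l) + carry) << n_l) | lower


def error_stats(n: int, n_l: int) -> tuple[Fraction, Fraction]:
    """Exhaustive mean error E[S~ - S] (Eq. 3) and mean error distance E[|S~ - S|]
    over all pairs of n-bit operands, treated as uniformly distributed."""
    errors = [loa_add(a, b, n_l) - (a + b)
              for a in range(1 << n) for b in range(1 << n)]
    total = len(errors)
    return (Fraction(sum(errors), total),
            Fraction(sum(abs(e) for e in errors), total))


for n_l in range(5):
    mean_error, mean_ed = error_stats(4, n_l)
    print(f"n_l = {n_l}: mean error = {mean_error}, mean error distance = {mean_ed}")
```

For n = 4, this gives a mean error of exactly 1/4 for every $n_l \ge 1$. The mean error distances are 1/4, 5/8, 11/8, and 23/8 for $n_l$ = 1 to 4. The signed mean stays constant because of the bitwise identity $a + b = (a \lor b) + (a \land b)$. Under this model, the error is $\varepsilon = 2^{n_l}(a_{n_l-1} \land b_{n_l-1}) - \sum_{i<n_l} 2^i (a_i \land b_i)$. In the form of Eq. (4), each lower bit contributes $\widehat{\mu_i} = -1/4$ and position $n_l$ contributes $+1/4$, so $\mu_T = 2^{n_l}/4 - (2^{n_l}-1)/4 = 1/4$.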





## IV. RESULTS AND DISCUSSION

The approximate adder circuits and the precise adder circuit are described in Verilog HDL and implemented using Mentor Graphics tools in 90 nm CMOS technology.

Each 4-bit LOA, with a different number of OR gates in the lower part, is compared with the 4-bit precise adder in terms of power, delay, and area. In the following, LOA-k denotes the LOA whose k least significant bits use OR gates.

The layouts of the LOAs, with different numbers of OR gates in the lower part, and of the accurate adder are shown in Figure 3. The accurate adder is noticeably more complex than the Lower-part OR Adders. The complexity also decreases as the number of OR gates in the trailing bits increases, so LOA-4 is less complex than the other approximate adders and the precise adder. Likewise, the precise adder occupies more area than the approximate adders, and the area of the LOA decreases as the number of OR gates in the trailing bits increases.

Table 2 compares the LOAs with the RCA in terms of average error, area, delay, and power dissipation. The average error of the LOA ranges from a minimum of 250.50 for LOA-1 to a maximum of 608.93 for LOA-4. The costs move in the opposite direction. The area ranges from a maximum of $1183.775~\mu m^2$ for LOA-1 to a minimum of $839.495~\mu m^2$ for LOA-4. The delay ranges from 27.441 ns for LOA-1 to 5.296 ns for LOA-4. The power dissipation ranges from 314.514 nW for LOA-1 to 113.063 nW for LOA-4. LOA-2 and LOA-3 therefore offer the best balance between error and cost.

Table 2 also lists the percentage improvement of each LOA over the RCA. The area improves by 4.51% to 32.28%, the delay by 19.06% to 84.37%, and the power dissipation by 17.14% to 70.21%.

Figure 4 plots the power dissipation, delay, and area from Table 2. It shows that as the number of OR gates in the lower part increases (that is, as the approximation increases), the average error increases while the area, delay, and power dissipation decrease.

Figure 5 plots the percentage improvements listed in Table 2. It shows that the improvement in area, delay, and power dissipation grows as the number of OR gates in the lower part increases, that is, as the approximation increases.

Table 2 Evaluation Metrics of LOA and their percentage of improvement with respect to RCA

| Circuit | Average Error | Area (μm²) | Area Improvement (%) | Delay (ns) | Delay Improvement (%) | Power Dissipation (nW) | Power Improvement (%) |
|---------|------------------|---------------|----------------------|---------------|-----------------------|------------------------|--------------------------------|
| RCA     | -                | 1239.740      | -                    | 33.905        | -                     | 379.537                | -                              |
| LOA-1   | 250.50           | 1183.775      | 4.51                 | 27.441        | 19.06                 | 314.514                | 17.14                          |
| LOA-2   | 320.12           | 1061.250      | 14.39                | 20.057        | 40.84                 | 247.364                | 34.83                          |
| LOA-3   | 427.16           | 986.760       | 20.40                | 12.674        | 62.61                 | 180.213                | 52.52                          |
| LOA-4   | 608.93           | 839.495       | 32.28                | 5.296         | 84.37                 | 113.063                | 70.21                          |
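The improvement columns of Table 2 can be recomputed from the raw figures as (RCA - LOA) / RCA × 100. The Python snippet below is a minimal check under that assumed formula, using the table's area, delay, and power values. `improvement` is a hypothetical helper name.

```python
# Raw values from Table 2: (area in um^2, delay in ns, power in nW)
RCA = (1239.740, 33.905, 379.537)
LOA = {
    "LOA-1": (1183.775, 27.441, 314.514),
    "LOA-2": (1061.250, 20.057, 247.364),
    "LOA-3": (986.760, 12.674, 180.213),
    "LOA-4": (839.495, 5.296, 113.063),
}


def improvement(reference: float, value: float) -> float:
    """Percentage reduction of `value` relative to `reference` (assumed formula)."""
    return (reference - value) / reference * 100.0


for name, metrics in LOA.items():
    area, delay, power = (improvement(r, v) for r, v in zip(RCA, metrics))
    print(f"{name}: area {area:.2f} %, delay {delay:.2f} %, power {power:.2f} %")
```

The recomputed percentages agree with the reported ones to within about 0.01 percentage points. The small residual differences presumably come from rounding.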



Figure 3(a) Layout of 4-bit RCA



Figure 3(b) Layout of LOA-1







Figure 3(c) Layout of LOA-2

Figure 3(d) Layout of LOA-3



Figure 3(e) Layout of LOA-4



Figure 5 Percentage of improvement in the area, delay, and power dissipation with respect to RCA



Figure 4 Evaluation Metrics of LOA with respect to RCA



## V. CONCLUSION

In this paper, the Lower-part OR Adder has been analyzed for 4-bit addition and compared with the 4-bit Ripple Carry Adder. The LOA outperforms the RCA in power, delay, and area by trading off accuracy. Depending on the number of OR gates in the lower part, the LOA saves 17%-70% of the power dissipation and 19%-84% of the delay. It also reduces the area by about 4%-32%. LOA-2 and LOA-3 offer the best balance of power dissipation, delay, and area occupancy. Either can therefore be used, according to the requirements, in error-tolerant applications such as image processing, signal processing, and data analytics. In future work, this approximate adder can be further enhanced and applied to error-tolerant applications.

#### REFERENCES

- [1] J. Han and M. Orshansky, "Approximate computing: an emerging paradigm for energy-efficient design," in ETS'13, May 2013.
- [2] R. Zimmermann, "Binary adder architectures for cell-based VLSI and their synthesis," Ph.D. dissertation, Fed. Inst. Technol., Zürich, Switzerland, 1998.
- [3] V. Gupta, D. Mohapatra, A. Raghunathan, and K. Roy, "Low-power digital signal processing using approximate adders," IEEE Trans. CAD of Integrated Circuits and Systems, vol. 32, no. 1, pp. 124–137, 2013.
- [4] Z. Yang, A. Jain, J. Liang, J. Han, and F. Lombardi, "Approximate XOR/XNOR-based adders for inexact computing," in Proc. IEEE International Conference on Nanotechnology (IEEE-NANO 2013), 2013, pp. 690–693.
- [5] A. K. Verma, P. Brisk, and P. Ienne, "Variable latency speculative addition: A new paradigm for arithmetic circuit design," in Proc. Design, Automat. Test Eur., 2008, pp. 1250–1255.
- [6] N. Zhu, W. L. Goh, W. Zhang, K. S. Yeo, and Z. H. Kong, "Design of low-power high-speed truncation-error-tolerant adder and its application in digital signal processing," IEEE Transactions on Very Large Scale Integration Systems, vol. 18, no. 8, pp. 1225–1229, 2010.
- [7] N. Zhu, W. L. Goh, and K. S. Yeo, "An enhanced low-power high-speed adder for error-tolerant application," in Proc. 12th International Symposium on Integrated Circuits, 2009, pp. 69–72.
- [8] N. Zhu, W. L. Goh, G. Wang, and K. S. Yeo, "Enhanced low-power high-speed adder for error-tolerant application," in Proc. 2010 International SoC Design Conference, Seoul, 2010, pp. 323–327.
- [9] D. Shin and S. K. Gupta, "Approximate logic synthesis for error tolerant applications," in Proc. Design, Automat. Test Eur., 2010, pp. 057–060.
- [10] B. J. Phillips, D. R. Kelly, and B. W. Ng, "Estimating adders for a low density parity check decoder," Proc. SPIE, vol. 6313, p. 631302, Aug. 2006.
- [11] D. Kelly and B. Phillips, "Arithmetic data value speculation," in Proc. Asia-Pacific Comput. Syst. Architect. Conf., 2005, pp. 353–366.
- [12] D. Shin and S. K. Gupta, "A re-design technique for data path modules in error tolerant applications," in Proc. 17th Asian Test Symp., 2008, pp. 431–437.
- [13] Y. V. Ivanov and C. J. Bleakley, "Real-time H.264 video encoding in software with fast mode decision and dynamic complexity control," ACM Trans. Multimedia Comput. Commun. Applicat., vol. 6, pp. 5:1–5:21, Feb. 2010.
- [14] M. Shafique, L. Bauer, and J. Henkel, "enBudget: A run-time adaptive predictive energy-budgeting scheme for energy-aware motion estimation in H.264/MPEG-4 AVC video encoder," in Proc. Design, Automat. Test Eur., Mar. 2010, pp. 1725–1730.
- [15] A. B. Kahng and S. Kang, "Accuracy-configurable adder for approximate arithmetic designs," in Proc. Design Automation Conference, 2012, pp. 820–825.
- [16] M. Shafique, W. Ahmad, R. Hafiz, and J. Henkel, "A low-latency generic accuracy configurable adder," in Proc. Design Automation Conference, 2015, pp. 86:1–86:6.

# **AUTHOR PROFILE**



K. Varnika received a B.Tech degree in Electronics and Communications from Vidya Jyothi Institute of Technology, Hyderabad, in 2018. She is currently pursuing an M.Tech degree in Digital Electronics and Communications Engineering at G. Narayanamma Institute of Technology and Sciences, Hyderabad, India.

