# Design of Efficient Approximate Compressor for Digital Image Processing

#### Marimuthu. R, Elsie Rezinold, Mayank Rathi

Abstract: Approximate computing is well suited to error-tolerant applications such as multimedia and signal processing. At the cost of some accuracy, it yields faster and more efficient results, potentially with lower power consumption. This paper introduces a new approach to optimizing the partial product reduction stage of a compressor-based multiplier. Two new designs of 4:2 compressors are proposed, along with six new approximate multipliers built from them. Simulation results show a significant improvement in accuracy, together with reductions in power and delay, compared with previous approximate designs. An image processing application is used to demonstrate the efficiency of the proposed designs.

Keywords: Approximate Compressors, Digital Image Processing, Edge Detection

#### I. INTRODUCTION

In various applications of digital image processing and digital signal processing, complex operators such as convolution, correlation and filtering are required [1]. All of these operators are power and time consuming, and they rely on multipliers, adders and shift registers for execution. Since the multiplier is the most power- and time-consuming component, improving its performance makes the whole operation more efficient.

Multiplication has three basic stages: (i) partial product generation, (ii) partial product reduction and (iii) final carry generation. Research over the past decades has aimed to improve multiplier performance. Because the second stage consumes more time and power than the other two, research has concentrated on optimizing it. The partial products are generated by AND gates [2-4]. The conventional way to reduce the partial products is to use half adders, full adders and compressors [5-7]; the most commonly used compressor is the 4-2 compressor. As the complexity of the multiplier grows, its power, delay and area increase. Many present-day applications of a multiplier are error tolerant, i.e. the accuracy of the result is not the first priority.

#### Revised Manuscript Received on March 08, 2019.

**Marimuthu R**, School of Electrical Engineering, Vellore Institute of Technology, Vellore, India.

Elsie Rezinold, School of Electrical Engineering, Vellore Institute of Technology, Vellore, India.

Mayank Rathi, School of Electrical Engineering, Vellore Institute of Technology, Vellore, India.

Today's applications demand more efficient designs that consume less power. Approximate designs deliver efficient, low-power results at the cost of accuracy. The most commonly used techniques for generating approximate arithmetic circuits are (i) truncation, (ii) voltage over-scaling and (iii) simplification of logic complexity. The third is the most widely used, as it is considered more efficient than the other two.

In this paper we propose two new designs of 4-2 compressors. According to the simulation results, the new compressor designs give better power, delay and area than previous designs. We use these designs in an 8 by 8 multiplier, which gives a further improvement in delay and power. Image processing is used as the application to validate the results: the Sobel gradient and Laplacian operators are applied to images using the new multiplier designs, and the results are compared with those of the accurate multiplier. All proposed designs are compiled with the Cadence RTL Compiler, and a typical 90 nm library file is used to obtain the simulation results.

#### II. DESIGN OF 4:2 ACCURATE COMPRESSOR

A 4-2 compressor has five inputs and three outputs. The fifth input is the 'Cin' from the previous compressor and the third output is the 'Cout' passed to the next compressor. Internally, the compressor consists of two full adders. With inputs A, B, C, D and Cin, as used in equations (1)-(3), the first full adder adds A, B and C; its carry is the 'Cout' passed on to the next compressor. The second full adder adds D, the 'Cin' from the previous compressor and the sum of the first adder; its outputs are the Sum and Carry of the compressor. Refer to Fig. 1.



Fig.1. Accurate 4-2 Compressor



The output expressions for 'Cout', 'Carry' and 'Sum' are:

$$Cout = (A \cdot B) | (C \cdot (A \oplus B)) \tag{1}$$

$$Carry = ((A \oplus B \oplus C) \cdot D) | (Cin \cdot (A \oplus B \oplus C \oplus D)) \tag{2}$$

$$Sum = A \oplus B \oplus C \oplus D \oplus Cin \tag{3}$$
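Equations (1)-(3) can be checked exhaustively with a short bit-level model. The sketch below is illustrative only: the function names are hypothetical, and it assumes the usual weighting in which Carry and Cout each count twice as much as Sum.

```python
from itertools import product


def accurate_compressor(a, b, c, d, cin):
    """Accurate 4-2 compressor following Eqs. (1)-(3); all arguments are 0 or 1."""
    cout = (a & b) | (c & (a ^ b))                       # Eq. (1)
    carry = ((a ^ b ^ c) & d) | (cin & (a ^ b ^ c ^ d))  # Eq. (2)
    sum_bit = a ^ b ^ c ^ d ^ cin                        # Eq. (3)
    return sum_bit, carry, cout


def check_accurate():
    """Return True if Sum + 2*(Carry + Cout) equals the number of 1-inputs
    for all 32 input combinations (assumed weighting: Carry and Cout = 2)."""
    for a, b, c, d, cin in product((0, 1), repeat=5):
        sum_bit, carry, cout = accurate_compressor(a, b, c, d, cin)
        if sum_bit + 2 * (carry + cout) != a + b + c + d + cin:
            return False
    return True
```

Calling `check_accurate()` walks the full truth table, which is small enough (32 rows) to enumerate directly.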

#### III. DESIGN OF PROPOSED APPROXIMATE COMPRESSORS

We propose two designs of a 4-2 single-column approximate compressor. These proposed designs offer better performance than the accurate compressor in terms of power, delay and area.

#### Design 1

The logical expressions for the outputs of the 4-2 compressor are simplified as part of the approximation.

The expressions of the outputs for the given inputs x4, x3, x2, x1, Cin are:

$$Cout' = x3 \tag{4}$$

$$Carry = x4 \tag{5}$$

$$Sum = x4 \oplus x3 \oplus x2 \oplus x1 \oplus Cin \tag{6}$$

This compressor was designed in Verilog HDL using Xilinx software and simulated. It is used in the first three of the six multipliers, together with accurate half adders and full adders.
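The error behaviour of Design 1 can be examined by enumerating its truth table against the exact input count. The sketch below is a Python model, not the authors' Verilog. It assumes that Carry and Cout each carry a weight of 2 and that the primed Cout in Eq. (4) simply equals x3; the function names are hypothetical.

```python
from itertools import product


def design1(x4, x3, x2, x1, cin):
    """Design 1 approximate compressor, Eqs. (4)-(6); inputs are 0 or 1."""
    cout = x3                          # Eq. (4)
    carry = x4                         # Eq. (5)
    sum_bit = x4 ^ x3 ^ x2 ^ x1 ^ cin  # Eq. (6)
    return sum_bit, carry, cout


def error_stats(compressor):
    """Return (max error distance, erroneous input count, total input count)."""
    distances = []
    for x4, x3, x2, x1, cin in product((0, 1), repeat=5):
        sum_bit, carry, cout = compressor(x4, x3, x2, x1, cin)
        approx = sum_bit + 2 * (carry + cout)  # assumed output weighting
        exact = x4 + x3 + x2 + x1 + cin
        distances.append(abs(approx - exact))
    erroneous = sum(1 for d in distances if d != 0)
    return max(distances), erroneous, len(distances)
```

Under these assumptions, `error_stats(design1)` reports a maximum error distance of 2, with 12 of the 32 input combinations producing an inexact result.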

#### Design 2

This is a slightly improved version of Design 1. Since Design 1 has an Error Distance (ED) of 2, the Sum of Design 1 is modified to bring the ED down to 1.

The final expressions of all the outputs of Design 2 are:

$$Cout' = x3 \tag{7}$$

$$Carry = x4 \tag{8}$$

$$Sum = \big(Cin \cdot x4' \cdot x3' \cdot (x1 \oplus x2) \mid x4' \cdot x2 \cdot x1 \cdot (Cin \cdot x3 \mid Cin' \cdot x3') \mid Cin' \cdot x2' \cdot x1' \cdot (x3 \oplus x4) \mid x4 \cdot x3 \cdot x2' \cdot (Cin \oplus x1) \mid x4 \cdot x2 \cdot (Cin \cdot x3') \mid x4 \cdot x3 \cdot x1' \mid \big) \;?\; \sim(x1 \oplus x2 \oplus x3 \oplus x4 \oplus Cin) : (x1 \oplus x2 \oplus x3 \oplus x4 \oplus Cin) \tag{9}$$

This compressor was designed in Verilog HDL using Xilinx software and simulated. It is used in the last three of the six multipliers, together with accurate half adders and full adders.

#### IV. DESIGN OF 8X8 MULTIPLIER



Fig. 2 Multiplier Reduction Stage

In Fig. 2, the rectangular boxes mark where the 4-2 compressors are used. In the accurate multiplier, all the 4-2 compressors are accurate (conventional). In the six approximate multiplier designs, accurate and approximate compressors are combined in a fixed ratio: the first three designs use Design 1 approximate compressors and the next three use Design 2. The ratio keeps the normalized error distance (**NED**) and the error rate (**ER**) in check; as the **NED** increases, power and delay decrease. The six multipliers therefore differ in performance, the objective being to develop multipliers suited to specific applications.
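The two error metrics named above can be computed by exhaustive simulation over all 8-bit operand pairs. The sketch below assumes the commonly used definitions: ER is the fraction of input pairs with a wrong product, and NED is the mean error distance divided by the maximum exact product. The paper does not spell these out, so treat both as assumptions. `truncated_mul` is a hypothetical stand-in used only to exercise the metric code; it is not one of the proposed multipliers.

```python
def error_metrics(approx_mul, bits=8):
    """Return (ER, NED) of approx_mul over all unsigned operand pairs."""
    n = 1 << bits
    max_product = (n - 1) * (n - 1)   # largest exact product, used to normalize
    wrong = 0
    total_ed = 0
    for a in range(n):
        for b in range(n):
            ed = abs(approx_mul(a, b) - a * b)   # error distance for this pair
            total_ed += ed
            if ed != 0:
                wrong += 1
    pairs = n * n
    er = wrong / pairs
    ned = (total_ed / pairs) / max_product
    return er, ned


def truncated_mul(a, b, dropped_bits=4):
    """Hypothetical stand-in: exact product with its low bits cleared.
    NOT the proposed compressor-based design."""
    return ((a * b) >> dropped_bits) << dropped_bits
```

A bit-accurate model of any of the six multipliers could be passed as `approx_mul` in the same way, which makes the trade-off between the compressor ratio and the error metrics directly measurable.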

#### V. APPROXIMATE MULTIPLIERS

We propose six designs of approximate multipliers using the two new designs of the 4-2 compressor. Rather than always replacing every accurate compressor with an approximate one, we use three different ratios of approximate to accurate compressors. The first three multipliers use Design 1 and the last three use Design 2 of the proposed approximate compressor.

- In the first case (Multiplier 1), all the 4-2 compressors are Design 1 compressors.
- In the second case (Multiplier 2), approximate Design 1 compressors are used for the 8 columns that fall in the partial products reduction stage, and accurate compressors are used for the 10 columns of the final carry generation stage.
- In the third case (Multiplier 3), approximate Design 1 compressors are used for 6 columns (the LSB columns of the partial products reduction stage), and accurate compressors are used for 12 columns (the 2 MSB columns of the partial products reduction stage and the 10 columns of the final carry generation stage).
- In the fourth case (Multiplier 4), approximate Design 2 compressors are used for all columns.
- In the fifth case (Multiplier 5), approximate Design 2 compressors are used for the 8 columns that fall in the partial products reduction stage, and accurate compressors are used for the 10 columns of the final carry generation stage.
- In the sixth case (Multiplier 6), approximate Design 2 compressors are used for 6 columns (the LSB columns of the partial products reduction stage), and accurate compressors are used for 12 columns (the 2 MSB columns of the partial products reduction stage and the 10 columns of the final carry generation stage).

#### VI. APPLICATION

To validate that the proposed approximate multipliers are efficient and as effective as the accurate one, we consider two MRI images of a brain with a tumor: an ideal MRI image without noise (Fig. 3) and one with noise (Fig. 4). Edge detection is performed on both images using the accurate and approximate multipliers. The operators commonly used in edge detection are Laplacian, Prewitt, Canny and Sobel. In this paper, the Laplacian and Sobel operators are used. The Sobel operator, which involves first-order differentiation, is applied to the MRI image without noise. The noisy image needs a stronger operator, so the Laplacian operator, which involves second-order differentiation, is used.



Fig. 3 Ideal MRI Image of a brain with tumor without noise



Fig. 4 MRI image of a brain with tumor with noise

#### **Edge Detection with Sobel**

$$\text{Edge Magnitude} = \sqrt{S_1^2 + S_2^2}$$

Fig. 5 Sobel gradient and operators

Fig. 5 shows the Sobel operator. S1 is the x-axis mask and S2 is the y-axis mask. Edge detection is done by moving S1 along the x-axis and S2 along the y-axis, and then computing the edge magnitude with the formula above.

The multipliers are used in the masking process, i.e. when the x and y operators are moved across the image in different directions. The Sobel operator is used for ideal images.
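The masking step and the edge magnitude formula can be sketched in a few lines of Python. The mask values below are the standard 3x3 Sobel kernels and are an assumption, since the masks in Fig. 5 are not reproduced in the text. The `mul` parameter is a hypothetical hook showing where an approximate multiplier model could replace exact multiplication; handling the sign of the mask coefficient outside the multiplier is likewise an assumption about how an unsigned 8x8 multiplier would be used.

```python
import math

# Standard 3x3 Sobel masks (assumed; Fig. 5 is not reproduced in the text).
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))   # S1, x-axis mask
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))   # S2, y-axis mask


def exact_mul(a, b):
    """Exact multiplication; swap in an approximate model to compare."""
    return a * b


def apply_mask(image, mask, row, col, mul=exact_mul):
    """Weighted sum of the 3x3 neighbourhood centred on (row, col)."""
    acc = 0
    for i in range(3):
        for j in range(3):
            coeff = mask[i][j]
            pixel = image[row + i - 1][col + j - 1]
            term = mul(pixel, abs(coeff))       # unsigned multiply
            acc += term if coeff >= 0 else -term  # sign applied outside
    return acc


def sobel_magnitude(image, mul=exact_mul):
    """Edge magnitude sqrt(S1^2 + S2^2) for interior pixels; borders stay 0."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            s1 = apply_mask(image, SOBEL_X, r, c, mul)
            s2 = apply_mask(image, SOBEL_Y, r, c, mul)
            out[r][c] = math.sqrt(s1 * s1 + s2 * s2)
    return out
```

For example, `sobel_magnitude(image, mul=my_approx_mul)` would run the same masking with a user-supplied multiplier model, where `my_approx_mul` is a placeholder name.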

#### **Edge Detection with Laplacian**



Fig. 6 Laplacian operator (including diagonals)

Edge detection with the Laplacian works the same way, except that a single operator is used instead of two separate operators for different directions. Masking is done by this operator over the whole image.
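A single-mask version of the same masking loop illustrates the Laplacian case. The kernel below is one common form of the 3x3 Laplacian that includes diagonals (centre 8, neighbours -1); this is an assumption, since the mask in Fig. 6 is not reproduced in the text and the opposite sign convention is equally common. The names are hypothetical.

```python
# One common 3x3 Laplacian mask including diagonals (assumed sign convention).
LAPLACIAN_8 = ((-1, -1, -1), (-1, 8, -1), (-1, -1, -1))


def laplacian(image, mul=lambda a, b: a * b):
    """Apply the Laplacian mask to interior pixels; border pixels stay 0.

    mul is a hook for an approximate multiplier model (unsigned operands);
    the sign of each mask coefficient is applied outside the multiplier.
    """
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            acc = 0
            for i in range(3):
                for j in range(3):
                    coeff = LAPLACIAN_8[i][j]
                    term = mul(image[r + i - 1][c + j - 1], abs(coeff))
                    acc += term if coeff >= 0 else -term
            out[r][c] = acc
    return out
```

Because the mask coefficients sum to zero, flat regions map to 0 and only intensity changes produce a response, which is why a single mask suffices here.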

The image processing is done in MATLAB 2013b, separately for each of the six approximate multipliers and for the accurate one. The images generated by the approximate multipliers are compared with the image generated by the accurate multiplier, using the PSNR (peak signal-to-noise ratio) value as the measure of image quality relative to the accurate result.
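For reference, PSNR between the accurate result and an approximate result can be computed as below. This is a Python sketch of the standard definition, not the authors' MATLAB code; it assumes 8-bit images (peak value 255) stored as nested lists of equal size, and the function name is hypothetical.

```python
import math


def psnr(reference, test, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    total = 0
    count = 0
    for ref_row, test_row in zip(reference, test):
        for r, t in zip(ref_row, test_row):
            total += (r - t) ** 2
            count += 1
    mse = total / count            # mean squared error
    if mse == 0:
        return math.inf            # identical images: PSNR is unbounded
    return 10 * math.log10(peak * peak / mse)
```

Identical images give an unbounded PSNR, so a comparison of the accurate result with itself has no finite value to report.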

#### VII. RESULTS

The objective of this project is an efficient multiplier that is both power efficient and fast. Approximate computing is applied to 8x8 multipliers using two designs of 4-2 approximate compressors, and their area, power and delay are analyzed.

The analysis of designs of compressors gave us the following results:

- The Design 1 compressor occupies 24.44% less cell area than the accurate compressor and consumes 54.35% less power. However, its delay is 3% higher than that of the accurate compressor.
- The Design 2 compressor consumes 57% less power than the accurate compressor and has 6% lower delay. However, it occupies 4.44% more cell area than the accurate compressor.

The analysis of multiplier gave us the following results:

- Multiplier 1 occupies 14% less area, consumes 23.5% less power, and is 3.6% faster than the accurate multiplier.
- Multiplier 2 occupies 6% less area, consumes 9.2% less power, and is 3.6% faster than the accurate multiplier.



- Multiplier 3 occupies 4.3% less area, consumes 8.3% less power, and is 3.6% faster than the accurate multiplier.
- Multiplier 4 occupies 10.5% less area, consumes 48.5% less power, and is 4% faster than the accurate multiplier.
- Multiplier 5 occupies 4.3% less area, consumes 24.7% less power, and is 0.4% faster than the accurate multiplier.
- Multiplier 6 occupies 2.3% less area, consumes 18.64% less power, and is 3.6% faster than the accurate multiplier.

In the medical image processing application too, the multipliers show convincing results. The PSNRs of several of the processed images exceed 30 dB, which is generally regarded as a good value, whereas the PSNR values given by the existing popular approximate multipliers [8] are less than 30 dB.

The images produced by applying the operators with the approximate multipliers are compared with those produced with the accurate multiplier. The resulting images and the comparison tables for both the Sobel and Laplacian operators are shown below.

#### **Image Results**



Fig. 7 Results of Sobel operator on the ideal MRI image without noise using (a) Accurate Multiplier (b) Multiplier1 (c) Multiplier2 (d) Multiplier3 (e) Multiplier4 (f) Multiplier5 (g) Multiplier6



Fig. 8 Results of Laplacian operator on the MRI image with noise using (a) Accurate Multiplier (b) Multiplier1 (c) Multiplier2 (d) Multiplier3 (e) Multiplier4 (f) Multiplier5 (g) Multiplier6

A. Comparison of PSNRs of the Results

Table 1 Comparison of PSNRs of Sobel operator using approximate multipliers and the accurate multiplier

| Multipliers  | PSNR (dB) |
|--------------|-----------|
| Accurate     | -       |
| Multiplier 1 | 23.31   |
| Multiplier 2 | 24.39   |
| Multiplier 3 | 32.054  |
| Multiplier 4 | 23.31   |
| Multiplier 5 | 24.3925 |
| Multiplier 6 | 32.054  |

Table 2 Comparison of PSNRs of Laplacian operator using approximate multipliers and the accurate multiplier

| Multipliers  | PSNR (dB) |
|--------------|-----------|
| Accurate     | -       |
| Multiplier 1 | 23.31   |
| Multiplier 2 | 24.39   |
| Multiplier 3 | 32.054  |
| Multiplier 4 | 23.31   |
| Multiplier 5 | 24.3925 |
| Multiplier 6 | 32.054  |

#### VIII. CONCLUSION

Two new designs of a 4-2 compressor were proposed and used to build six multipliers. The compressors and the multipliers were compiled with the Cadence RTL Compiler, and a typical 90 nm library file was used to obtain the simulation results. An image processing application was used to validate the work: edge detection was applied to the images and the results were compared. The results of the approximate multipliers were found to be comparable to those of the accurate multiplier, with the PSNR exceeding 30 dB for Multipliers 3 and 6.


#### REFERENCES

- [1] S. Sakthikumaran, S. Salivahanan, and V. S. Kanchana Bhaaskaran, "16-Bit RISC Processor Design for Convolution Application," in Proc. International Conference on Recent Trends in Information Technology (ICRTIT), Chennai, India, June 2011, pp. 394-397.
- [2] Donald Donglong Chen, Nele Mentens, Frederik Vercauteren, Sujoy Sinha Roy, Ray C. C. Cheung, Derek Pao, and Ingrid Verbauwhede, "High-Speed Polynomial Multiplication Architecture for Ring-LWE and SHE Cryptosystems," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 62, no. 1, pp. 157-166, Jan. 2015.
- [3] Reza Azarderakhsh and Arash Reyhani-Masoleh, "Parallel and High-Speed Computations of Elliptic Curve Cryptography Using Hybrid-Double Multipliers," IEEE Transactions on Parallel and Distributed Systems, vol. 26, no. 6, pp. 1668-1677, June 2015.
- [4] P. C. H. Meier, R. A. Rutenbar, and L. R. Carley, "Exploring multiplier architecture and layout for low power," in Proc. IEEE Custom Integrated Circuits Conference, San Diego, CA, USA, May 1996, pp. 513-516.
- [5] J. Gu and C. H. Chang, "Ultra low-voltage, low-power 4-2 compressor for high speed multiplications," in Proc. 36th IEEE Int. Symp. Circuits Syst., Bangkok, Thailand, May 2003, pp. 321-324.
- [6] M. Margala and N. G. Durdle, "Low-power low-voltage 4-2 compressors for VLSI Applications," in Proc. IEEE Alessandro Volta Memorial Workshop Low-Power Design, March 1999, pp. 84-90.
- [7] K. Prasad and K. K. Parhi, "Low-power 4-2 and 5-2 compressors," in Proc. 35th Asilomar Conf. Signals, Syst. Comput., 2001, vol. 1, pp. 129-133.
- [8] Amir Momeni, Jie Han, Paolo Montuschi, and Fabrizio Lombardi, "Design and Analysis of Approximate Compressors for Multiplication," IEEE Transactions on Computers, 2014.
- [9] Wong Seng Yue, "Application of Energy Conservation Techniques in Industries and Institution," International Innovative Research Journal of Engineering and Technology, vol. 4, no. 2, pp. 7-16, Dec. 2018.

