# Low Power and High Performance FIR Filter For Reconfigurable Applications

## Nishant Yadav, Aarthy M

Abstract: Reconfigurability is a key requirement for Finite Impulse Response (FIR) filters, which are used in many digital signal processing applications. The transposed form of the filter readily supports multiplication techniques and is pipelined, which enhances the performance of the filter and often lowers its power consumption.

In this work, an LUT reduction technique is applied to the blocked FIR filter, which greatly reduces the power consumed by the filter and also increases its speed by reducing the delay of the circuit. The proposed filter design is compared with the existing blocked FIR filters presented over the years. The proposed design offers lower Energy per sample (EPS) and Area delay product (ADP) than the reconfigurable architectures for large filters. The synthesis results show that, for a filter length of 16, the proposed design offers a 34.6% reduction in ADP and a 63% reduction in EPS.

Index Terms - ADP, EPS, FIR, Reconfigurability.

## I. INTRODUCTION

Signal processing and communication systems are in constant need of FIR filters, which serve a major role in these fields. The signal processing areas that use FIR filters include speech processing, channel equalization, echo and noise cancellation and many others. All these applications mainly use filtering, i.e. removing the frequencies that are not needed for the operation. Often these applications require high order FIR filters with a high sampling rate. Over the years, many architectures have been proposed to improve the performance parameters of the filter and to reduce the filter complexity. Generally, for these applications, the filter coefficients remain unchanged. This property can be utilized to lower the total number of multiplications in a filter design, which ultimately reduces the complexity of the design. However, as the order of the filter increases, the additions and multiplications required for the filter design also increase linearly.
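
For reference, the linear growth can be read off the standard FIR convolution sum. The symbols $h[k]$ (coefficients) and $N$ (filter length) are the usual textbook notation and are assumed here; they are not defined in this paper.

$$
y[n] = \sum_{k=0}^{N-1} h[k]\, x[n-k]
$$

A direct evaluation of this sum needs $N$ multiplications and $N-1$ additions per output sample, so both counts grow linearly with the filter length.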

Many works related to filter design with fixed coefficients have been carried out in recent years. These designs use either Distributed Arithmetic (DA) [2] or Multiple Constant Multiplication (MCM) [3]. The designs based on the DA method comprise an LUT that stores the precomputed multiplications, which reduces the overall complexity of the design. The MCM technique, on the other hand, minimizes the additions needed for the multiplication part. The MCM technique is highly effective when a number of constants are multiplied by a common operand, and hence it is apt for high order filter designs with fixed coefficients.

Revised Manuscript Received on May 06, 2019

**Nishant Yadav**, Department of Micro & Nano Electronics SENSE, Vellore Institute of Technology, Vellore - 632014, Tamil Nadu, India

**Aarthy M**, Department of Micro & Nano Electronics SENSE, Vellore Institute of Technology, Vellore - 632014, Tamil Nadu, India

For some DSP applications, FIR filters need to be designed in a reconfigurable form that can support wireless communications. In recent years, many works have proposed efficient reconfigurable architectures based on multipliers and constant multiplication techniques [3] - [9]. The designs proposed in [3], [4], [5] and [6] are not suitable for high order filters and are not power-delay efficient. The architecture in [7] uses the constant shift method (CSM), and DA based architectures have been proposed recently in [8] and [9] for a Reconfigurable FIR Filter (RFIR). The filter designs used in [7] and [9] are transposed form and direct form structures respectively. However, not much work has been done on the blocked FIR filter. An LUT in a filter design is generally used to store the coefficients or the precomputed products of the coefficient and the input. An LUT based multiplier can be made by simply storing the precomputed results from the filter design. Recently, many memory-optimized architectures have been proposed for various DSP algorithms [10] - [16]. However, very little work has been done on LUT optimization for multiplication. The Odd Multiple Storage (OMS) technique used in [17] stores only the odd multiples of the coefficient in the LUT instead of storing all the multiples. The Antisymmetric Product Coding (APC) technique used in [18], on the other hand, reduces the size of the LUT to half, as it stores the products in the form of antisymmetric pairs.

Although the APC method halves the size of the LUT, it increases the operation time of the design, making it slower than other designs. Similarly, the OMS technique, when implemented alone, incurs a power overhead, although it also halves the size of the LUT. When combined, the APC and OMS techniques greatly reduce the power overhead and speed up the design. In this work, we incorporate the combined APC-OMS technique into the blocked FIR filter design in transposed form.

## II. PROPOSED RFIR FILTER ARCHITECTURE

The proposed FIR filter architecture is designed using the combined APC-OMS technique in the transposed form. The architecture for filter length 16 is presented in Fig 1. It consists of an LUT, a barrel shifter, an adder circuit unit and a parallel adder unit. The LUT stores the product words, which are the products of the fixed coefficient B and the input word x[n], i.e. x[n]\*B. The product words are computed beforehand and are updated in the LUT.





Fig 1. Proposed FIR Filter Architecture for filter length N=16.

The product word then goes to the barrel shifter. A barrel shifter shifts the input bits to the left or right as required. Here the barrel shifter shifts the product word bits to the left to obtain the even multiples. The number of shifts is determined by the control signals s0 and s1. The barrel shifter's output is b[n].

b[n] then goes to the Adder Unit, where the shifted product word is added to 8B to get the other product words, so the output of the Adder Unit is 8B + b[n]. Finally, all these product words are added in the Pipelined Adder Unit to yield the filter output y[n]. The pipelined adder unit is constructed according to the transposed form of the FIR filter: the product words are added and delays are introduced in between.
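
The transposed-form accumulation described above can be sketched in software as follows. This is a behavioural Python model, not the authors' Verilog: the function name `transposed_fir` and the coefficient list `h` are illustrative assumptions, and ordinary multiplication stands in for the LUT, barrel shifter and Adder Unit.

```python
def transposed_fir(h, x):
    """Behavioural model of a transposed-form FIR filter (illustrative only).

    h: list of integer coefficients h[0] .. h[N-1]
    x: iterable of integer input samples
    Returns the outputs y[n] = sum over k of h[k] * x[n-k].
    """
    n_taps = len(h)
    # One delay register between each pair of adders in the adder chain.
    regs = [0] * (n_taps - 1)
    out = []
    for sample in x:
        # Every coefficient is multiplied by the same input sample.
        products = [c * sample for c in h]
        if n_taps == 1:
            out.append(products[0])
            continue
        # The output is the first product plus the first delayed partial sum.
        out.append(products[0] + regs[0])
        # Each register takes its product plus the old value of the next register.
        for k in range(n_taps - 2):
            regs[k] = products[k + 1] + regs[k + 1]
        regs[-1] = products[-1]
    return out


print(transposed_fir([1, 2, 3], [1, 0, 0, 0]))  # impulse response: [1, 2, 3, 0]
```

Because every product shares the current input sample, the multiplier block can be replaced by a lookup on that sample, which is what makes the transposed form a natural fit for LUT-based multiplication.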

### A. APC technique for LUT reduction

For convenience, let the input word X and the coefficient B both be positive integers. Table I shows that the input X in the third column is the 2's complement of the input in the first column. It can also be observed that the sum of the two product values in the same row is equal to 16B.

Table I. APC words for inputs for L=4.

| Input X | Product Values | Input X | Product Values | Address X3 X2 X1 X0 | APC Words |
|---------|----------------|---------|----------------|---------------------|-----------|
| 0001    | B              | 1111    | 15B            | 1 1 1 1             | 15B       |
| 0010    | 2B      | 1110    | 14B     | 1 1 1 0     | 14B   |
| 0011    | 3B      | 1101    | 13B     | 1 1 0 1     | 13B   |
| 0100    | 4B      | 1100    | 12B     | 1 1 0 0     | 12B   |
| 0101    | 5B      | 1011    | 11B     | 1 0 1 1     | 11B   |
| 0110    | 6B      | 1010    | 10B     | 1 0 1 0     | 10B   |
| 0111    | 7B      | 1001    | 9B      | 1 0 0 1     | 9B    |
| 1000    | 8B      | 1000    | 8B      | 1 0 0 0     | 8B    |

Let the product value in the second column be **m** and the product value in the fourth column be **n**. We can then write $m = \frac{m+n}{2} - \frac{n-m}{2}$ and $n = \frac{m+n}{2} + \frac{n-m}{2}$. Since $m + n = 16B$, we have $m = 8B - \frac{n-m}{2}$ and $n = 8B + \frac{n-m}{2}$. This property can be exploited to reduce the complexity of the LUT: instead of storing all the possible product words **m** and **n**, we can store only the product words $\frac{n-m}{2}$. The second and fourth columns of the table are thus antisymmetric pairs of each other. By adding the stored product words to, or subtracting them from, the fixed value 8B, we can get the desired output.
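
The identity above can be checked with a short software sketch. It is an illustration under stated assumptions rather than the hardware design: the names `apc_lut` and `apc_product` are hypothetical, and the LUT is modelled as a Python dictionary holding one word $\frac{n-m}{2}$ per antisymmetric pair.

```python
def apc_lut(B, L=4):
    """Build the halved table: one word (n - m)/2 per antisymmetric pair."""
    half = 1 << (L - 1)  # 8 for L = 4
    # For the pair (m, n) = (X*B, (2^L - X)*B), (n - m)/2 equals (half - X)*B.
    return {X: (half - X) * B for X in range(1, half + 1)}


def apc_product(B, X, L=4):
    """Recover X*B for 1 <= X <= 2^L - 1 from the halved table."""
    half = 1 << (L - 1)
    lut = apc_lut(B, L)
    fixed = half * B  # the fixed value 8B when L = 4
    if X <= half:
        return fixed - lut[X]          # m = 8B - (n - m)/2
    return fixed + lut[(1 << L) - X]   # n = 8B + (n - m)/2


# Eight stored words are enough to reproduce all fifteen products for L = 4.
assert len(apc_lut(5)) == 8
assert all(apc_product(5, X) == 5 * X for X in range(1, 16))
```

The cost of halving the table is the extra add or subtract against 8B on every lookup, which is consistent with the remark in Section I that APC alone slows the design down.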

### B. OMS technique for LUT reduction

Any product of the input X and a fixed coefficient B can be expressed as Y = X·B. Suppose the input X has bit length $\mathbf{k}$, so that a total of $2^k$ possible words of Y = X·B can be stored in the LUT. Instead, we can store only the half of the words that are odd multiples of B; the rest of the words, i.e. the even multiples, can be obtained by simply left shifting the odd multiples. Implementing the OMS technique therefore requires a memory unit that can store $(2^k/2)+1$ words, of which $(2^k/2)$ words are the odd multiples and the last word is 0. A barrel shifter is also required to left shift the odd multiples to obtain the even product words. The shifter can shift by up to k-1 bits.

From Table II, we can see that only the words B, 3B, 5B and 7B need to be stored. Left shifting B gives 2B and 4B. Similarly, 6B and 12B are obtained by left shifting 3B, while 10B and 14B are obtained by left shifting 5B and 7B respectively. A maximum of 2 shifts is required to obtain all the product words.
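
The odd-multiple storage idea can be sketched as below. This is a software illustration only: the names `STORED_ODD`, `oms_lut` and `oms_product` are hypothetical, and the stored set is assumed to be the four words B, 3B, 5B and 7B named in the text.

```python
STORED_ODD = (1, 3, 5, 7)  # odd multiples of B kept in the LUT (assumed from the text)


def oms_lut(B):
    """Model the LUT as a mapping from each stored odd multiple to its product word."""
    return {odd: odd * B for odd in STORED_ODD}


def oms_product(B, X):
    """Return (X*B, shifts) for X > 0 whose odd part is one of the stored multiples."""
    shifts = 0
    while X % 2 == 0:  # write X as odd * 2^shifts
        X //= 2
        shifts += 1
    # Look up the odd multiple, then left shift it as the barrel shifter would.
    return oms_lut(B)[X] << shifts, shifts


print(oms_product(3, 6))   # 6B from 3B with one shift: (18, 1)
print(oms_product(3, 12))  # 12B from 3B with two shifts: (36, 2)
```

An odd part outside the stored set raises a `KeyError` in this sketch, which marks the limit of what four stored words can generate by shifting alone.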

Table II. OMS technique to store product words in LUT

| Product Values | No. of shifts | Shifted Input | Stored APC Word | Address $d_2 d_1 d_0$ |
|----------------|---------------|---------------|-----------------|-----------------------|
| B              | 0             | 001           | $P_0 = B$       | 000                   |
| 2 × B          | 1             |               |                 |                       |
| 4 × B          | 2             |               |                 |                       |
| 8 × B          | 3             |               |                 |                       |
| 3B             | 0             | 011           | $P_1 = 3B$      | 001                   |
| 2 × 3B         | 1             |               |                 |                       |
| 4 × 3B         | 2             |               |                 |                       |
| 5B             | 0             | 101           | $P_2 = 5B$      | 010                   |
| 2 × 5B         | 1             |               |                 |                       |
| 7B             | 0             | 111           | $P_3 = 7B$      | 011                   |
| 2 × 7B         | 1             |               |                 |                       |

Finally, the product words obtained by left shifting the odd multiples are added to the fixed value 8B to obtain the other values, i.e.

8B + 0 = 8B

8B + B = 9B

8B + 2B = 10B

8B + 3B = 11B

8B + 4B = 12B

8B + 5B = 13B

8B + 6B = 14B

8B + 7B = 15B

8B + 8B = 16B
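
Putting the two techniques together, one plausible reading of the combined APC-OMS lookup for a 4-bit input is sketched below. The function name `apc_oms_product` is hypothetical, and the subtraction branch for inputs below 8 follows the APC relation $m = 8B - \frac{n-m}{2}$ from Section II-A; the text itself describes the Adder Unit only as computing 8B + b[n], so that branch is an assumption.

```python
ODD_WORDS = (1, 3, 5, 7)  # stored odd multiples of B (assumed from Table II)


def apc_oms_product(B, X):
    """Return X*B for 1 <= X <= 15 using four stored words, a shift and one add/subtract."""
    lut = {odd: odd * B for odd in ODD_WORDS}  # OMS storage
    fixed = 8 * B                              # the fixed value 8B
    offset = abs(X - 8)                        # APC: distance from 8, in 0..7
    if offset == 0:
        word = 0                               # the all-zero word
    else:
        shifts = 0
        while offset % 2 == 0:                 # offset = odd * 2^shifts
            offset //= 2
            shifts += 1
        word = lut[offset] << shifts           # barrel-shifter step
    # Add for the upper half of the input range, subtract for the lower half.
    return fixed + word if X > 8 else fixed - word


assert all(apc_oms_product(7, X) == 7 * X for X in range(1, 16))
```

In this model the 15 non-zero products of a 4-bit input are generated from four stored words, at most two shifts and one addition or subtraction.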

## III. SIMULATIONS AND RESULTS

The proposed RFIR filter is coded in Verilog HDL and synthesized using the Synopsys Design Compiler tool with TSMC 90nm library files for ASIC implementation. The MCM based RFIR filter used in [1] is also coded and synthesized for comparison. The area, power, Area delay product (ADP) and Energy per sample (EPS) have been compared with those of other existing architectures for a better performance comparison.

### A. Simulation Results for FIR Filter in [1]



Fig 2. Simulation Result for RFIR Filter in [1].



Fig 3. RTL Schematic for RFIR Filter in [1].

### B. Simulation Results for Proposed FIR Filter



Fig 4. Simulation Result for proposed FIR filter.



Fig 5. RTL Schematic for proposed FIR filter.

### C. Synthesis Results

The area, power and delay are calculated using Synopsys Design Compiler and are presented in tabular form for comparison. The proposed architecture has been compared with other existing FIR filter architectures in terms of performance parameters. As can be seen from Table III, the proposed architecture occupies more area than the architecture proposed in [1], but it consumes less power and has a lower delay. The proposed FIR filter architecture has the lowest EPS of all the compared architectures and a lower ADP than the structures of [9], [19] and [20].
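
The headline percentages can be recomputed from the Table III entries with a few lines of Python. The values below are copied from Table III, and the helper names are hypothetical. Reading the numbers against the table, the 34.6% ADP reduction appears to be relative to the structure of [9] and the 63% EPS reduction relative to the RFIR filter of [1]; this pairing is inferred from the figures and is not stated explicitly in the text.

```python
# (area, delay, ADP, EPS) copied from Table III; units are as reported there.
TABLE_III = {
    "Proposed": (20695.021, 1.2563, 25999, 1.511),
    "RFIR [1]": (8631.853, 1.3464, 11621, 4.086),
    "[19]": (71195, 1.78, 127439, 60.76),
    "[20]": (51994, 3.31, 172620, 13.63),
    "[9]": (25163, 1.57, 39757, 13.03),
}


def adp(area, delay):
    """Area delay product: area multiplied by delay."""
    return area * delay


def reduction_percent(reference, proposed):
    """Percentage by which `proposed` is lower than `reference`."""
    return 100.0 * (reference - proposed) / reference


area, delay, adp_prop, eps_prop = TABLE_III["Proposed"]
print(int(adp(area, delay)))                                           # 25999
print(round(reduction_percent(TABLE_III["[9]"][2], adp_prop), 1))      # 34.6
print(round(reduction_percent(TABLE_III["RFIR [1]"][3], eps_prop), 1)) # 63.0
```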

Table III. Performance Comparison for various parameters for filter length N=16.




| Sr. No | Filter Architectures | Area      | Delay  | ADP    | EPS   |
|--------|----------------------|-----------|--------|--------|-------|
| 1.  | Proposed FIR Filter  | 20695.021 | 1.2563 | 25999  | 1.511 |
| 2.  | RFIR Filter [1]      | 8631.853  | 1.3464 | 11621  | 4.086 |
| 3.  | Structure of [19]    | 71195     | 1.78   | 127439 | 60.76 |
| 4.  | Structure of [20]    | 51994     | 3.31   | 172620 | 13.63 |
| 5.  | Structure of [9]     | 25163     | 1.57   | 39757  | 13.03 |

## IV. CONCLUSION

In this work, we have presented an RFIR filter architecture in transposed form targeting the LUT and multiplier part. The proposed architecture consumes less power and has a higher speed than other FIR filters. From the comparison table, it is observed that although the proposed filter has a larger area than the RFIR filter of [1], it consumes less power and is also faster. It can also be seen that the filter has lower EPS and ADP than the existing architectures. The proposed architecture has 34.6% less ADP and 63% less EPS than the other RFIR filters.

## REFERENCES

1. B. K. Mohanty and P. K. Meher, "A High-Performance FIR Filter Architecture for Fixed and Reconfigurable Applications," IEEE Tran on VLSI, Feb 2015.
2. S. A. White, "Applications of distributed arithmetic to digital signal processing: A tutorial review," IEEE ASSP Mag., vol. 6, no. 3, pp. 4–19, Jul. 1989.
3. J. Park, W. Jeong, H. Mahmoodi-Meimand, Y. Wang, H. Choo, and K. Roy, "Computation sharing programmable FIR filter for low-power and high-performance applications," IEEE J. Solid State Circuits, vol. 39, no. 2, pp. 348–357, Feb. 2004.
4. NagaJyothi, Grande, and Sriadibhatla SriDevi. "Distributed arithmetic architectures for fir filters-a comparative review." 2017 International Conference on Wireless Communications, Signal Processing and Networking (WiSPNET). IEEE, 2017.
5. K.-H. Chen and T.-D. Chiueh, "A low-power digit-based reconfigurable FIR filter," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 53, no. 8, pp. 617–621, Aug. 2006.
6. Jyothi, Grande Naga, and Sridevi Sriadibhatla. "Asic implementation of low power, area efficient adaptive fir filter using pipelined da." Microelectronics, Electromagnetics and Telecommunications. Springer, Singapore, 2019. 385-394.
7. R. Mahesh and A. P. Vinod, "New reconfigurable architectures for implementing FIR filters with low complexity," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 29, no. 2, pp. 275–288, Feb. 2010.
8. Grande, Naga Jyothi, and Sriadibhatla Sridevi. "Asic implementation of shared lut based distributed arithmetic in fir filter." 2017 International conference on Microelectronic Devices, Circuits and Systems (ICMDCS). IEEE, 2017.
9. S. Y. Park and P. K. Meher, "Efficient FPGA and ASIC realizations of a DA-based reconfigurable FIR digital filter," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 61, no. 7, pp. 511–515, Jul. 2014.
10. J.-I. Guo, C.-M. Liu, and C.-W. Jen, "The efficient memory-based VLSI array design for DFT and DCT," IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process, vol. 39, no. 10, pp. 723–733, Oct. 1992.
11. H.-R. Lee, C.-W. Jen, and C.-M. Liu, "On the design automation of the memory-based VLSI architectures for FIR filters," IEEE Trans. Consum. Electron., vol. 39, no. 3, pp. 619–629, Aug. 1993.
12. D. F. Chiper, M. N. S. Swamy, M. O. Ahmad, and T. Stouraitis, "A systolic array architecture for the discrete sine transform," IEEE Trans. Signal Process., vol. 50, no. 9, pp. 2347–2354, Sep. 2002.
13. H.-C. Chen, J.-I. Guo, T.-S. Chang, and C.-W. Jen, "A memory-efficient realization of cyclic convolution and its application to discrete cosine transform," IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 3, pp. 445–453, Mar. 2005.
14. D. F. Chiper, M. N. S. Swamy, M. O. Ahmad, and T. Stouraitis, "Systolic algorithms and a memory-based design approach for a unified architecture for the computation of DCT/DST/IDCT/IDST," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 52, no. 6, pp. 1125–1137, Jun. 2005.
15. P. K. Meher, "Systolic designs for DCT using a low-complexity concurrent convolutional formulation," IEEE Trans. Circuits Syst. Video Technol., vol. 16, no. 9, pp. 1041–1050, Sep. 2006.
16. P. K. Meher, "Memory-based hardware for resource-constrained digital signal processing systems," in Proc. 6th Int. Conf. ICICS, Dec. 2007, pp. 1–4.
17. P. K. Meher, "New approach to LUT implementation and accumulation for memory-based multiplication," in Proc. IEEE ISCAS, May 2009, pp. 453–456.
18. P. K. Meher, "New look-up-table optimizations for memory-based multiplication," in Proc. ISIC, Dec. 2009, pp. 663–666.
19. P. K. Meher, "Hardware-efficient systolization of DA-based calculation of finite digital convolution," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 53, no. 8, pp. 707–711, Aug. 2006.
20. DesignWare Building Block IP User Guide, Synopsys, Inc., Mountain View, CA, USA, 2012, 06-SP2.

## AUTHORS PROFILE



**Nishant Yadav** is currently pursuing his M.Tech in VLSI design (2017-19) at Vellore Institute of Technology, Vellore. He has done projects on memory design, low power design, and FIR filters. He completed his B.E in Electronics and Telecommunications at Rashtrashant Tukadoji Maharaj Nagpur University, Nagpur in 2015. He is currently doing his master's thesis on RFIR filters under the guidance of his mentor and co-author Aarthy M.



**Aarthy M** is currently working as an Assistant Professor at VIT University, Vellore. She completed her M.E. (VLSI design) and B.E. (Electronics and Communications) at Anna University, Chennai in 2012 and 2010 respectively. Her research interests include digital IC design, analog IC design, low power design and nano IC design.

