Low Power Oriented Full Search Block Based Motion Estimation (LP-FSBME) Architecture Using Power Efficient Adder Compressor For H.265 Coding Techniques.

B.Hemamalini, R.Ramadhurai, S.Mahaboob Basha

Abstract: H.265, also known as High Efficiency Video Coding (HEVC), is a highly successful video compression standard that extends H.264/MPEG-4 Advanced Video Coding (AVC). For the same bit rate, H.265 delivers better video quality than AVC. In video coding, motion estimation (ME) determines motion vectors from adjacent frames. Researchers have introduced various algorithms for low power oriented ME; among them, the low power oriented full search block based motion estimation (LP-FSBME) algorithm gives accurate results. The sum of absolute difference (SAD) architecture uses an adder tree to accumulate the outputs of the processing elements. Using power efficient 16:2 adder compressors in the SAD architecture reduces power dissipation compared with conventional adders. The proposed method is implemented on a Xilinx Virtex 7 FPGA (XC7VX1140T device, speed grade 1) using the Xilinx software version 14.5 tool, developed in Verilog Hardware Description Language (Verilog-HDL), and simulated in the ISE simulator for the tennis, BQ terrace and Kimono videos at a resolution of 1080x720 pixels and 30 fps.

Keywords: ADDER COMPRESSOR, Xilinx Virtex 7 FPGA, VERILOG.

I. INTRODUCTION

The International Telecommunication Union (ITU-T) Video Coding Experts Group and the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) Moving Picture Experts Group (MPEG) developed High Efficiency Video Coding (HEVC), also known as H.265, in 2013. HEVC is a video compression technique that compresses video to about half the size produced by advanced video coding (AVC) while also giving higher quality than AVC. Its higher compression efficiency comes from improved partitioning structures, well-defined prediction modes and enhanced entropy coding algorithms. The major limitation of these algorithms [1-4] is that they require a large number of computations, which increases power dissipation and energy consumption. This is a major drawback for smartphones, tablets and other smart electronic gadgets, all of which are battery powered. Hardware designs of HEVC video codecs should therefore dissipate less power and consume less energy to make better use of battery resources.

Motion estimation exploits the temporal redundancy that is abundant in video sequences because of the high frame rates used in video encoding. Integer Motion Estimation (IME) searches previously encoded frames held in memory in order to encode the current frame. Fractional Motion Estimation (FME) searches interpolated positions using the current and reference pixels. A motion vector (MV) is generated once the best match has been determined. The sum of absolute difference (SAD) is used in IME to determine the best match, and the SAD computation is time consuming in an HEVC encoder. To overcome this issue, a power efficient adder compressor is proposed for the low power oriented full search block based motion estimation (LP-FSBME) architecture. The SAD architecture performs subtraction, an absolute-value operation, adder tree summation and accumulation to obtain the final SAD value. An adder compressor can be used instead of the adder tree to reduce delay and power relative to conventional adder tree techniques [5].
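For reference, the SAD criterion described above is commonly written as follows. The notation is assumed here for illustration and is not taken from the paper: $C$ is the current frame, $R$ the reference frame, $N$ the block size, $(x, y)$ the top-left corner of the current block and $(d_x, d_y)$ a candidate displacement.

$$\mathrm{SAD}(d_x, d_y) = \sum_{j=0}^{N-1} \sum_{i=0}^{N-1} \left| C(x+i,\, y+j) - R(x+i+d_x,\, y+j+d_y) \right|$$

$$\mathrm{MV} = \arg\min_{(d_x,\, d_y)} \mathrm{SAD}(d_x, d_y)$$

The inner term corresponds to the subtraction and absolute-value steps, and the double sum corresponds to the adder tree and accumulation steps.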

Section 2 surveys the literature on various adder techniques and their advantages and disadvantages. Section 3 defines the basic low power oriented full search block based motion estimation (LP-FSBME) algorithm. Section 4 proposes the power efficient adder compressor technique for the LP-FSBME algorithm. Section 5 discusses the results and compares the proposed method with conventional methods. Finally, section 6 concludes the paper.

II. LITERATURE SURVEY

In digital systems, addition is an essential arithmetic operation, and many adder circuit designs have been introduced [6]. The most successful adder for N-bit numbers is the ripple carry adder (RCA). An RCA cascades full adders to add two N-bit numbers, with the carry-out bit of each full adder feeding the next full adder as its carry input. The RCA is slow because of this cascading. To overcome this delay issue, S. Knowles [7] proposed the Carry Look-ahead Adder (CLA). The CLA computes the carry signals in advance from the generate and propagate signals of every bit, which increases speed. The carry bit can therefore be computed before the sum, which reduces the carry waiting time.
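The difference between the two schemes can be sketched in software. The following is a minimal bit-level model, not the hardware design from any cited work; the function names and the 8-bit default width are assumptions made for illustration.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum_bit, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(x, y, width=8):
    """Add two unsigned integers one bit at a time.

    Each stage must wait for the carry of the previous stage,
    which is the source of the RCA delay.
    """
    carry = 0
    result = 0
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry


def carry_lookahead_add(x, y, width=8):
    """Derive all carries from generate/propagate signals, then form the sum."""
    g = [((x >> i) & 1) & ((y >> i) & 1) for i in range(width)]  # generate
    p = [((x >> i) & 1) ^ ((y >> i) & 1) for i in range(width)]  # propagate
    carries = [0]
    for i in range(width):
        # c[i+1] = g[i] | (p[i] & c[i]); hardware flattens this recurrence
        # so that every carry depends only on g, p and the input carry.
        carries.append(g[i] | (p[i] & carries[i]))
    result = 0
    for i in range(width):
        result |= (p[i] ^ carries[i]) << i
    return result, carries[width]
```

Both functions return the same `(sum, carry_out)` pair; the model only shows where the carries come from, since a sequential Python loop cannot reproduce the timing advantage of the flattened logic.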

P. M. Kogge and H. S. Stone [8] proposed a parallel prefix adder known as the Kogge-Stone adder (KSA). The KSA is faster than the RCA and CLA because of its lower logic depth and lower fan-out. However, its major drawbacks are wiring complexity and a larger area than the CLA. To overcome the wiring complexity of the KSA, R. E. Ladner and M. J. Fischer [9] proposed the Ladner-Fischer parallel prefix adder (LF-PPA). Although the LF-PPA retains the minimum number of logic levels, this comes at the cost of a worse fan-out problem.
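The parallel prefix idea can be illustrated with a small software model of the Kogge-Stone carry network. This is a sketch under stated assumptions (unsigned operands, zero input carry, a hypothetical function name and an 8-bit default width), not a description of any implementation in the cited works.

```python
def kogge_stone_add(x, y, width=8):
    """Bit-level model of a Kogge-Stone parallel prefix adder (carry-in = 0)."""
    a = [(x >> i) & 1 for i in range(width)]
    b = [(y >> i) & 1 for i in range(width)]
    g = [ai & bi for ai, bi in zip(a, b)]  # bitwise generate
    p = [ai ^ bi for ai, bi in zip(a, b)]  # bitwise propagate
    p_bit = p[:]                           # kept for the final sum stage

    dist = 1
    while dist < width:
        # One prefix level: every position combines with the position
        # `dist` places to its right, so log2(width) levels are enough.
        new_g, new_p = g[:], p[:]
        for i in range(dist, width):
            new_g[i] = g[i] | (p[i] & g[i - dist])
            new_p[i] = p[i] & p[i - dist]
        g, p = new_g, new_p
        dist *= 2

    # After the last level, g[i] is the carry out of bit i.
    carries = [0] + g[:-1]
    result = 0
    for i in range(width):
        result |= (p_bit[i] ^ carries[i]) << i
    return result, g[width - 1]
```

For an 8-bit operand the loop runs three prefix levels, and every level touches almost every bit position, which reflects the wiring cost mentioned above.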

To solve the issues in the LF-PPA and KSA, R. P. Brent and H. T. Kung [10] proposed the Brent-Kung parallel prefix adder (BK-PPA). However, its speed is limited by its large number of logic levels. It therefore occupies less area than the KSA but has more delay.

Because parallel prefix adders involve high computational complexity and more logic, the adder compressor instead performs parallel carry computation with compression of bits to improve speed and reduce energy. The major advantage of this method is that delay, power and energy depend less on the number of bits.

III. LOW POWER ORIENTED FULL SEARCH BLOCK BASED MOTION ESTIMATION (LP-FSBME)

Figure 1 illustrates the best match technique and the minimum SAD of LP-FSBME over the search range from (-16, -16) to (15, 15). The memory for the current pixel frame and the reference pixel frame consists of 4-bank memory.

![Fig. 1. Low Power Oriented Full Search Block Motion Estimation Algorithm](image)
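The full search over the range (-16, -16) to (15, 15) can be sketched as an exhaustive loop that keeps the candidate with the minimum SAD. This is a software illustration only; the function names, the 4x4 default block size, the list-of-rows frame layout and the skipping of candidates outside the frame are assumptions, not details taken from the paper.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between an n x n block of `cur` at
    (bx, by) and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for j in range(n):
        for i in range(n):
            total += abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
    return total


def full_search(cur, ref, bx, by, n=4, lo=-16, hi=15):
    """Test every displacement in [lo, hi] x [lo, hi] and return
    (best_mv, min_sad). Frames are lists of rows of pixel values."""
    height, width = len(ref), len(ref[0])
    best_mv, best_sad = None, None
    for dy in range(lo, hi + 1):
        for dx in range(lo, hi + 1):
            # Assumption: candidates that fall outside the frame are skipped.
            inside_x = 0 <= bx + dx and bx + dx + n <= width
            inside_y = 0 <= by + dy and by + dy + n <= height
            if not (inside_x and inside_y):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best_sad is None or cost < best_sad:
                best_mv, best_sad = (dx, dy), cost
    return best_mv, best_sad
```

The search evaluates up to 32 x 32 = 1024 candidates per block, which is why the SAD unit dominates the computation and is the natural target for a faster, lower power adder structure.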

The block diagram of LP-FSBME, shown in figure 2, consists of $\times 4 \times 4$ SADs, $\Rightarrow \Leftarrow \Rightarrow$ shift registers, a minimum SAD detector, and the best motion vector (MV).

![Fig. 2. Block Diagram of LP-FSBME](image)

Figure 3 defines the SAD architecture, which consists of $\times \times 4$ processing elements (PEs), and adder. Minimum SAD detector detects minimum SAD and MV.

![Fig. 3. Architecture of SAD](image)
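The PE and adder structure of the SAD unit can be modelled as follows. This is a minimal sketch that assumes a 4x4 block with 16 PEs and a balanced binary adder tree; the exact PE count is not recoverable from the text, and all names are hypothetical.

```python
def processing_element(cur_pixel, ref_pixel):
    """One PE: subtraction followed by the absolute-value operation."""
    return abs(cur_pixel - ref_pixel)


def adder_tree(values):
    """Sum a non-empty list pairwise, level by level.

    Returns (total, depth), where depth is the number of adder levels.
    """
    level = list(values)
    depth = 0
    while len(level) > 1:
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # an odd element passes through unchanged
        level = nxt
        depth += 1
    return level[0], depth


def sad_4x4(cur_block, ref_block):
    """SAD of two 4x4 blocks: 16 PEs feeding an adder tree."""
    diffs = [
        processing_element(c, r)
        for cur_row, ref_row in zip(cur_block, ref_block)
        for c, r in zip(cur_row, ref_row)
    ]
    return adder_tree(diffs)
```

With 16 PE outputs the tree needs four levels of two-input adders, each of which performs a full carry propagation; this is the part that an adder compressor aims to shorten.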

IV. PROPOSED METHOD

The block diagram of the Low power efficient adder compressor of SAD architecture is shown in figure 4.

![Fig. 4. Low power efficient adder compressor of SAD architecture](image)
3:2 adder. The architecture of the 16:2 adder is shown in figure 5.

The 7:2 adder has seven primary input bits and two primary output bits, along with two secondary input bits and two secondary output bits. The secondary inputs and outputs are the input and output carries of the 7:2 adder, as shown in figure 6.

The 5:2 adder has five input bits and two output bits, along with secondary internal input and output carries, as shown in figure 7.

The 3:2 adder has three input bits and two output bits, as shown in figure 9. The major advantage of the proposed adder compressor is that it has a shorter critical path than conventional adders.
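The compression principle can be illustrated with a word-level carry-save model. Note that this sketch reduces 16 operands to two using layers of 3:2 compressors only; it is not the 7:2, 5:2 and 3:2 decomposition of the proposed 16:2 adder, whose internal wiring is given only in the figures. The function names are hypothetical and the operands are assumed to be non-negative integers.

```python
def compress_3_2(a, b, c):
    """3:2 compressor applied to whole words (carry-save adder).

    Returns (s, carry) such that a + b + c == s + carry, with no carry
    propagating along the word.
    """
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry


def compress_to_two(operands):
    """Reduce any number of non-negative operands to two words whose
    sum equals the sum of all operands."""
    ops = list(operands)
    while len(ops) < 2:
        ops.append(0)  # pad so that two words are always returned
    while len(ops) > 2:
        nxt = []
        for i in range(0, len(ops) - 2, 3):
            nxt.extend(compress_3_2(ops[i], ops[i + 1], ops[i + 2]))
        # Operands that do not fill a group of three pass to the next layer.
        nxt.extend(ops[len(ops) - len(ops) % 3:])
        ops = nxt
    return ops[0], ops[1]
```

Only the final addition of the two returned words needs a carry-propagating adder, which is why a compressor-based SAD accumulator can have a shorter critical path than a tree of conventional adders.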

V. RESULTS AND DISCUSSIONS

This section analyzes the throughput of the SAD architectures for fast ME execution. H.265 encoder software is used to characterize the synthesis frequency of the SAD unit, considering three video sequences with a resolution of 1080x720 pixels at 30 fps. The proposed method is implemented on a Xilinx Virtex 7 FPGA (XC7VX1140T device, speed grade 1) using the Xilinx software version 14.5 tool, developed in Verilog Hardware Description Language (Verilog-HDL), and simulated in the ISE simulator for the tennis, BQ terrace and Kimono videos. The proposed power efficient adder compressor is implemented in LP-FSBME and compared with conventional algorithms: the full search block matching algorithm (FSBME) [2], MMEA [15][16], RCSEA [6] and LP-FSBME [17]. In synthesis, the proposed method is more area efficient and has less delay than the conventional techniques. The power of the proposed method is computed with the Xilinx XPower Analyzer tool and compared with the conventional methods, as shown in Table-I. Table-I compares the proposed method with the conventional methods in terms of gate count, lookup tables (LUTs), registers, maximum frequency (MHz) and power consumption. Figure 10 compares the gate count of the conventional methods with that of the proposed method.

Figure 11 compares the LUT count, Figure 12 the register count, Figure 13 the maximum frequency and Figure 14 the power consumption of the conventional methods with those of the proposed method.

Table- I: Hardware Cost Comparison

<table>
<thead>
<tr>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<td>Video Spec</td>
<td>1080P@30fps</td>
<td>1080P@30fps</td>
<td>1080P@30fps</td>
<td>1080P@30fps</td>
<td>1080P@30fps</td>
</tr>
<tr>
<td>Gate Count</td>
<td>950K</td>
<td>930K</td>
<td>104K</td>
<td>754K</td>
<td>717K</td>
</tr>
<tr>
<td>Look Up Tables</td>
<td>6700</td>
<td>6150</td>
<td>5200</td>
<td>4410</td>
<td>3901</td>
</tr>
<tr>
<td>Registers</td>
<td>4941</td>
<td>4447</td>
<td>3894</td>
<td>3312</td>
<td>3221</td>
</tr>
<tr>
<td>Max Frequency (MHz)</td>
<td>250</td>
<td>220</td>
<td>142</td>
<td>194</td>
<td>278</td>
</tr>
<tr>
<td>Power consumption (mW)</td>
<td>48.67</td>
<td>123.44</td>
<td>280</td>
<td>40.14</td>
<td>35.78</td>
</tr>
</tbody>
</table>

VI. CONCLUSION

The power efficient adder compression technique is used in low power full search block motion estimation (LP-FSBME). Its major advantage is a reduced critical path when the carry bit is added in the subsequent adder. This makes the design more area efficient, faster and more power efficient by reducing the adder bits. The proposed method is implemented on a Virtex 7 FPGA XC7VX1140T device with speed grade 1 in the Xilinx software version 14.5 hardware design tool, designed in Verilog Hardware Description Language (Verilog-HDL) and simulated using the ISE simulator for the tennis, BQ terrace and Kimono videos. The Xilinx power analyzer tool is used to compute and compare the power of the proposed and conventional methods. As a future enhancement, new residue number system (RNS) based adders will be developed for power efficient motion estimation applications.

REFERENCES


AUTHORS PROFILE

B. Hemamalini received a B.E. in ECE from Velammal Engineering College (affiliated to Anna University), Chennai, India in 2017 and is at present pursuing an M.E. in AE at Gojan School of Business and Technology.

R. Ramadhurai received a B.E. in ECE from A.V.C College of Engineering (affiliated to Anna University), Mayiladuthurai, India in 2005 and an M.Tech in VLSI Design from Sathyabama University, Chennai, India in 2012, and has worked as Assistant Professor at Gojan School of Business and Technology, Chennai, India since 2006.

S. Mahaboob Basha received a B.E. in ECE from Anjuman Engineering College, Karnatak University, India in 1998 and an M.E. in Applied Electronics from GCT, Coimbatore in 2003, and has been pursuing a Ph.D. at Anna University in the area of VLSI Design since 2010.