

# Implementation of 2D Hartley transform using Distributed Arithmetic

Sai Lakshmi Kumari N, U.Pradeep Kumar, K.V.Ramana Rao

Abstract: Discrete cosine transform (DCT) is usually used in JPEG based image transform coding. This paper presents separable 2-D discrete Hartley transform (SDHT) and its Distributed Arithmetic (DA) based hardware architecture as an alternate to DCT in transform based coding of image compression. The proposed DA architecture for 1-D DHT has very less computations as compared to existing 1-D DCT. The proposed DHT architecture implemented in FPGA indicates a significant hardware savings as compared to FPGA resources used in an efficient memory based DA approach. The additional advantage of SDHT is that its inverse transform is same as forward transform with a constant division. This is demonstrated through a Xilinx FPGA XC3s500E device.

Keywords: Distributed Arithmetic, Discrete Hartley Transform Discrete Cosine Transform, JPEG, Offset Binary Coding..

# I. INTRODUCTION

Digital image compression is the most important part in the multimedia applications which aims to reduce the number of bits in an image data for its efficient storage (less storage area). JPEG based still image compression follows three steps i.e. transform, quantization and coding to compress an image [1]. Reverse process decoding, dequantization and comprising transform is used for image de-compression [2,3]. Discrete cosine transform (DCT) is used to transform the image from spatial domain to frequency domain. During quantization less important frequencies are discarded and is termed as lossy image compression. Bracewell has drawn attention to the discrete Hartley transform (DHT) as a substitute for the discrete fourier transform (DFT) [4,5]. Many applications of DHT in signal processing and communications have been presented in the literatures [6-8]. DHT is used in JPEG based image compression with DHT replacing the DCT in [9]. PSNR comparable to DCT based image transform has been obtained. The advantage of using DHT over DCT is that a dequantizer is not required at the decoder side and hence facilitates saving of hardware resources. Hardware implementation of DHT requires a large number of multipliers. Since multiplier requires much more hardware as compared to adder and additionally consumes more power, it is not recommended in a battery running portable device [10,11]. Distributed Arithmetic (DA) is a digital signal

### Revised Manuscript Received on 30 October 2012

\*Correspondence Author(s)

Sailakshmi Kumari\*, Department of ECE, Pydah College of Engineering and Technology, Visakhapatnam, India.

U.Pradeep Kumar, Asst. Professor, Department of ECE, Pydah College of Engineering and Technology, Visakhapatnam, India.

© The Authors. Published by Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license <a href="http://creativecommons.org/licenses/by-nc-nd/4.0/">http://creativecommons.org/licenses/by-nc-nd/4.0/</a>

processing technique that implements multiplication without the use of multiplier. Two approaches have been emerged for DA. One is look-up table based [12] and other is without look-up table [13]. Field programmable gate array (FPGA) is an emerging technology in VLSI design because of its many advantages like reprogram ability, low cost, fast design cycle, in-circuit programmability etc [14-18]. In this paper separable 2-D DHT (SDHT) is proposed in place of DCT for image transform. The image transformed by SDHT is reconstructed after taking inverse SDHT (same as forward with a constant division). No error in the reconstructed image indicates that SDHT can be used for image transform [19-23]. SDHT is implemented in Xilinx FPGA XC3S500E device using memory based DA and proposed DA for DHT. Proposed DA for DHT requires very less hardware for its implementation as compared to memory based DA [24-30]. It also requires less computation as compared to DCT in [13].

#### II. SEPARABLE 2-D DHT

Separable 2-D DHT is given by the equation,

$$Y(u, v) = \sum_{x=0}^{N-1} \sum_{y=0}^{M-1} f(x, y) \cos\left(\frac{2\pi ux}{N}\right) \cos\left(\frac{2\pi vy}{M}\right)$$
 (1)

and the 1-D DHT is given by the equation,

$$Y(k) = \sum_{n=0}^{N-1} X(n) \cos\left(\frac{2\pi}{N}nk\right) k = 0,1,...N-1$$
 (2)

where  $H_{nk} = cas\left(\frac{2\pi}{N}nk\right)$ , is the transform's Kernel

X(n) is input data

The advantage of using separable 2-D DHT equation (1) is that it can be implemented with 1-D DHT given by equation (2) by row-column decomposition method. Non separable 2-D DHT equation requires further calculations after taking row-column decomposition [4].

# III. IMAGE TRANSFORM AND RECONSTRUCTION FROM SDHT

Fig.1 shows the JPEG baseline encoder which uses DCT based transform for image compression. Fig.2 shows the DHT based encoder. Fig.3a shows the image used for DHT transform by 8x8 block using SDHT. The reconstructed image after taking inverse SDHT as shown in Fig.3 (b) with almost no distortion proves that SDHT can be used in place of DCT for image transform. Energy calculation and thresholding is used in DHT based encoder instead of quantization table in DCT based encoder.

w.ijitee.org

Published By: Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP) © Copyright: All rights reserved.

Retrieval Number: E0272091512/12©BEIESP

# Implementation of 2D Hartley transform using Distributed Arithmetic



Fig.1 JPEG baseline encoder model

# DHT based Encoder model Energy calculation and Thresholding Transformed DHT Bx8 DHT Coefficients

Fig.2 DHT based encoder model with energy quantization.



Fig.3 (a) Original image (b) Reconstructed image after inverse SDHT

# LOOK-UP TABLE BASED DA FOR 1-D DHT

$$X_n = -x_{n,i-1} + \sum_{m=1}^{i-1} x_{n,i-1-m} 2^{-m}$$
 (3)

Negative of  $X_n$  is represented by

$$-X_n = -\bar{x}_{n,l-1} + \sum_{m=1}^{l-1} \bar{x}_{n,l-1-m} 2^{-m} + 2^{-(l-1)}$$
 (4)

$$\begin{split} X_n &= \frac{1}{2} \left[ X_n - (-X_n) \right] = \\ &\frac{1}{2} \left[ -(x_{n,i-1} - \overline{x}_{n,i-1}) + \sum_{m=1}^{i-1} (x_{n,i-1-m} - \overline{x}_{n,i-1-m}) 2^{-m} - 2^{-(i-1)} \right] \end{split}$$

Now from Equations (5) and (6),

$$X_n = \frac{1}{2} \left[ \sum_{m=0}^{l-1} d_{n,l-1-m} 2^{-m} - 2^{-(l-1)} \right]$$
 (7)

From Equations (2) and (7),

$$Y(k) = \sum_{n=0}^{N-1} \frac{1}{2} H_{nk} \left[ \sum_{m=0}^{l-1} d_{n,l-1-m} 2^{-m} - 2^{-(l-1)} \right]$$

$$= \sum_{m=0}^{l-1} \left( \sum_{n=0}^{N-1} \frac{1}{2} H_{nk} d_{n,l-1-m} \right) 2^{-m} - \left( \sum_{n=0}^{N-1} \frac{1}{2} H_{nk} \right) 2^{-(l-1)}$$
(8)

From equation (8), Y(k) can be computed for different values of k, let for example k=1 the summation

$$\sum_{n=0}^{N-1} \frac{1}{2} H_{n1} d_{n,i-1-m}$$

Retrieval Number: E0272091512/12©BEIESP Journal Website: <u>www.ijitee.org</u> can be stored in look-up table (ROM) after pre-additions as there can be a maximum of 2n possible combinations as shown in Table 1 for n=3. Inputs are transmitted in serial, 1 bit at a time. One bit of each input forms the address of memory location and they are accumulated after shifting left the previous result. Second accumulation term is a constant value and it is similar to the first row in table and hence there is no need of additional memory for it. Final output is obtained after i+1 clock cycle where i is number of bits in input (1 clock cycle for adding constant value). From table 1 it can be seen that upper half is same as lower half except sign is opposite. Hence look-up table can be halved as only upper half is stored and bits of X1 is X-ORed with all other inputs so that when it is 0 upper half is addressed and accumulated with no sign change but when it is 1 again upper half is addressed and accumulated with sign reversed.

TABLE 1

Pre-addition ROM content, for number of inputs n=3 and k=1

| $X_{1,m}$ | $X_{2,m}$ | $X_{3,m}$ | values to be stored in ROM      |
|-----------|-----------|-----------|---------------------------------|
| 0         | 0         | 0         | $-(H_{01} + H_{11} + H_{21})/2$ |
| 0         | 0         | 1         | $-(H_{01} + H_{11} - H_{21})/2$ |
| 0         | 1         | 0         | $-(H_{01}-H_{11}+H_{21})/2$     |
| 0         | 1         | 1         | $-(H_{01}-H_{11}-H_{21})/2$     |
| 1         | 0         | 0         | $(H_{01} - H_{11} - H_{21})/2$  |
| 1         | 0         | 1         | $(H_{01} - H_{11} + H_{21})/2$  |
| 1         | 1         | 0         | $(H_{01} + H_{11} - H_{21})/2$  |
| 1         | 1         | 1         | $(H_{01} + H_{11} + H_{21})/2$  |

# IV.PROPOSED DA FOR 1-D DHT

Considering the periodicity and symmetry of trigonometric functions equation (1) can be written as,

rigonometric functions equation (1) can be written as, 
$$y(0) = [x(0) + x(4) + x(1) + x(5) + x(2) + x(6) + x(3) + x(7)]A$$

$$y(1) = [x(1) - x(5)]C + [x(2) - x(6)]B + [x(0) - x(4)]A$$

$$y(2) = [x(2) + x(6)](-A) + [x(1) + x(5)]B + [x(3) + x(7)](-B)$$

$$+ [x(0) + x(4)]A$$

$$y(3) = [x(2) - x(6)](-B) + [x(3) - x(7)]C + [x(0) - x(4)]A$$

$$y(4) = [x(1) + x(5)](-A) + [x(2) + x(6)]A + [x(3) + x(7)](-A)$$

$$+ [x(0) + x(4)]A$$

$$y(5) = [x(1) - x(5)](-C) + [x(2) - x(6)]B + [x(0) - x(4)]A$$

$$y(6) = [x(0) + x(4)]A + [x(1) + x(5)](-B) + [x(2) + x(6)](-A) + [x(3) + x(7)]B$$

$$y(7) = [x(2) - x(6)](-B) + [x(3) - x(7)](-C) + [x(0) - x(4)]A$$

where.

$$A = cas(0)$$
,  $B = cas(\frac{\pi}{2})$ , and  $C = cas(\frac{\pi}{4})$ 

Representing in DA form as in [13] for DCT we get Adder/Subtractor matrix for DHT for all data as in Fig.4. Table 2 shows the explanation of Fig.3. Divide and multiply operations are done by shifting. Y

implies shifting n bits.

Negative sign in n implies left shift where as positive sign implies right shift. Positive sign in table 2 implies ALUs perform addition and negative sign implies subtraction operation. As an example, let's take the fourth column for calculating Y(2). Y(2) is the sum of Y(2), Y

Y(2) = R3 \* 2 R5





Compared to DCT adder/subtractor of [13], DHT adder/subtractor requires less no. of ALU + adders (only 4

ALUs+7 adders in proposed DHT but 9 ALUs +6 adders in DCT).

### V. FPGA IMPLEMENTATION RESULTS

Both memory based and proposed architecture for 1-D DHT is implemented in Xilinx XC3S500E FPGA device and table 4 and 5 shows the hardware utilization summary which indicate considerable hardware savings in proposed DA for DHT as compared to memory based DA. 2-D DHT is implemented using row column decomposition method



Fig.4 Proposed adder/subtractor matrix of all data for DHT

TABLE 2
Description of additions and subtractions of Fig.4

|                | Y(0) | Y(1) | Y(2) | Y(3) | Y(4) | Y(5) | Y(6) | Y(7) |
|----------------|------|------|------|------|------|------|------|------|
| ALUI           | +    | -    | +    | -    | +    | -    | +    | -    |
| ALU2           | +    | -    | +    | NO   | +    | -    | +    | NO   |
| ALU3           | +    | -    | +    | -    | +    | -    | +    | -    |
| ALU4           | +    | NO   | +    | -    | +    | NO   | +    | -    |
| Y-1            | 0    | 0    | R3   | R10  | R7   | R2   | R8   | R3   |
| $Y^0$          | R5   | R1   | R5   | R4   | R5   | R9   | R5   | R9   |
| Y <sup>1</sup> | 0    | 0    | 0    | 0    | 0    | R2   | 0    | R6   |
| $Y^2$          | 0    | R2   | 0    | R6   | 0    | 0    | 0    | 0    |
| $Y^3$          | 0    | R2   | 0    | R6   | 0    | 0    | 0    | 0    |
| Y <sup>4</sup> | 0    | 0    | 0    | 0    | 0    | R2   | 0    | R6   |
| Y              | 0    | R2   | 0    | R6   | 0    | 0    | 0    | 0    |
| $Y^6$          | 0    | 0    | 0    | 0    | 0    | R2   | 0    | R6   |
| Y7             | 0    | R2   | 0    | R6   | 0    | R2   | 0    | R6   |

"+" means addition and "-" means subtraction and "NO" means no operation.

TABLE 3 Comparison of adders of DHT and DCT in [13]

| scheme       | Adder matrix | Adder bit-width |  |  |  |
|--------------|--------------|-----------------|--|--|--|
| DCT          | 9 ALU +6     | 850             |  |  |  |
| Proposed DHT | 4 ALU +7     | 452             |  |  |  |

and table 6 shows the hardware utilization summary. 8x8 image data of table 7 is implemented and table 8 shows the matlab simulation result. Fig.5 shows the hardware implementation result using memory based DA method and fig.6 shows the hardware implementation result of proposed DA for DHT method. It is evident that proposed DA method implementation has less error as compared to memory based DA. Floating point representation is required to store pre-computed values in ROM for accuracy and hence floating point circuitry has to be used which will further increase the hardware resource in

Retrieval Number: E0272091512/12©BEIESP Journal Website: <u>www.ijitee.org</u> memory based DA. Also memory based DA is slow as it requires i+1 clock cycles (i is number of bits in input image data of one pixel) for 1-D DHT and subsequently 8\*((i+1)+(i+1+2)) clock cycles (generally input image pixel is represented by 8-bit, i.e i=8, and hence total clock latency will be 160 clock cycle) for 2-D DHT implementation (after 1-D transform, transformed data representation become 2-bit more than the input representation) where as proposed DA for DHT requires 1 clock cycle for 1-D DHT and 2-D DHT is implemented in total of 8 clock cycles with 50MHz clock. Although memory based DA work with faster clock as it has lesser calculations per clock cycle to perform than proposed DA for DHT

| □ rstn                   | 1           |            |
|--------------------------|-------------|------------|
| In sclk                  | 1           |            |
| ୀ <mark>ଲ</mark> x_valid | 1           |            |
| x_data[7:0]              | 01101010    | 01101010   |
| 🗓 xi_ready               | 1           |            |
| stg1_sum1[8              | 011010100   | 011010100  |
| stg1_sum2[8              | 011010100   | 011010100  |
| ▶ ■ stg1_sum3[8          | 011010100   | 011010100  |
| stg1_sum4[8]             | 011010100   | 011010100  |
| Tm fht_valid             | 1           |            |
| ▶ ■ fht_data[10:0        | 01101010000 | 0110101000 |
| ▶ 🚮 x0[7:0]              | 01101010    | 01101010   |
| ▶ <b>5</b> x1[7:0]       | 01101010    | 01101010   |
| ▶ <b>5</b> x2[7:0]       | 01101010    | 01101010   |
| ▶ <b>■</b> x3[7:0]       | 01101010    | 01101010   |
| ▶ <b>5</b> x4[7:0]       | 01101010    | 01101010   |
| ▶ <b>5</b> x5[7:0]       | 01101010    | 01101010   |
| ▶ <b>5</b> x6[7:0]       | 01101010    | 01101010   |
| ▶ <b>■</b> x7[7:0]       | 01101010    | 01101010   |
| ₩ x_valid_1d             | 1           |            |
| ▶ ■ cnt[2:0]             | 111         | 111        |

# **OUTPUT WAVEFORM**

# REFERENCES

- G. O. Young, "Synthetic structure of industrial plastics (Book style with paper title and editor)," in Plastics, 2nd ed. vol. 3, J. Peters, Ed. New York: McGraw-Hill, 1964, pp. 15–64.
- W.-K. Chen, Linear Networks and Systems (Book style). Belmont, CA: Wadsworth, 1993, pp. 123–135.
- H. Poor, An Introduction to Signal Detection and Estimation. New York: Springer-Verlag, 1985, ch. 4.
- B. Smith, "An approach to graphs of linear forms (Unpublished work style)," unpublished.
- E. H. Miller, "A note on reflector arrays (Periodical style—Accepted for publication)," IEEE Trans. Antennas Propagat., to be published.
- 6. J. Wang, "Fundamentals of erbium-doped fiber amplifiers arrays (Periodical style—Submitted for publication)," IEEE J. Quantum Electron., submitted for publication.
- C. J. Kaufman, Rocky Mountain Research Lab., Boulder, CO, private communication, May 1995.
- Y. Yorozu, M. Hirano, K. Oka, and Y. Tagawa, "Electron spectroscopy studies on magneto-optical media and plastic substrate interfaces (Translation Journals style)," IEEE Transl. J. Magn.Jpn., vol. 2, Aug. 1987, pp. 740–741 [Dig. 9th Annu. Conf. Magnetics Japan, 1982, p. 301].
- M. Young, The Techincal Writers Handbook. Mill Valley, CA: University Science, 1989.
- (Basic Book/Monograph Online Sources) J. K. Author. (year, month, day). Title (edition) [Type of medium]. Volume(issue). Available: http://www.(URL)
- J. Jones. (1991, May 10). Networks (2nd ed.) [Online]. Available: http://www.atm.com
- (Journal Online Sources style) K. Author. (year, month). Title. Journal [Type of medium]. Volume(issue), paging if given. Available: http://www.(URL)
- R. J. Vidmar. (1992, August). On the use of atmospheric plasmas as electromagnetic reflectors. IEEE Trans. Plasma Sci. [Online]. 21(3). pp. 876—880. Available:
- 14. http://www.halcyon.com/pub/journals/21ps03-vidmar

