

# Performance Analysis of Low-Power, Low-Cost Signal Detection of MIMO-OFDM using NSD

R.Gnanajeyaraman, P.Muneeshwari

Abstract: This paper aims to maximize throughput while keeping power consumption as low as possible. Many optimization techniques, such as FFT, IFFT and memory optimization, are available for reducing the power of mobile OFDM systems. This paper addresses an approach for reducing the power of a MIMO OFDM system by optimizing the FFT architecture. Memory references in MIMO OFDM transceivers are costly because of their long delay and high power consumption, which matters when fast Fourier transform (FFT) algorithms are implemented on MIMO OFDM. The proposed FFT structure combines memory reference reduction techniques and is evaluated using performance parameters such as BER and SNR. To reduce the hardware complexity of MIMO OFDM synchronization, this paper proposes an efficient autocorrelation scheme based on a time-multiplexing technique and the use of reduced samples while preserving performance. QoS is an important consideration in networking, but it is also a significant challenge; it depends on parameters such as network traffic, data loss, data collision and speed. The VLSI implementation was done using ModelSim and Xilinx. The structural realization and timing analysis of a high-throughput, low-cost, high-performance design that detects the PSS using NSD are derived in this paper.

Keywords: Low power, low cost, primary synchronisation signal (PSS), FFT, LTE, IFFT, inter-symbol interference (ISI)

#### I. INTRODUCTION

A multiple-input multiple-output (MIMO) system uses multiple antennas at the transmitter and receiver ends to improve the link reliability and data rates of a wireless communication system. Orthogonal frequency division multiplexing (OFDM) is efficient in synchronizing the received signal in a fading environment and has long been used in applications that require a high data rate. Fast Fourier transform (FFT)/inverse FFT (IFFT) processors have been proposed for MIMO OFDM based on IEEE 802.11n. Such a processor not only supports FFT/IFFT operation but also provides sufficient throughput rates; its drawbacks are hardware complexity and a throughput lower than that of the conventional approach. A variable-length FFT processor in an ASIC-based MIMO OFDM system provides a data rate of 192 Mbps with a 20 MHz bandwidth for the IEEE 802.11a standard. That work mainly focuses on the silicon complexity of the MIMO OFDM system, and its throughput is very low compared with other systems [6]. A MIMO OFDM baseband transceiver has been implemented as an ASIC and verified with a testbed and a GUI monitor. The authors developed separate testbeds for verifying the MIMO OFDM system but did not concentrate on throughput and BER. In order to meet IEEE 802.11n requirements [8][9], the processor not only supports FFT/IFFT operation in 128 points and 64 points but can also provide different throughput rates for simultaneous data sequences. The difficulty is that the hardware keeps growing, so the cost and complexity of the system also increase. IEEE 802.11a, established as a WLAN standard, provides 54 Mbps throughput using a SISO OFDM transceiver. Because the throughput of SISO OFDM is low, MIMO OFDM is preferred to increase it. IEEE 802.11n provides a data rate of up to 600 Mbps with a transmission bandwidth of 80 MHz. Latency is incurred by pre-processing channel matrices for MIMO detection [11]. Different approaches to the design and implementation of a 4x4 MIMO OFDM transceiver which provides data rate of 1 are discussed elsewhere [12][13][14][15]. There, an MMSE MIMO detector is used to reduce the latency, but the disadvantage is that it requires a larger circuit for high-speed transmission. In any n-bit 2's complement multiplier, the maximum number of addition and subtraction operations is n/2 [16]. However, the above-mentioned results are still insufficient for day-to-day requirements such as transmission speed and power. In this paper, the VLSI implementation of a MIMO OFDM transceiver suitable for low-power applications is studied. The FFT optimization reduces the delay time and power while simultaneously increasing speed.

Manuscript published on 30 April 2013.

\*Correspondence Author(s)

**R.Gnanajeyaraman**, Computer Science and Engineering, SBM College of Engineering and Technology, Dindigul, Tamilnadu, India.

Mrs.P.Muneeshwari, Information Technology, PSNA College of Engg & Tech, Dindigul, Tamilnadu, India.

© The Authors. Published by Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license <a href="http://creativecommons.org/licenses/by-nc-nd/4.0/">http://creativecommons.org/licenses/by-nc-nd/4.0/</a>

# II. PROPOSED ARCHITECTURE

## A. Serial to Parallel Converter

In an OFDM system, each channel can be broken into various sub-carriers. The use of sub-carriers makes optimal use of the frequency spectrum but also requires additional processing by the transmitter and receiver. This additional processing is necessary to convert a serial bit stream into several parallel bit streams to be divided among the individual sub-carriers.

#### **B.** Interleaver

Interleaving is a technique commonly used in communication systems to overcome correlated channel noise such as burst errors or fading. The interleaver rearranges the input data such that consecutive data are spaced apart.
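The rearrangement can be illustrated with a simple block interleaver. This is only a minimal sketch: the paper does not specify the interleaver type, so the row/column scheme, the function names `interleave` and `deinterleave`, and the 3 x 4 block size are assumptions made for illustration.

```python
def interleave(data, rows, cols):
    """Write data row by row into a rows x cols block, read it column by column."""
    if len(data) != rows * cols:
        raise ValueError("data length must equal rows * cols")
    return [data[r * cols + c] for c in range(cols) for r in range(rows)]


def deinterleave(data, rows, cols):
    """Inverse of interleave: put every symbol back at its original position."""
    if len(data) != rows * cols:
        raise ValueError("data length must equal rows * cols")
    out = [None] * (rows * cols)
    i = 0
    for c in range(cols):
        for r in range(rows):
            out[r * cols + c] = data[i]
            i += 1
    return out


# Hypothetical 12-symbol frame arranged as a 3 x 4 block.
bits = list(range(12))
tx = interleave(bits, rows=3, cols=4)
rx = deinterleave(tx, rows=3, cols=4)
```

In this sketch, symbols that were adjacent at the input end up three positions apart after interleaving, so a burst hitting neighbouring transmitted symbols is spread over the original sequence once the receiver restores the order.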

#### C. IFFT

The modulation of data into a complex waveform occurs at the IFFT stage of the transmitter. Here, the modulation scheme can be chosen completely independently of the specific channel being used and can be chosen based on the channel requirements. In fact, it is possible for each individual sub-carrier to use a different modulation scheme. The role of the IFFT is to modulate each sub-channel onto the appropriate carrier.








Fig. 1. MIMO-OFDM based architecture for NSD

## D. Cyclic prefix Insertion:

Because wireless communication systems are susceptible to multipath channel reflections, a cyclic prefix is added to reduce ISI.

#### E. Modulator:

Once the bit stream has been divided among the individual sub-carriers, each sub-carrier is modulated as if it were an individual channel before all channels are combined back together and transmitted as a whole.

#### F. Demodulator

The receiver performs the reverse process to divide the incoming signal into appropriate sub-carriers and then demodulate the signal.

# G. Cyclic Prefix Remover:

The cyclic prefix is removed, and the combined sub-carrier channels are passed on as one signal.

#### H. FFT

The FFT processor transforms the signals from the time domain into the frequency domain.

#### I. Deinterleaver:

The interleaved data is arranged back into the original sequence.

## Parallel to Serial conversion:

Thus, the parallel to serial conversion stage is the process of summing all sub-carriers and combining them into one signal. As a result, all sub-carriers are generated simultaneously.

# III. SYSTEM MODEL AND PROBLEM DEFINITION

#### A. MIMO Detection techniques:

In this section, we first discuss detection techniques for MIMO spatial multiplexing systems, namely the (optimal) ML detector, the (suboptimal) equalization-based and nulling and cancelling detectors, and the FPSD implementation of ML detection. We then study the effects of bad channels on the performance of suboptimal detectors and discuss the SPA.

Figure: MIMO detection schemes



#### B. Maximum Likelihood Detection:

ML detection is optimal in the sense of minimum error probability when all data vectors are equally likely, and it fully exploits the available diversity. For our system model [1] and with the assumptions made in Section 1, the ML detector is given by

$$\hat{d}_{ML} = \arg\min_{d \in D} \|r - Hd\|^2.$$

ML detection corresponds to a non-convex optimization problem because $D$ is not a convex set. Therefore, standard numerical algorithms for convex optimization are not applicable. The straightforward solution of this problem, comparing $\|r - Hd\|^2$ for all $d \in D$, has computational complexity $O(|A|^{M_T})$, and in fact the complexity of ML detection may be excessive already for moderate values of $M_T$ and constellation size $|A|$. The FPSD implementation of ML detection will be discussed.
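The exhaustive search described above can be sketched in a few lines. The 2x2 channel matrix, the BPSK alphabet, the noise values and the function names below are hypothetical and chosen only to keep the example small; the loop over `product(alphabet, repeat=m_t)` is what makes the cost grow as $|A|^{M_T}$.

```python
from itertools import product


def residual_norm_sq(r, H, d):
    """Return ||r - H d||^2 for received vector r, channel H and candidate d."""
    total = 0.0
    for row, r_i in zip(H, r):
        predicted = sum(h * x for h, x in zip(row, d))
        total += abs(r_i - predicted) ** 2
    return total


def ml_detect(r, H, alphabet):
    """Exhaustive ML search over all |A|^M_T candidate data vectors."""
    m_t = len(H[0])
    return min(product(alphabet, repeat=m_t),
               key=lambda d: residual_norm_sq(r, H, d))


# Hypothetical 2x2 channel, BPSK alphabet and small noise (illustration only).
H = [[1.0 + 0.5j, 0.3 - 0.2j],
     [0.2 + 0.1j, 0.9 - 0.4j]]
alphabet = (-1.0, 1.0)
d_sent = (1.0, -1.0)
noise = (0.05 - 0.02j, -0.03 + 0.04j)
r = [sum(h * x for h, x in zip(row, d_sent)) + w for row, w in zip(H, noise)]

d_hat = ml_detect(r, H, alphabet)
```

With four transmit antennas and 16-QAM the same loop would already visit $16^4 = 65536$ candidates per received vector, which is why lower-complexity detectors are of interest.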

# C. Equalization-Based detection

In linear equalization-based detection, an estimate of the transmitted data vector $d$ is formed as $y = Gr$ with an "equalization matrix" $G$. The detected data vector is then obtained as $\hat{d} = Q\{y\}$, where $Q\{\cdot\}$ denotes componentwise quantization according to the symbol alphabet $A$.





For the zero-forcing (ZF) equalizer, $G$ is given by the pseudo-inverse [13] of $H$, i.e., $G = H^{\#} = (H^H H)^{-1} H^H$, so that $Y_{ZF} = H^{\#} r = d + \tilde{w}$, which is the transmitted data vector $d$ corrupted by the transformed noise $\tilde{w} = H^{\#} w$. This means that the interference caused by the channel $H$ is completely removed ("forced to zero"). However, in general the transformed noise $\tilde{w} = H^{\#} w$ is larger than $w$ ("noise enhancement"); this will be analysed. The ZF-equalized received vector $Y_{ZF}$ can be seen as the solution to a relaxed ML problem in which the data set $D$ underlying ML detection is relaxed to the convex set $\mathbb{C}^{M_T}$ [12]:

$$Y_{ZF} = \arg\min_{y \in \mathbb{C}^{M_T}} \|r - Hy\|^2.$$

The noise enhancement effect plaguing the ZF equalizer can be reduced by using the minimum mean-square error (MMSE) equalizer

$$G = (H^H H + \sigma_w^2 I)^{-1} H^H.$$

This can again be seen as the solution to a relaxed ML problem, with the distance $\|r - Hy\|^2$ augmented by a penalty term $\sigma_w^2 \|y\|^2$ that prevents $y$ from growing too large [12]:

$$Y_{MMSE} = \arg\min_{y \in \mathbb{C}^{M_T}} \left\{ \|r - Hy\|^2 + \sigma_w^2 \|y\|^2 \right\}.$$

There also exist more sophisticated detection techniques based on the principle of relaxing the ML problem (e.g., semidefinite relaxation as proposed in [12] for multiuser detection).

While ZF or MMSE equalization alone does not, in general, imply a loss of information (i.e., an optimal detector could still be based on $Y_{ZF}$ or $Y_{MMSE}$), the subsequent componentwise quantization of $Y_{ZF}$ or $Y_{MMSE}$ is suboptimal since it does not take into account the correlation of the components of the transformed noise $\tilde{w}$. In fact, ZF or MMSE detection can only exploit a diversity of order $M_R - M_T + 1$ [4].

On the other hand, the computational complexity is rather low. The task with the highest complexity is the calculation of the equalizer matrix $G$. Thus, if we assume $M_T = M_R$ for simplicity, the complexity behaves as $\Theta(M_T^3)$. Note that MMSE detection differs from ML or ZF detection in that it requires an estimate of the noise variance.
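As an illustration of equalization-based detection, the following sketch forms $G$, computes $y = Gr$ and quantizes componentwise. It is restricted to a 2x2 system so that the matrix inverse can be written out by hand; the channel, noise values, BPSK alphabet and helper names are assumptions, and setting `noise_var=0.0` turns the MMSE expression into the ZF one.

```python
def hermitian(M):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[M[r][c].conjugate() for r in range(len(M))]
            for c in range(len(M[0]))]


def matmul(A, B):
    """Plain matrix product A @ B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def inv2x2(M):
    """Closed-form inverse of a 2x2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def equalizer(H, noise_var=0.0):
    """G = (H^H H + noise_var * I)^-1 H^H; noise_var = 0 gives the ZF equalizer."""
    Hh = hermitian(H)
    gram = matmul(Hh, H)
    for i in range(2):
        gram[i][i] += noise_var
    return matmul(inv2x2(gram), Hh)


def detect(r, H, alphabet, noise_var=0.0):
    """Equalize (y = G r), then quantize each component to the nearest symbol."""
    G = equalizer(H, noise_var)
    y = [sum(g * x for g, x in zip(row, r)) for row in G]
    return [min(alphabet, key=lambda a: abs(v - a)) for v in y]


# Hypothetical 2x2 channel, BPSK alphabet and small noise (illustration only).
H = [[1.0 + 0.5j, 0.3 - 0.2j],
     [0.2 + 0.1j, 0.9 - 0.4j]]
alphabet = (-1.0, 1.0)
d_sent = [1.0, -1.0]
noise = [0.05 - 0.02j, -0.03 + 0.04j]
r = [sum(h * x for h, x in zip(row, d_sent)) + w for row, w in zip(H, noise)]

d_zf = detect(r, H, alphabet)                    # zero forcing
d_mmse = detect(r, H, alphabet, noise_var=0.01)  # MMSE with assumed noise variance
```

The only difference between the two detectors in this sketch is the diagonal loading term, which is why MMSE detection needs an estimate of the noise variance while ZF does not.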

Nulling and Cancelling (NC) is a recursive detection technique using the decision-feedback principle [3]. At each detection step, a single data vector component is detected and the corresponding contribution to the received vector $r$ is subtracted from $r$.

Let the received signal be $r$. The detection order is determined from

$$\rho = \{ \|(G)_1\|^2, \|(G)_2\|^2 \}, \qquad k = \arg\min(\rho),$$

which chooses the best channel. Assume $k = 2$; then the nulling vector is

$$g = \begin{pmatrix} g_{21} \\ g_{22} \end{pmatrix}.$$
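A minimal two-stream sketch of nulling and cancelling follows. It assumes that $(G)_i$ denotes the $i$th row of the ZF equalizer matrix, that the square channel is invertible (so $G = H^{-1}$), and that the remaining stream is detected with a simple matched combiner after cancellation; the channel, noise and function names are hypothetical.

```python
def inv2x2(M):
    """Closed-form inverse of a 2x2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def quantize(value, alphabet):
    """Nearest symbol of the alphabet."""
    return min(alphabet, key=lambda a: abs(value - a))


def nulling_cancelling(r, H, alphabet):
    """Detect two streams: null and detect the better one, cancel it, detect the other."""
    G = inv2x2(H)                                     # ZF nulling matrix (square H)
    rho = [sum(abs(g) ** 2 for g in row) for row in G]
    k = rho.index(min(rho))                           # least noise enhancement first
    other = 1 - k
    d = [None, None]
    d[k] = quantize(sum(g * x for g, x in zip(G[k], r)), alphabet)
    # Cancel the detected stream's contribution from the received vector.
    r1 = [r_i - H[i][k] * d[k] for i, r_i in enumerate(r)]
    # The remaining stream now sees an interference-free single-column channel.
    h = [H[i][other] for i in range(2)]
    energy = sum(abs(v) ** 2 for v in h)
    combined = sum(v.conjugate() * x for v, x in zip(h, r1)) / energy
    d[other] = quantize(combined, alphabet)
    return d


# Hypothetical 2x2 channel, BPSK alphabet and small noise (illustration only).
H = [[1.0 + 0.5j, 0.3 - 0.2j],
     [0.2 + 0.1j, 0.9 - 0.4j]]
alphabet = (-1.0, 1.0)
d_sent = [1.0, -1.0]
noise = [0.05 - 0.02j, -0.03 + 0.04j]
r = [sum(h * x for h, x in zip(row, d_sent)) + w for row, w in zip(H, noise)]

d_nc = nulling_cancelling(r, H, alphabet)
```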

## D. OFDM system model with Carrier Frequency Offset

OFDM is used to improve spectrum efficiency. In OFDM systems, a sequence of $N$ complex data symbols is carried on $N$ orthogonal sub-carriers. During the $k$th OFDM block, the sequence of data symbols is defined as follows:

$$d(k) = [d_0(k), d_1(k), \ldots, d_{N-1}(k)]^T \tag{1}$$

The sequence of data symbols is modulated using an $N$-point inverse discrete Fourier transform (IDFT) process that produces the sequence

$$x(k) = Wd(k) \tag{2}$$

where $W$ is the normalized $N$-by-$N$ IDFT matrix and

$$x(k) = [x_0(k), x_1(k), \ldots, x_{N-1}(k)]^T \tag{3}$$

Consequently, the $n$th sample in the sequence $x(k)$ can be expressed as

$$x_n(k) = \frac{1}{\sqrt{N}} \sum_{i=0}^{N-1} d_i(k)\, e^{j 2\pi i n / N}, \quad n = 0, 1, \ldots, N-1. \tag{4}$$

A guard interval, named the cyclic prefix (CP), is created by copying the last $N_g$ samples of the IDFT output and appending them at the beginning of the OFDM symbol to be transmitted. So the transmitted OFDM block consists of $(N + N_g)$ samples.
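The IDFT modulation and cyclic prefix handling can be sketched as follows. The block size $N = 8$, the prefix length $N_g = 2$, the QPSK symbols and the function names are assumptions for illustration; a direct $O(N^2)$ transform is used instead of an FFT to keep the example short.

```python
import cmath
import math


def idft(d):
    """Normalized N-point IDFT: x_n = (1/sqrt(N)) * sum_i d_i * exp(j*2*pi*i*n/N)."""
    n_sub = len(d)
    scale = 1.0 / math.sqrt(n_sub)
    return [scale * sum(d[i] * cmath.exp(2j * math.pi * i * n / n_sub)
                        for i in range(n_sub))
            for n in range(n_sub)]


def dft(x):
    """Normalized forward DFT, the inverse of idft above."""
    n_sub = len(x)
    scale = 1.0 / math.sqrt(n_sub)
    return [scale * sum(x[n] * cmath.exp(-2j * math.pi * i * n / n_sub)
                        for n in range(n_sub))
            for i in range(n_sub)]


def add_cyclic_prefix(x, n_g):
    """Copy the last n_g samples of x to the front of the block."""
    return x[len(x) - n_g:] + x


def remove_cyclic_prefix(block, n_g):
    """Drop the first n_g samples at the receiver."""
    return block[n_g:]


# Hypothetical block: N = 8 QPSK symbols and a prefix of N_g = 2 samples.
d = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j, 1 + 1j, -1 - 1j, 1 - 1j, -1 + 1j]
x = idft(d)
block = add_cyclic_prefix(x, n_g=2)
recovered = dft(remove_cyclic_prefix(block, n_g=2))
```

Over an ideal channel, removing the prefix and applying the forward transform returns the original data symbols, and the transmitted block has $N + N_g = 10$ samples.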

Table I Delay profiles for E-UTRA Channel Models

| Model                        | Number of channel taps | Delay spread (r.m.s.) | Maximum excess tap delay (span) |
|------------------------------|------------------------|-----------------------|---------------------------------|
| Extended Pedestrian A (EPA)  |                        | 45 ns                 | 410 ns                          |
| Extended Vehicular A (EVA)   |                        | 357 ns                | 2510 ns                         |
| Extended Typical Urban (ETU) |                        | 991 ns                | 5000 ns                         |

At the receiver side, after removing the first $N_g$ CP samples, the received sequence

$$y(k) = [y_0(k), y_1(k), \ldots, y_{N-1}(k)]^T \tag{5}$$

is obtained [9] as

$$y(k) = e^{j \frac{2\pi}{N} \varepsilon k (N+N_g)} A(\varepsilon)\, W H(k)\, d(k) + N(k) \tag{6}$$

where $\varepsilon$ represents the normalised CFO,

$$\varepsilon \in (-0.5, 0.5), \tag{7}$$

and $A(\varepsilon)$ represents the effect of the accumulated phase rotation caused by the CFO on the time-domain samples:

$$A(\varepsilon) = \operatorname{diag}\left(\left[e^{j\frac{2\pi}{N}\varepsilon \cdot 0}, e^{j\frac{2\pi}{N}\varepsilon \cdot 1}, \ldots, e^{j\frac{2\pi}{N}\varepsilon (N-1)}\right]^{T}\right). \tag{8}$$

$H(k)$ denotes the channel frequency response during the $k$th OFDM block,

$$H(k) = \operatorname{diag}\left([H_0(k), H_1(k), \ldots, H_{N-1}(k)]^T\right). \tag{9}$$

$N(k)$ represents zero-mean complex white Gaussian noise with variance $N_0$.

Assuming that the receiver sampling clock is aligned to that of the transmitter, the $n$th element of $y(k)$ can be expressed as

$$y_n(k) = e^{j\frac{2\pi}{N}\varepsilon k (N+N_g)} \frac{1}{\sqrt{N}} \sum_{i=0}^{N-1} d_i(k) H_i(k)\, e^{j\frac{2\pi}{N} n (\varepsilon + i)} + N_n(k). \tag{10}$$
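The effect of the CFO on the received samples can be sketched numerically. The example below builds only the noise-free part of the received block, as a common phase term times a per-sample rotation applied to the IDFT of the faded symbols; the block size, channel values, $\varepsilon = 0.1$ and the function names are hypothetical.

```python
import cmath
import math


def idft(d):
    """Normalized N-point IDFT."""
    n_sub = len(d)
    scale = 1.0 / math.sqrt(n_sub)
    return [scale * sum(d[i] * cmath.exp(2j * math.pi * i * n / n_sub)
                        for i in range(n_sub))
            for n in range(n_sub)]


def received_block(d, h_freq, eps, k, n_g):
    """Noise-free received block: common phase * A(eps) * W * H(k) * d(k)."""
    n_sub = len(d)
    x = idft([h * s for h, s in zip(h_freq, d)])           # W H(k) d(k)
    common = cmath.exp(2j * math.pi * eps * k * (n_sub + n_g) / n_sub)
    return [common * cmath.exp(2j * math.pi * eps * n / n_sub) * x[n]
            for n in range(n_sub)]


# Hypothetical N = 4 block, frequency response and block index (illustration only).
d = [1 + 0j, -1 + 0j, 1 + 0j, 1 + 0j]
h_freq = [1.0 + 0j, 0.8 + 0.2j, 0.5 - 0.5j, 1.1 + 0j]
ideal = received_block(d, h_freq, eps=0.0, k=3, n_g=1)
shifted = received_block(d, h_freq, eps=0.1, k=3, n_g=1)

# Phase rotation of sample n relative to the CFO-free case.
rotation = [cmath.exp(2j * math.pi * 0.1 * (3 * (4 + 1) + n) / 4) for n in range(4)]
```

In this sketch the CFO leaves every sample magnitude unchanged and only rotates its phase, with the rotation growing linearly with the sample index $n$ and accumulating from block to block through $k(N+N_g)$.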

A synchronization channel (SCH) is specified in the LTE system to transmit the PSS and the secondary synchronization signal (SSS) [1]. The sequence $d_u(n)$ used for the PSS is generated from a frequency-domain ZC sequence [17] according to

$$d_u(n) = \begin{cases} e^{-j \pi u n (n+1)/63}, & n = 0, 1, \ldots, 30 \\ e^{-j \pi u (n+1)(n+2)/63}, & n = 31, 32, \ldots, 61 \end{cases} \tag{11}$$



Table II Root indices for the PSS

| N<sub>ID</sub> | Root index u |
|----------------|--------------|
| 0              | 25           |
| 1              | 29           |
| 2              | 34           |

Here the ZC root sequence index $u$ is given by Table II [17]. The three different ZC sequences are orthogonal to each other, and each sequence corresponds to a sector identity in the range of 0 to 2. The ZC sequence is chosen for its good periodic autocorrelation and cross-correlation properties. In particular, these sequences have a low frequency offset sensitivity, which is described in [18]. Thus, it is easy to detect the PSS during the initial synchronization because the ZC sequence has a flat frequency-domain autocorrelation property and low frequency offset sensitivity.
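The PSS sequence of (11) with the root indices of Table II can be generated directly. The sketch below is illustrative only; the dictionary `ROOT_INDEX` and the function name `pss_sequence` are names introduced here, not part of any standard API.

```python
import cmath
import math

# Root index u for each sector identity, as listed in Table II.
ROOT_INDEX = {0: 25, 1: 29, 2: 34}


def pss_sequence(u):
    """Length-62 frequency-domain ZC sequence d_u(n) for root index u."""
    seq = []
    for n in range(62):
        if n <= 30:
            phase = -math.pi * u * n * (n + 1) / 63
        else:
            phase = -math.pi * u * (n + 1) * (n + 2) / 63
        seq.append(cmath.exp(1j * phase))
    return seq


pss = {sector: pss_sequence(u) for sector, u in ROOT_INDEX.items()}
```

Every element has unit magnitude, each sequence is symmetric about its centre ($d_u(n) = d_u(61-n)$), and the sequences for $u = 29$ and $u = 34$ are complex conjugates of each other because $29 + 34 = 63$.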

# IV. PRACTICAL DETECTION METHOD

# A. Method without down-sampling using a 10-bit ADC

The matched filter can be expressed as

$$MF_{qt} = \sum_{k=0}^{63} coeff(k)\, y_{qt}(t-k) \tag{12}$$

where $y_{qt}$ is the received signal sampled by a 10-bit, 122.88 MHz pipelined ADC, and $coeff(k)$ is obtained from (13) and (14):

$$coeff = (W^H d_u)^H \tag{13}$$

where

$$coeff = [coeff(63)\ \ coeff(62)\ \ \ldots\ \ coeff(1)\ \ coeff(0)] \tag{14}$$

Because there is no down-sampling module, every output of the matched filter $MF_{qt}$ must be buffered, which requires a large-area buffer that is very costly.
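The matched filter sum can be sketched as a 64-tap sliding correlation. In this example the reference is a hypothetical unit-magnitude chirp rather than the actual time-domain PSS, the coefficients are its reversed complex conjugate, and the received signal is an idealized noise-free stream with the reference placed at an assumed offset of 10 samples.

```python
import cmath
import math

TAPS = 64

# Hypothetical unit-magnitude chirp standing in for the time-domain reference.
template = [cmath.exp(-1j * math.pi * 29 * n * (n + 1) / 63) for n in range(TAPS)]
# Matched filter coefficients: reversed, conjugated reference.
coeff = [template[TAPS - 1 - k].conjugate() for k in range(TAPS)]


def matched_filter(coeff, y):
    """MF(t) = sum_k coeff(k) * y(t - k), with y taken as zero before t = 0."""
    out = []
    for t in range(len(y)):
        acc = 0j
        for k in range(len(coeff)):
            if t - k >= 0:
                acc += coeff[k] * y[t - k]
        out.append(acc)
    return out


# Idealized received stream: silence, the reference, then silence again.
offset = 10
y = [0j] * offset + template + [0j] * 20
mf = matched_filter(coeff, y)
peak_index = max(range(len(mf)), key=lambda t: abs(mf[t]))
```

The output magnitude peaks when the filter window fully overlaps the reference, which here is at index `offset + TAPS - 1`. Since one output is produced per input sample, all of them have to be stored when no down-sampling is applied.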

## B. CSFD

The process of CSFD can be divided into three stages. The first stage is to measure the faulty weights of all N nodes. Then, the faulty nodes are determined. The final distributed estimate is generated in the last stage.

# C. ECSFD

There are three difficulties to be overcome in implementing CSFD with a VLSI circuit. The first is that it requires extensive and complex computations, such as logarithm and division, in the detection process. The second is that the integration required for the estimate is quite complex. The last is that the calculation of the numerical integration needs many bits. To overcome these difficulties, we modify CSFD and propose an Efficient Collaborative Sensor Fault Detection (ECSFD) scheme. ECSFD is simple and has lower computational complexity, so lower hardware cost and power consumption can be achieved. Furthermore, ECSFD achieves almost the same performance as CSFD. The ECSFD scheme avoids the logarithm and division operations, simplifies the integration and transforms the numerical integration.

For the required operations in ECSFD, the word length of the signals is decided based on the following two considerations:

a) The performance of the ECSFD circuit must be comparable to that of CSFD.

b) The hardware cost of the ECSFD circuit must be minimized.

**Table III- Simulation Assumption** 

| Parameter             | Value         |
|-----------------------|---------------|
| Number of Rx antennas | 4             |
| Number of Tx antennas | 4             |
| Frequency offset      | 12.5 kHz      |
| Carrier frequency     | 2.5 GHz       |
| Symbol detection      | Replica-based |
| Root index u          | 29            |

#### V. SIMULATION RESULTS

We assume that there are four receive antennas and four transmit antennas in the simulated LTE MIMO system. Replica-based symbol detection is very useful for symbol timing detection, since a diversity gain of 3 dB can be obtained when two PSSs are received in different time slots. Higher diversity gain can be achieved when more than two PSSs are used in the detection. At most 16 PSSs are transmitted in the simulation; that is, the detection gives up after 16 PSS correlations have been calculated at the receiver. Note that different delay profiles, Doppler spectra and channel matrices are defined in the E-UTRA channel model. Both the original and the proposed methods are simulated.



The original method using a 1-bit ADC with down-sampling by 8 does not degrade the performance, but some delay occurs. As a result, the method using a 10-bit, 122.88 MHz ADC with down-sampling by 8 is proposed as the low-power, low-cost design for PSS detection with good search performance.

#### VI. HARDWARE IMPLEMENTATION

As discussed in the previous section, the performance of the proposed method for PSS detection is acceptable in a practical LTE system; thus, its implementation detail is described in this section where the matched filter is considered first followed by the architecture of proposed PSS detector.

#### A. Architecture of matched filter

The matched filter is an important component in PSS detection. Here we use a 64-tap time-domain matched filter; hence 64 complex multiplication units per matched filter are used in this calculation:


$$MF = \sum_{k=0}^{63} coeff(k)\, y(t-k)$$







Fig. 2. Matched filter architecture with one complex unit

Fig. 3. Original architecture of the whole PSS detection

Fig. 4. Area-efficient architecture of the whole novel signal detection

Since 84 matched filters are required in the system, a total of 5376 complex multiplication units would be needed, which is not reasonable for a practical communication system because of the high cost of multiplication units in the receiver. In practice, the sampling rate of the input data to the matched filter is 1.92 MHz while the system clock is 122.88 MHz, which implies that a single complex multiplication unit can be reused over 64 clock cycles instead of using the 64 complex multiplication units shown in Fig. 2. As a result, 84 complex multiplication units are enough for the whole system.
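The multiplier count above follows from simple arithmetic, which the sketch below reproduces together with a behavioural model of one time-multiplexed multiply-accumulate unit. The constant and function names are introduced here for illustration, and the model ignores pipelining and memory access details.

```python
import math

SYSTEM_CLOCK_HZ = 122_880_000   # system clock, 122.88 MHz
SAMPLE_RATE_HZ = 1_920_000      # matched filter input rate, 1.92 MHz
TAPS = 64                       # taps per matched filter
MATCHED_FILTERS = 84            # matched filters in the system

# Clock cycles available between two consecutive input samples.
cycles_per_sample = SYSTEM_CLOCK_HZ // SAMPLE_RATE_HZ

# Fully parallel design: one multiplier per tap.
parallel_multipliers = MATCHED_FILTERS * TAPS

# Time-multiplexed design: each multiplier serves cycles_per_sample taps.
shared_multipliers = MATCHED_FILTERS * math.ceil(TAPS / cycles_per_sample)


def mac_time_multiplexed(coeff, window):
    """One multiplier reused over len(coeff) clock cycles: one product per cycle."""
    acc = 0j
    for cycle in range(len(coeff)):
        acc += coeff[cycle] * window[cycle]
    return acc


# The serial accumulation gives the same result as a fully parallel sum.
coeff = [complex(k, -k) for k in range(TAPS)]
window = [complex(1, k % 3) for k in range(TAPS)]
serial = mac_time_multiplexed(coeff, window)
parallel = sum(c * w for c, w in zip(coeff, window))
```

Because exactly 64 clock cycles fit between two input samples, one multiplier can finish all 64 tap products of a filter before the next sample arrives, which is what reduces the count from 5376 to 84.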

# B. Architecture of PSS detection

As a result, only 1200 correlation values need to be stored in a RAM with 1200 addresses, which reduces the RAM size of the whole system by a factor of almost 8.

This architecture is much smaller than the original architecture, which reduces the cost of the chip significantly. From the power perspective, not only does the 10-bit ADC reduce the power consumption, but the digital logic hardware does as well.

Retrieval Number: E0651032413/13©BEIESP

Journal Website: www.ijitee.org

#### VII. CONCLUSION

This paper proposes a VLSI implementation of a high-throughput MIMO OFDM transceiver for a system which achieves 1.4 Gbps throughput. The circuit is implemented in a 28 nm (transistor gate size) library with less circuit area and is evaluated for lower power dissipation. Because the area and power consumption of the original implementation architecture are too large to be acceptable, a more practical implementation architecture is proposed based on simulation results and ASIC synthesis results. In it, the PSS is detected efficiently and accurately at much lower power and cost, which renders it feasible for the implementation of a UE chip.

#### REFERENCES

- 1. Fu Bo & Ampadu Paul, J Signal Process Syst, 56(1) (2009) 59-68.
- 2. Chang Y & Park S C, IEICE Trans Fundamentals, E87-(2004) 3020-3024 (11).
- 3. Kim Hun Seok, Zhu Weijun, Bhatia Jatin, Mohammed Karim, Shah Anish & Daneshrad Babak, EURASIP J Adv Signal Process, 2008.
- 4. LaRoche Isabelle & Roy Sebastien, An Efficient Regular Matrix Inversion Circuit Architecture for MIMO Processing, IEEE Int Symp on Circuits and Systems (ISCAS), May 2006, pp. 4819-4822.
- 5. Lin Y T, Tsai P Y & Chiueh T D, IEE Proc Comput Digit Technol, 152(4) (2005) 499-506.
- 6. Perels D, Haene S, Luethi P, Burg A, Felber N, Fichtner W & Bolcskei H, IEEE Trans VLSI Syst, 5 (2005) 215-218.
- 7. Gresien Pierre, Haene Simon & Burg, EURASIP J Embedded Syst, 2008, Article ID 242584.
- 8. Reisis D & Vlassopoulos N, IEEE Trans Circuits Syst, 55(11) (2008) 3438-3447.
- 9. Radhouane R, Liu P & Modlin C, in Proc IEEE Int Symp Circuits Syst, 1 (May 2000) 116-119.
- 10. Yoshizawa Shingo & Miyanaga Yoshikazu, VLSI Implementation of SISO-OFDM Transceivers, IEEE Int Symp Communications Information Technologies (ISCIT), No. T2D-4, Oct 2006.
- 11. Yoshizawa Shingo, Yamauchi Yasushi & Miyanaga Yoshikazu, A Complete Pipelined MMSE Detection Architecture in a 4x4 MIMO-OFDM Receiver, IEEE Int Symp on Circuits and Systems (ISCAS), May 2008, pp. 1248-1251.
- 12. Yoshizawa Shingo, Yamauchi Yasushi & Miyanaga Yoshikazu, VLSI Architecture of a 4x4 MIMO-OFDM Transceiver with an 80-MHz Channel Bandwidth, IEEE Int Symp on Circuits and Systems (ISCAS), May 2009, pp. 1248-1251.
- 13. Yoshizawa Shingo, Yamauchi Yasushi & Miyanaga Yoshikazu, VLSI Implementation of a 4x4 MIMO-OFDM Transceiver for 1 Gbps Data Transmission, IEEE Int Symp on Circuits and Systems (ISCAS), May 2010, pp. 1743-1746.
- 14. Lamarca Rey F & Vazquez M G, IEEE Trans Signal Process, 53(3) (2009) 1741-1755.
- 15. Shin M & Lee H, A High-Speed Four-Parallel Radix-2<sup>4</sup> FFT/IFFT Processor for UWB Applications, Proc IEEE Int Symp Circuits and Systems, May 2008, pp. 960-963.
- 16. Ma G K & Taylor F J, IEEE ASSP Mag, Jan 1990, pp. 6-20.
- 17. 3<sup>rd</sup> Generation Partnership Project (3GPP), Sophia-Antipolis Cedex, France, 3GPP TS 36.211 v8.9.0, 3<sup>rd</sup> Generation Partnership Project; Technical Specification Group Radio Access (E-UTRA); Physical Channels and Modulation (Release 8), Dec. 2009.
- 18. S. Sesia, I. Toufik & M. Baker, *LTE - The UMTS Long Term Evolution: From Theory to Practice*. New York: Wiley, 2009.

