

# An Enhanced Low Power Dual Data Injection Technique for Coarse-Grained Reconfigurable Architecture

# S. Munaf, A. Bharathi, A. N. Jayanthi

Abstract: Coarse-grained reconfigurable architectures (CGRA) have a well-organized, efficient, configurable array of processing units and a high-speed cache unit. The processing units perform the required arithmetic and logic operations. Nowadays, power consumption plays an important role in video processing applications. We propose a Double Data Rate synchronous memory architecture that addresses and reduces the power consumption caused by reconfiguration. Input data bits are injected onto the data bus on both the low-to-high and the high-to-low transitions of the clock. All modules have been designed and implemented on a Virtex device at the behavioral level with VHDL coding and simulated in Xilinx ISE Navigator.

Keywords: Low power VLSI architecture, CGRA, DDR SRAM Controller.

## I. INTRODUCTION

In earlier days, high-quality multimedia computation and data transfer were achieved by employing powerful mapping algorithms. This processing is highly integrated and intensive in both computation and data transfer. General-purpose processors can serve various applications, but they may not provide sufficient performance to cope with the complexity of the applications.

ASICs can optimize the implementation in terms of power and performance, but they restrict the computational potential of the application.

This barrier can be eliminated by a coarse-grained reconfigurable architecture (CGRA). Compared with an ASIC, this architecture is constructed with powerful, dedicated reconfigurable processing elements (PEs) and a memory unit, and this reconfigurable structure enhances its performance.

The performance of this architecture is better, but its utilization has been limited by its power consumption, which means it is not suitable for all applications.

Revised Manuscript Received on December 30, 2019.

\* Correspondence Author

**Mr. S.Munaf\***, Assistant Professor(Sr.Gr), Department of ECE, Sri Ramakrishna Institute of Technology, Coimbatore, India , munafece@gmail.com

**Dr. A.Bharathi**, Professor, Department of Information Technology, Bannari Amman Institute of Technology, Sathyamangalam, India , bharathia@bitsathy.ac.in

**Dr. A.N.Jayanthi**, Associate Professor, Department of ECE, Sri Ramakrishna Institute of Technology, Coimbatore, India jayanthi\_an@rediffmail.com

© The Authors. Published by Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Power consumption therefore degrades the overall utilization efficiency. During dynamic reconfiguration, power is consumed mainly by the data manipulation operations performed in every clock cycle.

The CGRA computational architecture has a dynamically configurable memory unit (cache). The reconfiguration technique enhances performance, but it increases the level of power consumption. Therefore, minimizing the power consumption of the cache memory is the major concern in a CGRA.

In this research, an optimized low power technique is employed on the configuration cache. Power consumption issues are addressed by a power-conscious architectural technique called the Dual Data Injection Technique (DDIT). The main aim of this technique is to reduce the data transfer time, increase the operating speed, and reduce the power consumption. Higher performance can be achieved by increasing the data transfer rate or decreasing data queuing effects, and this is demonstrated using real-time application benchmarks.



Fig 1. Basic Reconfigurable Architecture

## II. RELATED WORK

In recent years, more researchers have concentrated on reshaping the computing process, and they have introduced new reconfigurable architectures [1].

Reconfigurable architectures employ two types of array models: network-based or linear reconfigurable arrays. Network-based reconfigurable arrays support parallelism, and linear reconfigurable arrays support static or dynamic reconfiguration [2]. MorphoSys [9] and reconfigurable multimedia coprocessors are data-parallel architectures. The MorphoSys model consists of a reduced instruction set processor, a reconfigurable 8x8 array of ALUs, frame memory, long-term memory, and a DMA controller.

Multimedia coprocessors consist of a universal control unit and nano processors.

Journal Website: www.ijitee.org


These processors access multiple data and multiple instructions in a random fashion.

RaPiD and PipeRench are datapath streaming pipeline architectures that employ a linear array format [9]. The RaPiD architecture provides special run-time reconfiguration. These are inherently one-dimensional architectures, and reconfiguration during execution creates an irregular distribution; this increases the cache miss rate, which impacts their area and performance.

In our concept, we implement DDR2 SDRAM as the reconfigurable cache unit for the PE array.

## III. PROPOSED ARCHITECTURE



Fig 2. Block diagram

The architecture in Fig 2 includes the RISC processor, which has a data bus and an address bus. The fetch, decode, and execute functions are handled by the DDR SDRAM controllers, which reduces the overall complexity. Processing elements (PEs) execute arithmetic and logical operations. The input data are processed according to the RD/WR, chip select, data bus, and address bus signals. Our memory cache unit supports dynamic configuration. Data bits are injected onto the data bus on both the low-to-high and the high-to-low clock transitions, so the utilization of the computation elements and the data transfer rate are enhanced.
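The role of the control lines named above can be illustrated with a small behavioral model. This is only a sketch under stated assumptions: the class name `SimpleMemory`, its `access` method, and the 16-bit word width are hypothetical choices made for illustration, since the paper does not specify the controller interface.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleMemory:
    """Toy memory block driven by chip-select and RD/WR control lines.

    Hypothetical interface for illustration; not the paper's controller.
    """
    width_bits: int = 16
    cells: dict = field(default_factory=dict)

    def access(self, chip_select: bool, write: bool, address: int, data: int = 0):
        """Perform one bus access; return read data, or None when idle or writing."""
        if not chip_select:
            return None  # device not selected: the access is ignored
        mask = (1 << self.width_bits) - 1
        if write:
            self.cells[address] = data & mask  # WR: latch the data bus into the cell
            return None
        return self.cells.get(address, 0)  # RD: return stored word (0 if never written)


mem = SimpleMemory()
mem.access(chip_select=True, write=True, address=0x10, data=0xBEEF)
value = mem.access(chip_select=True, write=False, address=0x10)
ignored = mem.access(chip_select=False, write=False, address=0x10)
```

Here `value` holds the word written earlier, while `ignored` is `None` because chip select was deasserted during that access.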

## IV. RISC PROCESSOR



Fig 3. RISC Processor Schematic

The proposed CGRA has a high-performance 16-bit reduced instruction set processor, shown in Fig 3. The RISC processor has a data bus and an address bus, and the fetch, decode, and execute functions are performed here. Based on the code of the context word in the IR register, the corresponding execution flow is processed by the Execution Unit and Memory Unit (ROM). The data computations are done in the A (accumulator) register and the B register with the help of the ALU; final results are stored in the 16-bit A register and the processor output register. This processor performs high-speed ALU operations due to the DDR2 RAM.



Fig 4. RISC Processor Module Result

Fig 4 shows the simulated results for the RISC processor modules. The module blocks are designed with VHDL coding and simulated in ISE/ModelSim. The simulated result shows the processor's arithmetic and logical computation output for two given 16-bit data values stored in registers A and B; the same result appears in the output register and is taken as the final output of the processor module.
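The register-level behavior described for the processor module (two 16-bit operands in A and B, result written back to A and mirrored in the output register) can be sketched as follows. The opcode names and the function names `alu_execute` and `run_instruction` are assumptions for illustration; the paper does not list its instruction set.

```python
MASK16 = 0xFFFF  # results are truncated to the 16-bit register width

# Hypothetical opcode names; the paper's actual instruction set is not given.
ALU_OPS = {
    "ADD": lambda a, b: a + b,
    "SUB": lambda a, b: a - b,
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
}


def alu_execute(op: str, reg_a: int, reg_b: int) -> int:
    """Apply one ALU operation to registers A and B, wrapping to 16 bits."""
    return ALU_OPS[op](reg_a & MASK16, reg_b & MASK16) & MASK16


def run_instruction(op: str, reg_a: int, reg_b: int):
    """Return (new A register, output register) after one instruction."""
    result = alu_execute(op, reg_a, reg_b)
    return result, result  # the result is stored in A and copied to the output


reg_a, output = run_instruction("ADD", 0x1234, 0x0011)
```

Masking with `MASK16` models overflow wrap-around in a fixed-width register, so `0xFFFF + 1` yields `0x0000` in this sketch.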

## V. MOTIVATION

We propose a DDR2 SDRAM memory architecture to reduce the power overhead caused by reconfiguration. The power reduction is achieved by injecting data on both edges of the CLK signal, i.e., the rising edge and the falling edge.
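The effect of injecting data on both clock edges can be seen by counting clock cycles for a burst. The sketch below is a simplified timing model only (it does not model power), and the names `transfer_cycles` and `ddr_schedule` as well as the sample burst are hypothetical.

```python
def transfer_cycles(num_words: int, words_per_cycle: int) -> int:
    """Clock cycles needed to move num_words over the bus (rounded up)."""
    return -(-num_words // words_per_cycle)


def ddr_schedule(words):
    """Assign each word to a clock edge: even index -> rising, odd -> falling."""
    schedule = []
    for index, word in enumerate(words):
        edge = "rising" if index % 2 == 0 else "falling"
        schedule.append((index // 2, edge, word))
    return schedule


burst = [0xA1, 0xB2, 0xC3, 0xD4]
sdr_cycles = transfer_cycles(len(burst), words_per_cycle=1)  # one word per cycle
ddr_cycles = transfer_cycles(len(burst), words_per_cycle=2)  # one word per edge
schedule = ddr_schedule(burst)
```

For this four-word burst, single-edge transfer needs four cycles while dual-edge injection needs two, which is the halving of transfer time that the technique relies on.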

## VI. POWER-CONSCIOUS TECHNIQUES BASED

We reconfigure the entire cache section using DDR2 SDRAM. The newly configured memory transfers data at twice the external data bus clock rate. Hence, the input data transfer latency is heavily reduced and input data supply issues are avoided. The new data supply method therefore provides better computation performance, and the power consumed by data queuing is effectively controlled.

The ALU and shift operations are performed by the PE array. All of these processes are simulated in Xilinx ISE Navigator. Addressing memory in a DDR2 SDRAM requires four separate addresses: chip select, bank select, row address, and column address. The DDR2 SDRAM is designed at RTL with VHDL coding, and its simulated outputs are shown in Fig 5.
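The four-part addressing described above can be illustrated by splitting a flat address into its fields. This is a sketch under stated assumptions: the field widths (11 column bits, 14 row bits, 2 bank bits, 1 chip-select bit) and the field ordering are assumptions chosen so that bank, row, and column together span 128 Meg locations, matching the 128 Meg x 4 organization named in Fig 5; a real design should take these values from the device datasheet.

```python
from typing import NamedTuple

# Assumed field widths (not given in the paper).
COL_BITS = 11
ROW_BITS = 14
BANK_BITS = 2
CS_BITS = 1


class DramAddress(NamedTuple):
    chip_select: int
    bank: int
    row: int
    column: int


def decode(flat: int) -> DramAddress:
    """Split a flat address laid out as [chip | bank | row | column]."""
    column = flat & ((1 << COL_BITS) - 1)
    flat >>= COL_BITS
    row = flat & ((1 << ROW_BITS) - 1)
    flat >>= ROW_BITS
    bank = flat & ((1 << BANK_BITS) - 1)
    flat >>= BANK_BITS
    chip = flat & ((1 << CS_BITS) - 1)
    return DramAddress(chip, bank, row, column)


def encode(addr: DramAddress) -> int:
    """Inverse of decode: pack the four fields back into a flat address."""
    flat = addr.chip_select
    flat = (flat << BANK_BITS) | addr.bank
    flat = (flat << ROW_BITS) | addr.row
    flat = (flat << COL_BITS) | addr.column
    return flat


example = DramAddress(chip_select=1, bank=2, row=0x1234, column=0x3FF)
round_trip = decode(encode(example))
locations_per_chip = 1 << (BANK_BITS + ROW_BITS + COL_BITS)
```

Placing the column bits lowest keeps consecutive flat addresses in the same row, which is a common but not universal mapping choice.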



Fig 5. 128 Meg x 4 Functional Simulation Results






## VII. POWER COSTING

The power usage obtained by employing the multiple-context pipelining technique is tabulated in Table-I. With this technique, configuration cache power consumption is reduced by up to 93.85%. The implementation results of the dual injection method are displayed in Table-II. From these estimates, DDIT is a better power-saving approach for CGRA. The technique was analyzed using an FIR filter implementation with multiple taps; it shows an improved maximum reduction ratio compared with the result of PipeRench, and power was measured with varying FIR filter tap sizes. The power analyzer shows that the power consumption of the multi-tap FIR ranges from 600 to 700 mW.
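For readers unfamiliar with the benchmark, a multi-tap FIR filter computes each output as a weighted sum of the most recent inputs, so the number of multiply-accumulate operations per output grows with the tap count. The sketch below is a plain software reference, not the hardware mapping used in the paper; the function name `fir_filter` and the example coefficients are hypothetical.

```python
def fir_filter(samples, taps):
    """Direct-form FIR: y[n] = sum over k of taps[k] * x[n - k], with x[n < 0] = 0."""
    outputs = []
    for n in range(len(samples)):
        acc = 0
        for k, coeff in enumerate(taps):
            if n - k >= 0:
                acc += coeff * samples[n - k]  # one multiply-accumulate per tap
        outputs.append(acc)
    return outputs


taps = [1, 2, 3]                 # hypothetical 3-tap coefficient set
impulse = [1, 0, 0, 0, 0]
impulse_response = fir_filter(impulse, taps)
moving_sum = fir_filter([1, 1, 1, 1], [1, 1])
```

Feeding a unit impulse returns the coefficients themselves followed by zeros, which is a quick sanity check for any FIR implementation.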

Table-I Existing Result Analyses

| Measuring nodes | Cache (standard) | Cache (proposed) | Total (standard) | Total (proposed) | Cache power save (%) | Total power save (%) |
|-----------------|------------------|------------------|------------------|------------------|----------------------|----------------------|
| Tri-diagonal    | 106.18           | 19.25            | 340.5            | 240.28           | 81.87                | 29.4                 |
| Multiplexer     | 107.65           | 19.56            | 371.1            | 269.21           | 81.83                | 27.46                |
| DSP FIR imp     | 143.2            | 19.44            | 330.1            | 270.36           | 86.33                | 37.19                |
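The power-save percentages in Table-I appear to follow the usual relative-reduction relation, save % = (standard - proposed) / standard x 100. This relation is an assumption inferred from the tabulated values rather than a formula stated in the paper; the sketch below applies it to the Tri-diagonal and Multiplexer rows, which it reproduces. The function name `power_saving_percent` is hypothetical.

```python
def power_saving_percent(standard: float, proposed: float) -> float:
    """Relative power reduction of the proposed method, in percent."""
    return (standard - proposed) / standard * 100.0


# (standard, proposed) pairs copied from Table-I.
table_rows = {
    "Tri-diagonal": {"cache": (106.18, 19.25), "total": (340.5, 240.28)},
    "Multiplexer": {"cache": (107.65, 19.56), "total": (371.1, 269.21)},
}

savings = {
    name: {part: round(power_saving_percent(*pair), 2) for part, pair in parts.items()}
    for name, parts in table_rows.items()
}
```

With these inputs, `savings` holds the cache and total reductions for each kernel, rounded to two decimals for comparison with the table.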

## VIII. PROPOSED METHOD IMPLEMENTATION RESULT ANALYSIS

Xilinx Power Estimator report:

| Device and thermal setting | Value            |
|----------------------------|------------------|
| Part                       | XC4VLX100        |
| Package                    | FF1148           |
| Grade                      | Commercial       |
| Process                    | Typical          |
| Stepping                   | Stepping 2       |
| Ambient Temp (°C)          | 50.0             |
| Airflow (LFM)              | 250              |
| Heat Sink                  | Medium Profile   |
| Custom ΘSA                 | 5.2              |
| Board Selection            | Medium (10"x10") |
| # of Board Layers          | 4 to 7           |
| Custom ΘJB                 | 2.1              |
| Effective ΘJA (°C/W)       | 3.4              |
| Max Ambient (°C)           | 97.0             |
| Junction Temp (°C)         | 53.0             |

| Block | Power (W) |
|-------|-----------|
| CLOCK | 0.020     |
| LOGIC | 0.000     |
| IO    | 0.050     |
| BRAM  | 0.000     |
| DCM   | 0.029     |
| PMCD  | 0.000     |
| DSP   | 0.000     |
| PPC   | -         |
| MGT   | -         |
| EMAC  | -         |

| Source             | Voltage (V) | Power (W) | I<sub>CC</sub> (A) | I<sub>CCQ</sub> (A) |
|--------------------|-------------|-----------|--------------------|---------------------|
| V<sub>CCINT</sub>  | 1.20        | 0.558     | 0.017              | 0.448               |
| V<sub>CCAUX</sub>  | 2.50        | 0.273     | 0.014              | 0.095               |
| V<sub>CCO</sub> 3.3 | 3.30       | 0.000     | 0.000              | 0.000               |
| V<sub>CCO</sub> 2.5 | 2.50       | 0.048     | 0.017              | 0.002               |
| V<sub>CCO</sub> 1.8 | 1.80       | 0.000     | 0.000              | 0.000               |
| V<sub>CCO</sub> 1.5 | 1.50       | 0.000     | 0.000              | 0.000               |
| V<sub>CCO</sub> 1.2 | 1.20       | 0.000     | 0.000              | 0.000               |

The remaining transceiver supply rows (two 1.20 V auxiliary rails, and V<sub>TTX</sub> and V<sub>TRX</sub> at 1.50 V) all report 0.000 W.

| Power summary | Value (W) |
|---------------|-----------|
| Quiescent     | 0.779     |
| Dynamic       | 0.099     |
| Total         | 0.878     |

The mapping report from the DDR2 SDRAM design is imported into Xilinx Power Estimator, which produced a total power of 0.878 W.
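The figures in the power estimator report can be cross-checked with simple arithmetic. The sketch below assumes that the two current columns are the dynamic and quiescent supply currents of each rail, so that rail power is voltage times their sum; that interpretation is an assumption, although it reproduces the reported per-rail values. The variable and function names are hypothetical.

```python
# (voltage in V, dynamic current in A, quiescent current in A), copied from the report.
rails = {
    "VCCINT": (1.20, 0.017, 0.448),
    "VCCAUX": (2.50, 0.014, 0.095),
    "VCCO 2.5": (2.50, 0.017, 0.002),
}

# Non-zero block power entries in W, copied from the report.
block_power = {"CLOCK": 0.020, "IO": 0.050, "DCM": 0.029}

QUIESCENT_W = 0.779


def rail_power(voltage: float, i_dynamic: float, i_quiescent: float) -> float:
    """Power drawn from one supply rail, assuming P = V * (I_dyn + I_q)."""
    return voltage * (i_dynamic + i_quiescent)


rail_totals = {name: rail_power(*values) for name, values in rails.items()}
total_from_rails = sum(rail_totals.values())
dynamic_from_blocks = sum(block_power.values())
total_from_summary = QUIESCENT_W + dynamic_from_blocks
```

Both routes, summing the rails and adding quiescent to dynamic power, arrive at the 0.878 W total quoted in the report.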

When our proposed reconfiguration technique is used, memory size can be increased and power consumption reduced. The test results and graphs show the experimental analysis.

Table-II Proposed Method Result Analysis

| Measure Points                          | No. of used | Utilization |
|-----------------------------------------|-------------|-------------|
| LUTs                                    | 81          | <1%         |
| I/O                                     | 32          | 8.50%       |
| Bi-directional                          | 33          | 8.30%       |
| I/O Delay Controllers Power Utilization | -           | 0.050 W     |
| Block RAM Power                         |             | 0 W         |
| Data Channel Modifier Power             | 1           | 0.029 W     |
| Dynamic Power                           |             | 0.099 W     |
| Quiescent Power                         |             | 0.779 W     |
| Total Power                             |             | 0.878 W     |

Chart: Power by Function (axis range 0.000 to 0.900)







## IX. CONCLUSION

Coarse-grained reconfigurable architectures can be easily customized to meet the requirements of various applications, and the evidence shows that their performance can be improved by means of reconfigurable computational elements. Reconfiguration is done mainly in the cache unit in a dynamic fashion, which incurs large power consumption due to data latency. If the latency increases, the data holding time increases and more time is required to complete the ALU operations.




Our proposed Dual Data Injection technique was implemented using DDR2 SDRAM, and it improves the computation speed. Our memory structure is more efficient than the previous one and is implemented on a Xilinx Virtex-4 FPGA. Its mapping report shows a total power consumption of 0.878 W. Overall, both the power saving and the performance of the architecture are improved.

Bharathiar University. She has 18 years of teaching experience. Her area of specialization in her Ph.D. is VLSI Design.

#### ACKNOWLEDGMENT

The authors would like to thank the authorities of Sri Ramakrishna Institute of Technology, Coimbatore, for providing research support and an environment for carrying out this research work.

## REFERENCES

- R. Hartenstein, "A Survey on Embedded Reconfigurable Architectures" in proc. 2017 International Conference on Communication and Signal Processing (ICCSP) published in IEEE Xplore: 08 February 2018 pp1500-1503.
- K. Choi, "Coarse-Grained Reconfigurable Array Architecture and Application Mapping", IPSJ Trans. Systems LSI Design Methodology, vol. 4, pp. 31-46, 2011.
- Yoonjin Kim, "Low power Reconfiguration Technique For CGRC", IEEE Trans Vol.17 May 2009.
- R. Hartenstein, "A decade of reconfigurable computing: A visionary retrospective," in Proc. Des. Autom. Test Eur. Conf., Mar. 2001, pp. 642-649.
- N. Bansal, S. Gupta, N. D. Dutt, and A. Nicolau, "Analysis of the performance of coarse-grain re-configurable architectures with different processing element configurations," in Proc. Workshop Appl. Specific Process., Dec. 2003.
- H. Zhang, M. Wan, V. George, and J. Rabaey, "Interconnect architecture exploration for low-energy reconfigurable single-chip DSPs," presented at the VLSI, Washington, DC, Apr. 1999.
- J. Lee, K. Choi, and N. D. Dutt, "Mapping loops on coarse-grained reconfigurable architectures using memory operation sharing," Center for Embedded Computer Systems (CECS), Univ. California Irvine, Tech. Rep. 02-34, 2002.
- M. Ahn, J. W. Yoon, Y. Paek, Y. Kim, M. Kiemb, and K. Choi, "A spatial mapping algorithm for heterogeneous coarse-grained reconfigurable architectures," in Proc. Des. Autom. Test Eur. Conf., Mar. 2006, pp. 363-368.
- 10. H. Singh, M.-H. Lee, G. Lu, F. J. Kurdahi, N. Bagherzadeh, and E.M. C. Filho, "MorphoSys: An integrated reconfigurable system for data-parallel and computation-intensive applications," Trans.Comput., vol. 49, no. 5, pp. 465-481, May 2000.
- 11. Y. Kim, C. Park, S. Kang, H. Song, J. Jung, and K. Choi, "Design and evaluation of coarse-grained reconfigurable architecture," in Proc. Int. SoC Des. Conf., Oct. 2004, pp. 227-230.

## **AUTHORS PROFILE**



Mr. S. Munaf received his M.E degree in VLSI Design from Anna University of Technology, Coimbatore, and his B.E degree in Electronics and Communication Engineering from Anna University, Chennai. He received his Diploma in Electronics and Communication Engineering from the State Board of Technical Education, Chennai. He is pursuing a Ph.D. under Anna University in the area of high-performance VLSI design. He has 14 years of teaching experience.



Dr. A. Bharathi received her Doctoral degree in Information and Communication Engineering, specializing in Data Mining. She received her Post Graduate degree under Anna University and her Bachelor's degree at Bharathiar University. She has over 18 years of teaching experience.



Retrieval Number: B7299129219/2019©BEIESP

DOI: 10.35940/ijitee.B7299.129219


Dr. A. N. Jayanthi received her Ph.D degree in the Faculty of Information and Communication Engineering from Anna University. She received her M.E degree in VLSI Design from Anna University and her B.E degree in Electronics and Communication Engineering from

