

# Kulkarni Rashmi Manik, S Arulselvi, B Karthik

Abstract: Electronic gadgets and devices today compete on compact size and low power consumption, while users also expect a device to do many jobs at a time. This calls for maximum computing power, and chip multiprocessor (CMP), or many-core, architectures have come in with this advancement. In CMP architectures, tasks are executed in parallel rather than sequentially as in conventional programming. Network-on-chip (NoC) is the best approach for interconnecting complex CMP architectures. However, the NoC consumes a significant percentage of the total power, so the final objective is to design a power-efficient NoC. In a CMP architecture many data transfers take place simultaneously, which requires a power-saving NoC with suitable task scheduling schemes. Our approach improves network latency and network throughput while reducing energy consumption. It supports high-speed transmission and shows improved system performance when experimented with various applications. Here we concentrate on the proper distribution of tasks and of the network traffic required for execution. One needs to balance between over-utilization and under-utilization of resources, and both cases are analyzed in this article. Over-utilization means injecting excess traffic into the network, which leads to the heat-up problem. Under-utilization means PEs remain idle for a long time, resulting in sluggish performance. We list various innovative methods for keeping the network from reaching the power wall. Energy modeling and granular traffic analysis give an accurate estimate of device performance beforehand. Analysis is also done for various topologies and different tasks.

Index Terms: CMP, CWC, CWN, Energy Consumption Reduction, Network Latency, NoC, Network Throughput.

## I. INTRODUCTION

CMP architectures are meant for next-generation devices. The present electronic world is full of smart electronic devices, yet in most of them the processor architecture is still single-core. They are slowly being transformed into many-core processor architectures. We are heading towards an era of electronic gadgets that are intelligent beyond our imagination. Health monitoring systems, weather forecasting systems, satellites, IoT devices, automobiles, logs in various systems such as simulators, traffic controllers and even personal gadgets gather a lot of information daily. Everybody wants this information to be maintained automatically, and we want good analyzers to extract conclusions from it. All of this needs a good CMP architecture inside smart and fast electronic systems. Many-core systems fit this type of data analysis because it needs massive calculations. Today's electronic systems are interactive and user friendly, and we want them to pay continuous attention to many things at once. These requirements lead to the change in processor architecture from single-core to many-core. As the number of cores increases, Network-on-Chip (NoC) is chosen as the interconnect architecture rather than a bus-based architecture.

In this research article, we concentrate on the energy modeling of the NoC. Minute details of energy consumption are considered: the energy consumed for every bit switching is estimated cycle by cycle. This helps in accurately analyzing device performance under various conditions. Special attention is given to the heat-up problem of the CMP architecture. It is found that, if granular analysis of network traffic is done, the pitfalls or hot-spots of the NoC can be predicted accurately. The performance of various topologies is also presented through graphical analysis.

#### Manuscript published on 30 March 2019.

\*Correspondence Author(s)

Kulkarni Rashmi Manik, Research Scholar/ECE, Bharath Institute of Higher Education and Research, Chennai, India.

S Arulselvi, Associate Professor, Department of ECE, Bharath Institute of Higher Education and Research, Chennai, India.

B Karthik, Associate Professor, Department of ECE, Bharath Institute of Higher Education and Research, Chennai, India.

© The Authors. Published by Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license http://creativecommons.org/licenses/by-nc-nd/4.0/

The NoC resources transmit and receive data in packet format. Compared with a bus-based architecture, these resources are more capable and improve system performance block by block. [19][20]

## A. NoC Resources

NoC Resources that provide communication infrastructure are shown in Figure 1.



Fig.1 General NoC resources with network linking mechanism

## B. NoC Protocols

The NoC protocol governs the communication procedure in the network. The NoC transmits and receives information through effective communication, and on-chip interconnection networks consume less energy [3]. Precautions can be taken so that communication in the NoC never fails. Delayed packets are often accepted at the cost of slightly lower network performance. The store and forward (SAF) routing protocol is simple and efficient for implementing a packet-switched network [5].

#### C. NoC Architecture

NoC architecture consists of links, routers and Network Interfaces (NI), and is capable of providing high throughput, low latency and scalability [23][16]. A router can store and forward data. The data links among routers are shown in Figure 2. The NI provides the interface between a processor core and a router in the NoC.



Fig.2 Block Diagram of common 2-D Mesh NoC Architecture

The input and output buffers in the router store data temporarily and restore its signal strength.

## D. NoC Evaluation

The evaluation phase categorizes the NoC architecture according to various metrics. A detailed study includes network modelling and network simulation. Network modelling is required to explore network size, buffer size, packet distribution, routing algorithm, data injection allotment and node traffic dispersal [19]. Network simulation analyzes the network model for performance and energy consumption.

## II. LITERATURE SURVEY

Reetuparna, et al., [1] presented performance and energy enhancement through data compression in NoC. Daniel, et al., [2] presented an interconnect memory hierarchy for designing large-scale CMPs. Hyungjun, et al., [3] presented the network energy consumption through performances to reduce linkage and router level substituting network for data transmission. Chen and Pinkston [4] proposed node-router decoupling for effective energy consumption in NoC routers. Stavros, et al., [5] presented various core server chips that contrast in on-chip network traffic. Wang, et al., [6] proposed run-time energy gating in caches for leakage energy savings. Ahmed and Slim Ben Saoud [7] presented a study on design and NoC implementation tools for network throughput. Khaitan and McCalley [8] presented a hardware-based approach to save energy in cache node systems. Mittal S., et al., [9] presented a survey of architectural techniques for improving cache energy. Lee and Choi [10] proposed a technique for reducing the energy consumed by hybrid caches in multi-core architectures, based on dynamic partitioning of caches to maximize hits and minimize misses in the network. Lou, et al., [11] presented a novel two-level cache architecture for mitigating power overhead. Mittal et al., [12] proposed a multi-core cache energy saving system using dynamic cache reconfiguration. Tejasi and Sudeep [13] presented a novel scheduling technique for network and memory accesses to optimize overall system performance. Kashwan and Selvaraj [14] developed a routing algorithm to find an optimal path with low overhead. Nasirian and Bayoumi [15] developed an adaptive router for NoC, targeting low latency and power efficiency using a power-gated network. Shenbagavalli and Karthikevan [16] developed a low-power NoC router architecture for mesh topology to optimize the path allocation process using hybrid schemes.
Chien, et al., [17] proposed an energy-efficient non-volatile microprocessor considering software-hardware interaction for energy harvesting applications. Edoardo and Alessandro [18] presented a hybrid electronic/photonic, hybrid-topology novel architecture to mitigate insertion loss and crosstalk noise effects. Emmanuel, et al., [19] presented a study of energy saving methods for efficient NoC schemes with a focus on cache and router components such as buffers and crossbars. Kiran and Kamma [20] presented an output buffer router to increase throughput and lower latency with minimum area and power overhead. Letian et al., [21] proposed a transmission mechanism for NoC based on a new combination of error detection, error correction and re-transmission. Sascha, et al., [22] presented a novel simulation approach for NoCs that simulates communication delays equally accurately but, on average, much faster than on a flit-by-flit basis. Salma, et al., [23] presented a survey of NoC design for real-time applications that need Quality-of-Service (QoS), adaptation and energy-efficient techniques. Sebastian, et al., [24] presented a comprehensive appraisal of optical NoC architectures, their strengths and weaknesses, and related active research areas. A lecture on digital electronics [25] describes the components of power dissipation in CMOS. Farzan Fallah and Massoud Pedram [26] discussed standby and active leakage current control and its minimization in CMOS VLSI circuits. K. Roy, et al., [27] discussed leakage current in sub-micrometer CMOS gates. David Harris [28] gave a lecture introducing CMOS VLSI design: synthesis and floor planning. Gaizhen Yan, et al., [29] showed experimentally that, under an 8x8x4 network configuration, a 3D bus NoC hybrid network incorporating PDDVD outperforms a 3D Mesh NoC by at most a 26.6% reduction in average network latency.
Zhicheng Zhou, et al., [30] compared a mesh-based optical Network-on-Chip (ONOC) with the proposed hybrid optical-electronic Network-on-Chip (HOE-NOC), which reduced energy consumption by 10.2%. Zhang Ying, et al., [31] presented results of simulations and experiments on a testing platform with ITC'02 benchmark circuits, and showed the effectiveness and flexibility of collaborative testing and mapping optimization for NoC. Xinxin Yue, et al., [32] showed experimentally that, compared to E-Mesh with 64 nodes under a random traffic pattern, HOG-NoC improved throughput by 25.7%, decreased latency by 75% and reduced energy consumption by 12.9%. Xintian Tong, et al., [33] proposed a multi-mode router that meets the purpose of low-power design and can adapt to the background of dark silicon.




Rui Ben, et al., [34] reported experimental results showing that, compared with the existing multicast routing algorithm, the proposed 3D\_CPM and 3D\_OCPM column partition multicast routing algorithms both improve system performance effectively. Guoming Nie, et al., [35] showed in their experiments that, compared to a network without a QoS mechanism, the proposed network has an excellent fairness guarantee under a hot-spot traffic pattern. Gaizhen Yan, et al., [36] showed that with a hybrid electro-optical Network-on-Chip, compared to an electronic network, throughput improved by 39% and 57%, while latency was reduced by 63% and 51%.

## III. BASIC ENERGY MODELING

#### A. Policies For Choosing CMP Architecture

For a CMP architecture to be successful, the following must be designed very carefully:

- 1. Choice between bus based architecture and NOC
- 2. Appropriate processor architecture
- 3. Appropriate topology if NOC is chosen as interconnect architecture
- 4. Power consumption modeling of the circuit to tackle heat up problem

In this research article, we concentrate on power consumption modeling of CMOS circuits, especially for CMP architectures. The computing power of a CMOS device is boosted by a larger number of processing elements (PEs); however, its performance is limited by the power wall. For 64 or more PEs, NoC with a hierarchical topology is found to be the best-suited interconnect policy. If proper power consumption modeling is done along with it, we can expect a good, reliable device with high computing power and compact size as the end result.

#### B. Portability of Device

A portable device should consume less power and should have good battery support. Longer battery life is another important parameter. Given the booming market for portable devices, the battery is an important element of electronic devices. We want less frequent and quicker recharges of the battery. This requires conserving energy as far as possible; good-quality battery support can also be provided. Figure 3 shows that lithium-ion batteries are the best choice. [25]



Fig.3 Battery Lifetime Vs Battery Capacity

## C. Basic Parameters for Energy Modeling

In this modeling, the basic parameters of an electronic device are considered. As the CMP architecture and the NoC are made up of CMOS logic devices and transistors, we need to go a little deeper into CMOS VLSI technology.

## D. How to Keep Power Budget Low?

To reduce power consumption, it is first necessary to find where power is consumed during circuit operation. Once this is known, a power model can be built with the help of optimization techniques.

## E. Where does power go in CMOS?

Power is consumed in CMOS circuits mainly during switching activity. Every CMOS logic gate spends energy when changing its output state from zero to one and vice versa. More energy is consumed during a state change from zero to one than during the opposite change. The following points give the division of power consumed during circuit operation:

- 1. Switching power, i.e. charging and discharging of the output capacitance. The energy spent per transition, given in equation (1), depends on the load capacitance $C_L$ and the supply voltage $V_{DD}$ of the CMOS circuit. Power further depends on the operating frequency $f$ and is given in equation (2). Figures 4 and 5 show the switching activity of a CMOS gate and the power consumed during it.

$$Energy/Transition = C_L * V_{DD}^2 \tag{1}$$

$$Power = Energy/Transition * f = C_L * V_{DD}^2 * f \tag{2}$$

- 2. Short circuit power, due to non-zero switching rise/fall times of CMOS gates
- 3. Leakage power, with leakage current typically 0.1 to 0.5 nA at room temperature
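The switching-power component can be illustrated numerically. The sketch below is a minimal example, assuming that the energy per transition is $C_L V_{DD}^2$ and that the node makes one such transition per clock period; the capacitance, voltage and frequency values are illustrative assumptions, not measurements from this work.

```python
def energy_per_transition(c_load, v_dd):
    """Energy per output transition, C_L * V_DD^2, in joules."""
    return c_load * v_dd ** 2


def switching_power(c_load, v_dd, freq):
    """Switching power in watts, assuming one transition per clock period."""
    return energy_per_transition(c_load, v_dd) * freq


# Assumed example values: 10 fF load, 1.0 V supply, 1 GHz clock.
C_LOAD = 10e-15
V_DD = 1.0
FREQ = 1e9

print("Energy per transition (J):", energy_per_transition(C_LOAD, V_DD))
print("Switching power (W):", switching_power(C_LOAD, V_DD, FREQ))
# Halving the supply voltage cuts the energy per transition to one quarter.
print("Energy at V_DD/2 (J):", energy_per_transition(C_LOAD, V_DD / 2))
```

The quadratic dependence on $V_{DD}$ is the reason supply-voltage reduction is usually the first choice for lowering the power budget.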



Fig.4 Switching activity of CMOS gate



Fig.5 Power consumed during switching activity

As one can see in Figure 5, short circuit power ($P_{SC}$) is consumed during the switch-on and switch-off time of every on-off transition of a logic gate. Switching of logic gates is highly data dependent, so the switching activity must be determined accurately. Node-by-node power dissipation analysis is required to generate an accurate power consumption model. Simulation may take many days, yet accurate estimation is very important. $V_{DD}$, $f_{clk}$ and $C_L$ are the deciding factors. Interconnect capacitance is also a deciding factor, and it depends strongly on good layout.



Retrieval Number: E3296038519/19©BEIESP

#### F. Dynamic Power

It is given by the formula,

$$P_{dynamic} = C_L * (V_{DD}^2 / 2) * f_{clk} * SW \tag{3}$$

Here SW is the switching factor on a signal line. Power dissipation is a data-dependent function of switching activity: SW depends on the signal transition probabilities on the line ($P_{0\rightarrow 1}$ and $P_{1\rightarrow 0}$).

Transition probabilities for all basic gates and logic devices can be calculated and used during power estimation. Glitches can be reduced by using balanced paths.
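As a rough illustration of how transition probabilities feed equation (3), the sketch below computes the switching factor of two-input gates. It assumes statistically independent inputs, no correlation between successive cycles, and that SW is the sum $P_{0\rightarrow 1} + P_{1\rightarrow 0}$; these assumptions and the function names are introduced here for illustration only.

```python
def output_one_probability(gate, p_a, p_b):
    """Probability that the gate output is 1, given P(a=1) and P(b=1)."""
    if gate == "AND":
        return p_a * p_b
    if gate == "OR":
        return 1 - (1 - p_a) * (1 - p_b)
    if gate == "XOR":
        return p_a * (1 - p_b) + p_b * (1 - p_a)
    raise ValueError("unsupported gate: " + gate)


def switching_factor(p_one):
    """SW = P(0->1) + P(1->0), assuming independent successive cycles."""
    p_01 = (1 - p_one) * p_one
    p_10 = p_one * (1 - p_one)
    return p_01 + p_10


def dynamic_power(c_load, v_dd, f_clk, sw):
    """Equation (3): C_L * (V_DD^2 / 2) * f_clk * SW, in watts."""
    return c_load * (v_dd ** 2 / 2) * f_clk * sw


# Assumed example: equiprobable inputs, 10 fF load, 1.0 V, 1 GHz.
for gate in ("AND", "OR", "XOR"):
    sw = switching_factor(output_one_probability(gate, 0.5, 0.5))
    print(gate, "SW =", sw, "P_dynamic (W) =", dynamic_power(10e-15, 1.0, 1e9, sw))
```

Under these assumptions an XOR output toggles more often than an AND output for the same input statistics, so its estimated dynamic power is higher.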

#### G. Power-Delay Product

The power-delay product, i.e. the product of the power consumed and the propagation delay, is considered in order to achieve an optimum balance between the speed and power requirements of the circuit. The delay is given by the formula,

$$t_{pLH} = \frac{C_L * V_{DD}}{K_n \left(V_{DD} - V_{Tn}\right)^2} \tag{4}$$

$t_{pLH}$ is the delay for a 0 to 1 transition. The delay for a 1 to 0 transition can be calculated similarly.
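The trade-off between power and delay can be explored with a short sweep over the supply voltage. The sketch below is a minimal example that combines equation (3) for power with equation (4) for delay; the device constants ($K_n$, $V_{Tn}$, load, clock and switching factor) are assumed values chosen only to make the trend visible.

```python
def rise_delay(c_load, v_dd, k_n, v_tn):
    """Equation (4): t_pLH = C_L * V_DD / (K_n * (V_DD - V_Tn)^2), in seconds."""
    if v_dd <= v_tn:
        raise ValueError("V_DD must exceed the threshold voltage V_Tn")
    return (c_load * v_dd) / (k_n * (v_dd - v_tn) ** 2)


def dynamic_power(c_load, v_dd, f_clk, sw):
    """Equation (3): C_L * (V_DD^2 / 2) * f_clk * SW, in watts."""
    return c_load * (v_dd ** 2 / 2) * f_clk * sw


def power_delay_product(c_load, v_dd, k_n, v_tn, f_clk, sw):
    """Product of dynamic power and 0-to-1 delay, in joules."""
    return dynamic_power(c_load, v_dd, f_clk, sw) * rise_delay(c_load, v_dd, k_n, v_tn)


# Assumed illustrative constants, not taken from a real process.
C_LOAD, K_N, V_TN, F_CLK, SW = 10e-15, 1e-4, 0.3, 1e9, 0.5

for v_dd in (1.2, 1.0, 0.8, 0.6):
    print("V_DD =", v_dd,
          "power (W) =", dynamic_power(C_LOAD, v_dd, F_CLK, SW),
          "delay (s) =", rise_delay(C_LOAD, v_dd, K_N, V_TN),
          "PDP (J) =", power_delay_product(C_LOAD, v_dd, K_N, V_TN, F_CLK, SW))
```

Lowering $V_{DD}$ reduces power but lengthens the delay, which is why the supply voltage cannot be reduced indefinitely.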



Fig.6 Relation between power and delay

#### H. Leakage Current

There are four main sources of leakage current in a CMOS transistor [26]:

- 1. Reverse-biased junction leakage current (IREV)
- 2. Gate induced drain leakage (IGIDL)
- 3. Gate direct-tunneling leakage (IG)
- 4. Sub-threshold (weak inversion) leakage (ISUB)

Leakage current can be reduced using both process-level and circuit-level techniques. At the process level, leakage is reduced by controlling the dimensions (length, oxide thickness, junction depth, etc.) and the doping profile of the transistor. At the circuit level, several techniques to reduce leakage current have been proposed in the literature. These techniques exploit the dependence of leakage on supply and threshold voltage, as well as the concepts of stacking effect and body biasing [27].

#### I. Gate Leakage

As CMOS devices shrink, electrons tunnel through the gate of the CMOS transistor, which forms the gate leakage current. Proper IC fabrication techniques such as SOI are needed to reduce gate leakage. [27]

#### J. Compact Power Budget

Energy modeling aims at reducing power expenditure as far as possible. This is done in various ways, as described below.

## ➤ Supply Voltage

The prime choice for reducing the power budget is to reduce the supply voltage. However, below a certain supply voltage the power-delay product must be taken into account.

## ➤ Reduce Switching Activity

In this research article we concentrate on studying ways of reducing switching activity. The following precautionary measures can help the NoC make the CMP architecture energy efficient. Mega-core architectures are prone to the heat-up problem, which can be tackled by reducing switching activity.

## ✓ Logic Synthesis:-

Logic Synthesis helps in generating net list.[28]

## ✓ Clock Gating:-

Instead of being distributed directly, the clock is inverted or ANDed with another signal to obtain overriding control. Inversion is done so that adjacent subsystems do not change state at the same time.

## ✓ Number of Transitions per cycle:-

The average number of transitions of the NoC router per cycle can be monitored.

#### ✓ Avoid Glitching

Switching activity is highly data dependent. Glitches should be avoided, because a large portion of energy is consumed by glitching. Balanced paths are preferred over long chains of gates to reduce the number of glitches.

## ✓ Reduce working frequency

Lowering the working frequency is another solution. However, it reduces only the average power, not the total energy spent, and throughput worsens as the working frequency is reduced.

#### ✓ Parallelism

Using parallelism is another way to reduce switching activity. The disadvantage is that area increases by 3 to 4 times and extra routing is required for the parallel circuits.

## ✓ Pipe-lining

Pipe-lining can be done to improve speed of the operations.

#### ✓ Transistor size

Transistor size is sometimes reduced, with a corresponding delay adjustment. Large transistors speed up operation but consume a lot of power.

#### ✓ Slack Time

Adjusting slack time gives good results. Slack time is the difference between the required time and the arrival time of a signal at the gate input. Positive slack time should be kept to a minimum and negative slack time should be kept to a maximum. For this reason, active-low signals are always preferred. Keeping the gates that toggle more frequently at minimum size can conserve energy.

## ✓ Late introduction of highly switching signals

Postponing the introduction of highly switching signals to the later stages of the circuit saves power by reducing the number of times the logic gates switch.





## ✓ State Encoding

State encoding has a big impact on switching activity. The number of states must be reduced by applying various techniques in order to reduce power consumption.

## ✓ Bus Encoding, Bus Inversion

Bus encoding is done to reduce the number of bits that toggle on the bus. One way is bus inversion coding, in which one extra bus line called invert is used. If fewer than 50% of the bus lines would toggle, the invert signal remains zero and the signals are transmitted as they are. If more than 50% of the lines would toggle, the invert signal is made 1 and the inverted signals are transmitted on the bus. Low-weight coding is one more way of bus encoding. It uses transition signaling instead of level signaling.
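The bus inversion scheme described above can be sketched in a few lines. This is a minimal illustration, assuming the bus lines start at all zeros and that a word is inverted only when strictly more than half of the lines would toggle; the function names are introduced here and are not part of any library.

```python
def hamming_distance(a, b):
    """Number of bit positions in which two words differ."""
    return bin(a ^ b).count("1")


def bus_invert_encode(words, width):
    """Return a list of (bus_lines, invert_bit) pairs for the given words."""
    mask = (1 << width) - 1
    prev_lines = 0  # assumption: bus lines are initially all zero
    encoded = []
    for word in words:
        word &= mask
        if hamming_distance(prev_lines, word) > width / 2:
            lines, invert = (~word) & mask, 1  # send the inverted word
        else:
            lines, invert = word, 0            # send the word unchanged
        encoded.append((lines, invert))
        prev_lines = lines
    return encoded


def bus_invert_decode(encoded, width):
    """Recover the original words from (bus_lines, invert_bit) pairs."""
    mask = (1 << width) - 1
    return [((~lines) & mask) if invert else lines for lines, invert in encoded]


def raw_transitions(words):
    """Bit toggles on an unencoded bus that starts at all zeros."""
    total, prev = 0, 0
    for word in words:
        total += hamming_distance(prev, word)
        prev = word
    return total


def encoded_transitions(encoded):
    """Bit toggles on the encoded bus, including the extra invert line."""
    total, prev_lines, prev_invert = 0, 0, 0
    for lines, invert in encoded:
        total += hamming_distance(prev_lines, lines) + (prev_invert ^ invert)
        prev_lines, prev_invert = lines, invert
    return total


words = [0x00, 0xFF, 0x00, 0xFF]
encoded = bus_invert_encode(words, 8)
print("raw toggles:", raw_transitions(words))
print("encoded toggles:", encoded_transitions(encoded))
```

For this worst-case alternating pattern the data lines never toggle after encoding; only the invert line switches.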



Fig.7 Phased Clocking

## ➤ Phased clocking

Phased clocking is one more innovative way. Often the total power consumption is not the concern; rather, many activities occurring at the same instant drive the circuit to the power wall. The power wall depends on the current drawn by the circuit in every clock cycle. If the circuit is divided into a number of clock phases, the peak current drawn is reduced by a factor equal to the number of phases.

$I_{peak} = I_{max} / \text{Number of Clock Phases}$
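The peak-current relation can be checked with a small numerical sketch. It assumes that each circuit block draws its current only on its own clock phase and that blocks are assigned to phases in round-robin order; both the block currents and the assignment policy are illustrative assumptions.

```python
def peak_current(block_currents, num_phases):
    """Largest total current drawn in any single clock phase."""
    phase_totals = [0.0] * num_phases
    for index, current in enumerate(block_currents):
        # Round-robin assignment of blocks to clock phases (assumed policy).
        phase_totals[index % num_phases] += current
    return max(phase_totals)


# Assumed example: eight identical blocks drawing 0.5 A each.
blocks = [0.5] * 8
for phases in (1, 2, 4):
    print(phases, "phase(s): peak current =", peak_current(blocks, phases), "A")
```

With equal blocks, the peak with four phases is one quarter of the single-phase peak, matching the relation above; with unequal blocks the reduction depends on how evenly the current is spread across phases.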

## ➤ Task Spacing

Increase in task spacing among clusters can tackle heat up or power wall reaching problem.



Fig.8 Task Spacing



Fig.9 Alternate Cluster Activation

#### ➤ Dark Silicon Areas

PEs that are going to remain idle for a long time can go dark (power off/tristate). This should be restricted to longer idle gaps only, as energy is required to bring PEs from the idle state back to the active state.
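The restriction to longer idle gaps can be expressed as a break-even test. The sketch below is a hypothetical decision rule: it assumes a PE is characterized by its idle power, its residual power when dark, and a fixed wake-up energy, none of which are values reported in this work.

```python
def break_even_idle_time(wake_energy, idle_power, off_power=0.0):
    """Idle duration (s) beyond which powering off saves energy."""
    saved_power = idle_power - off_power
    if saved_power <= 0:
        raise ValueError("powering off must draw less than staying idle")
    return wake_energy / saved_power


def should_power_off(expected_idle, wake_energy, idle_power, off_power=0.0):
    """True when the expected idle gap is longer than the break-even time."""
    return expected_idle > break_even_idle_time(wake_energy, idle_power, off_power)


# Assumed example: 1 uJ wake-up energy, 1 mW idle power, negligible off power.
print("break-even (s):", break_even_idle_time(1e-6, 1e-3))
print("5 ms gap, power off?", should_power_off(5e-3, 1e-6, 1e-3))
print("0.1 ms gap, power off?", should_power_off(1e-4, 1e-6, 1e-3))
```

A scheduler could apply such a rule per PE, sending a core dark only when the predicted idle gap exceeds the break-even time.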

## ➤ Intelligent Task Allotment Scheme

Intelligent task allotment reduces the number of hops taken by data packets. Many policies can be adopted for intelligent tasking. PEs that communicate more frequently should be adjacent. Communication between PEs in one corner of the NoC and PEs in another corner should be kept to a minimum. Likewise, many innovative ways such as task priorities, parallel tasking, summing tasks, prediction of task results, etc. can be considered.
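The effect of task placement on hop count can be illustrated on a 2-D mesh. The sketch below assumes minimal dimension-ordered routing, so the hop count between two PEs equals their Manhattan distance; the task names, coordinates and packet counts are hypothetical.

```python
def hops(src, dst):
    """Hop count between two mesh coordinates under minimal routing."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def total_weighted_hops(mapping, traffic):
    """Sum of packets * hops over all communicating task pairs.

    mapping: task name -> (x, y) coordinate of the PE it runs on
    traffic: (source task, destination task) -> number of packets
    """
    return sum(count * hops(mapping[src], mapping[dst])
               for (src, dst), count in traffic.items())


# Hypothetical workload: A talks to B often and to C rarely.
traffic = {("A", "B"): 100, ("A", "C"): 5}

far_mapping = {"A": (0, 0), "B": (3, 3), "C": (0, 1)}   # frequent pair in opposite corners
near_mapping = {"A": (0, 0), "B": (0, 1), "C": (3, 3)}  # frequent pair adjacent

print("far mapping hops:", total_weighted_hops(far_mapping, traffic))
print("near mapping hops:", total_weighted_hops(near_mapping, traffic))
```

Placing the frequently communicating pair on adjacent PEs cuts the weighted hop count substantially, which translates directly into fewer link and router traversals.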

## ➤ Reduce Physical Capacitance

This can be done through proper layout and device sizing.

## IV. ESSENTIALS OF NOC

#### A. The Basic Components of NoC

NoC interconnects many processor cores and is a scalable architecture. The compactness of NoC leads to energy efficiency and low latencies. The essential components of NoC are shown in Figure 10.

The network topology gives stability to the NoC, as a typical interconnect pattern is repeated again and again. It eases data packaging / protocol formatting, end-to-end services, task distribution and scheduling. An appropriate switching scheme saves power. Depending on device requirements, processor nodes and switch nodes / routers are included in the NoC. Queuing of data packets helps in reducing latency. Appropriate protocol formatting is important in heavy network traffic, as it reduces energy consumption. The data flow control mechanism controls the network traffic. It decides how many tasks should interact with each other at a time, and also the number of interfaces between tasks over the complete execution time.





Fig.10 Basic Components of NoC

#### B. NoC Router

The NoC router allows multi-core systems to be interconnected with scalability and flexibility [8]. The effectiveness of the NoC depends on appropriate network throughput, network latency and overall system performance [15]. NoC routing transfers data to the end-point using different schemes, which are classified as source NoC routing, deterministic routing and path substituting routing.

#### ✓ Source NoC Routing

The data transmitter provides the route to be followed at each node.

#### ✓ Deterministic Routing

Data is transferred over deterministic network paths, which the routing system uses to improve the network.

## ✓ Path Substituting Routing Network

It transmits data for a particular period of time, and the data may take diverse routes to reach the destination.

In the block diagram of NoC router, the arbiter obtains data packets from input buffers and distributes them to virtual channels as shown in Figure 11. The crossbar switch establishes link between the input buffers and output ports. Data packets are transmitted to the subsequent linked router.



Fig.11 Block Diagram of NoC Router

The data arbitration is controlled by NoC router. It maintains source-destination information for every packet and data transfer requests for the same with neighboring routers. The packets with the same priority and destination are properly sequenced by router.

## C. Packet data Format for NoC

Data is circulated in the NoC in packet format. Each packet contains a header, source, destination, packet size, etc., followed by the actual data bytes. A common packet format is shown in Figure 12.



Fig.12 Packet format for the NoC

#### D. Enhanced Data Transmission in NoC

Optimized data transmission in the network conserves energy and improves network quality [17]. The energy consumed in transmitting data from source to destination depends on the number of hops taken by the data packet. The design of an intelligent multi-hop navigation sequence [8] [13] is shown in Figure 13. Bypassing routers saves energy: Figure 13 shows that, if there is a straight path for the data transmission, a number of routers along the path can be skipped in one clock cycle.



Fig.13 Design of a multi-hop sequence with routers transmission in NoC

However, the number of routers that can be bypassed depends on the clock frequency of the NoC. A balance must be struck between a longer clock period (and hence the number of routers that can be skipped in one cycle) and the actual speed of data transmission.

As shown in Figure 13, because of the multiplexer-cum-boosters placed in the path, the NoC does not face the energy-related challenges of long-distance data traversal. Bus inversion also helps in reducing switching activity on the signal lines.

## V. NOC PERFORMANCE AND LATENCIES

Network performance is significantly affected by varying data traffic and by the number of processor cores. The size of the CMP influences the average network latency significantly. Figure 14 shows the average network latency (in cycles) versus the number of cores. Traffic congestion can be avoided by selecting an appropriate number of cores. A 16-core CMP architecture saves approximately 56 cycles of latency per flit on average and creates a congestion-free network for all types of workloads.





Fig.14 Average network latency (in cycles) Vs Number of cores

## A. Latency Model

A latency model can be built by identifying and analyzing each thread in the software. Precise identification of every interface between threads establishes the correct number of channels between them. The available bandwidth must be considered and the network traffic regulated for every channel. Latency can then be estimated for every packet travelling through a channel. Waiting times for a data packet can be calculated from its departure and arrival times. The frequency of operation has a significant impact on latency. Under excess load, packets are queued and that waiting time is added to the latency. Figure 15 shows a sample of the latency model.



Fig.15 Channel in Latency Model

Source, destination and data packet have unique IDs for building latency model. M shows mean time for complete transfer of a data packet.

## B. Components of Network Latency

Equation (6) shows the components of latency.

$$L^{S \to D} = L_{hf}^{S \to D} + L_{ff} \tag{6}$$

Where, $L_{hf}^{S \to D}$ is the network latency of the header flit and $L_{ff}$ is the latency of the flit frame. $L^{S \to D}$ is the average latency for every packet. Splitting the latency into these components makes it possible to calculate latency rapidly for enhanced data transmission. $L_{hf}$ is determined from the maximum of the control delay ($t_c$) and the lead delay ($t_l$), as in equation (7). $L_{ff}$ is determined from the sum of the control delay and the lead delay, as given in eqn. (8). For $L_{ff}$, the data cycle times of the input and output buffers inside the router are measured.

$$L_{hf} = (mean - 1) \times \max(t_c, t_l) \tag{7}$$

$$L_{ff} = (mean - 1) (t_c + t_l) \tag{8}$$

The queuing overhead has to be added to the above calculations.
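Equations (6) to (8) can be evaluated directly. The sketch below is a minimal transcription, in which the quantity written as *mean* in equations (7) and (8) is passed in unchanged and the queuing overhead is modelled as a simple additive term; the additive treatment of queuing and the example numbers are assumptions.

```python
def header_flit_latency(mean, t_c, t_l):
    """Equation (7): (mean - 1) * max(t_c, t_l)."""
    return (mean - 1) * max(t_c, t_l)


def flit_frame_latency(mean, t_c, t_l):
    """Equation (8): (mean - 1) * (t_c + t_l)."""
    return (mean - 1) * (t_c + t_l)


def packet_latency(mean, t_c, t_l, queuing=0.0):
    """Equation (6) plus an assumed additive queuing overhead."""
    return (header_flit_latency(mean, t_c, t_l)
            + flit_frame_latency(mean, t_c, t_l)
            + queuing)


# Assumed example: mean = 5, control delay 2 cycles, lead delay 3 cycles.
print("L_hf:", header_flit_latency(5, 2, 3))
print("L_ff:", flit_frame_latency(5, 2, 3))
print("L with 4 cycles of queuing:", packet_latency(5, 2, 3, queuing=4))
```

Because each component is a closed-form expression, the latency of a channel can be re-evaluated quickly whenever the delays or the load change.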

## VI. ENERGY CONSUMPTION MODULE FOR NOC ROUTING

The energy consumption module for NoC routing includes modelling of the network phases and of the node linkage.

#### A. Network Phases

A standardized NoC has a uniform architecture and uniform data path lengths. It considers the network serving capacity and is augmented with predictable energy models of network traffic, CMPs and cache coherency.

NoC design is often based on cache-coherent memory [19][20][16]. It addresses task scheduling and the performance of the complete system [10][11], and provides scheduling techniques suited to the memory controller. The memory controller works in two phases:

*Network Phase-1:* Task distribution is done in this phase. The network allocates the load at the start of every predefined interval with the help of a counter. The cache issues packet requests to off-chip memory.

*Network Phase-2:* This phase is for task execution; here the NoC is free to exploit parallelism and maximize efficiency.

## B. Node Linkage Energy Overhead

In a NoC, a processor is attached to each router. Processors initiate data transfers, and energy is consumed in initiating a data transmission. This is an additional overhead in the energy model of the NoC. This work estimates energy efficiency at the leaf-node-level router of the NoC.

## VII. COMPUTATIONAL ANALYSIS OF ENERGY CONSUMPTION IN NOC

The computational analysis of energy consumption in NoC relies on replication: energy modeling is done for only one segment, and the result is reused to calculate the total energy consumption. This can be done without simulation, which saves simulation time. The computational model uses straightforward calculations and can easily be extended to other topologies.

The energy consumed by a single flit of a data packet during path traversal in the network, $E_{flit}(T)$, is given in eqn. (9),

$$E_{flit}(T) = T \times E_{linkage} + (T+1) \times E_{router} \tag{9}$$

Where, $E_{linkage}$ and $E_{router}$ are the parts of the total energy due to a node linkage and a network router respectively, and T is the number of hops traversed by a packet in the network path. $E_{total}$ is the total energy consumed by $N_{packet}$ packets, each having $N_{flit}$ flits, as given in eqn. (10),

$$E_{total} = \sum_{i=1}^{N_{packet}} \sum_{j=1}^{N_{flit}} E_{ij flit} (T) \tag{10}$$

Where, $E_{ij flit}(T)$ is the energy consumed by the j<sup>th</sup> flit of the i<sup>th</sup> packet. This computational model is good for quick energy estimation of the NoC.
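Equations (9) and (10) translate into a short calculation. The sketch below assumes that every flit of a packet follows the same path and therefore traverses the same number of hops T; the per-link and per-router energies are arbitrary illustrative units.

```python
def flit_energy(hops, e_linkage, e_router):
    """Equation (9): T links and T + 1 routers are traversed by one flit."""
    return hops * e_linkage + (hops + 1) * e_router


def total_energy(packets, e_linkage, e_router):
    """Equation (10): sum of flit energies over all flits of all packets.

    packets: list of (hops, number_of_flits) pairs, one per packet
    (assumption: all flits of a packet take the same number of hops).
    """
    return sum(n_flits * flit_energy(hops, e_linkage, e_router)
               for hops, n_flits in packets)


# Assumed example in arbitrary energy units.
E_LINKAGE, E_ROUTER = 1.0, 2.0
packets = [(3, 4), (1, 2)]  # a 4-flit packet over 3 hops, a 2-flit packet over 1 hop

print("energy of one flit over 3 hops:", flit_energy(3, E_LINKAGE, E_ROUTER))
print("total energy:", total_energy(packets, E_LINKAGE, E_ROUTER))
```

Only the hop count and flit count of each packet are needed, so the estimate can be produced from a traffic trace without cycle-level simulation.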



#### A. Energy Model for Regular and Arbitrary Complex Network

The network fragments replicate the energy consumption pattern. The energy required to transmit a bit over an average number of hops H is calculated from the bit energy per node hop, $E_{nodehop}$, as given in eqn. (11),

$$E_{bit} = H \times E_{nodehop} = H \times (E_{linkage} + E_{router}) \tag{11}$$

The energy model is accurate to the extent of its statistical formulas. Finding the number of hops H requires knowledge of the communication between the source node and the destination node, which is determined by the communication type and the traffic pattern. The dissipation of energy is customized by the system design. The energy model includes analysis for low-latency communication control. The network protocol reduces the probability of collision by transmitting a small sample of data of size $P_{re}$ and checking for collisions before transmission. The energy for transmitting this test sample (pre-data packet) is given in eqn. (12),

$$E_{fragment} = E_{bit} \times \left(1 + N_{retrans} \times P_{re} / P\right) \tag{12}$$

Where, $E_{bit}$ is the energy per bit for a transmission, $N_{retrans}$ is the standard number of re-transmissions per collision and P is the typical packet size.

The alternative NoC patterns which are used in network at the edges are considered as a symbol transfer overhead, thus the energy in symbol transfer is as given in eqn. (13),

$$E_{fragment}^{edge} = E_{bit} \times \left(1 + H_{symb} \times P_{symb} / P\right) \tag{13}$$

Where $P_{symb}$ is the length of the symbol and $H_{symb}$ is the average number of hops taken to transmit a complete message under arbitrary network traffic.
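Eqns. (11) to (13) can be evaluated with a few lines of Python. The sketch below is illustrative only: eqn. (12) is read as $E_{bit} \times (1 + N_{retrans} \times P_{re} / P)$, all function and parameter names are hypothetical, and the numeric values are arbitrary units chosen for easy checking, not measurements from this work.

```python
def bit_energy(avg_hops, e_linkage, e_router):
    # Eqn. (11): energy to move one bit over avg_hops hops.
    return avg_hops * (e_linkage + e_router)


def fragment_energy(e_bit, n_retrans, pre_size, packet_size):
    # Eqn. (12): bit energy plus the overhead of the pre-data test sample,
    # weighted by the number of retransmissions per collision.
    return e_bit * (1 + n_retrans * pre_size / packet_size)


def edge_fragment_energy(e_bit, h_symb, symb_size, packet_size):
    # Eqn. (13): bit energy plus the symbol transfer overhead at the network edges.
    return e_bit * (1 + h_symb * symb_size / packet_size)


e_bit = bit_energy(avg_hops=4, e_linkage=0.5, e_router=1.5)
print(e_bit)                                    # 8.0
print(fragment_energy(e_bit, 2, 8, 64))         # 10.0
print(edge_fragment_energy(e_bit, 3, 16, 64))   # 14.0
```

In both overhead terms the extra cost shrinks as the packet size P grows, which is why small packets pay proportionally more for collision checks and symbol transfers.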

In the typical case, the evaluation of $E_{bit}$ in this model covers both regular and irregular (arbitrary) network communication. It regulates the node frequency even when destinations in the arbitrary network are unresolved. Hence, the energy of the combined communication is given in eqn. (14),

$$E_{bit} = \left[E_{Txregular} + E_{Txarbitrary}\right] + \left(N_{TxRxactive} - 1\right) \times \left[E_{Rxregular} + E_{Rxarbitrary}\right] \tag{14}$$

where $E_{Txregular}$ and $E_{Rxregular}$ are the energy consumed in transmitting and receiving data in the regular network, and $N_{TxRxactive}$ is the number of active transceivers. On the arbitrary network side, the transceivers consume the additional energy $E_{Txarbitrary}$ and $E_{Rxarbitrary}$. The model described above is the one implemented in this work.
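A small sketch of eqn. (14) makes the role of the active transceiver count explicit: one transceiver transmits, and the remaining $N_{TxRxactive} - 1$ receive. The function name, argument names and sample values below are hypothetical, and the guard against fewer than one active transceiver is an added assumption rather than part of the model.

```python
def combined_bit_energy(e_tx_regular, e_tx_arbitrary,
                        e_rx_regular, e_rx_arbitrary, n_active):
    # Eqn. (14): one transmitter plus (n_active - 1) receivers,
    # each paying both the regular and the arbitrary traffic cost.
    if n_active < 1:
        raise ValueError("at least one active transceiver is required")
    tx = e_tx_regular + e_tx_arbitrary
    rx = e_rx_regular + e_rx_arbitrary
    return tx + (n_active - 1) * rx


# Five active transceivers, arbitrary energy units.
print(combined_bit_energy(2.0, 0.5, 1.0, 0.25, n_active=5))  # 7.5
```

With a single active transceiver the receive term vanishes and only the transmit energy remains.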

## VIII. RESULTS AND DISCUSSIONS

The experiments in this work were carried out in Scilab. The graphs compare network latency, throughput and energy consumption under different workloads, for several applications and network schemes. The network schemes considered are CWC (CMP with cache consistency), CWN (CMP with NoC) and the proposed EDTWN (Enhanced Data Transmission with NoC); the applications considered are mgrid, apsi, zeus and tpcw. The graphs given below are the outcome of these experiments.

## A. Network Latency Based Performance Evaluation

The latency of data packets depends on the number of flits in a packet, and the packets in the various applications (mgrid, apsi, zeus, tpcw) have different numbers of flits per packet. CWC has an overhead for cache access per cycle, whereas CWN shows an improvement for the same applications. The simulation results show that the proposed method performs better than CWC and CWN in terms of network latency (flits/cycle) versus load, as shown in Figure 16.



Fig.16 Network Latency Vs Load

The impact of transmission load time on network latency (flits/cycle) is measured. Transmission load time is the time until the source receives the acknowledgement from the destination through the network hierarchy; the acknowledgement is the clear to send (CTS) signal. Transmission load time is proportional to network traffic. In the proposed method, network traffic is regulated through the enhanced data transmission architecture. Figure 17 plots the load time against network latency for the various applications. The proposed method has a reduced transmission load time and improved network latency.



Fig.17 Network Latency Vs Standardized Transmission Load Time

## B. Network Throughput based Performance Evaluation

Figure 18 compares network throughput against the aggregate throughput rate for the various applications and networking techniques. The proposed method improves network throughput in comparison with CWC and CWN, and performs well for both network packets and memory packets under the given constraints.







Fig.18 Aggregate Throughput Rate Vs Network Throughput

The network transmission throughput time is measured in bits-routine, that is, the time between the source node issuing a request and receiving the acknowledgement from the destination node. Figure 19 plots the transmission throughput time for the various applications against the aggregate throughput rate.



Fig.19 Aggregate Throughput Rate Vs Network Transmission Throughput time

In Figure 16 to Figure 19, the *x*-axis and *y*-axis represent different parameters, plotted for the various applications, to show the output performance.

The analysis under steady network traffic is shown in Figure 20 and Figure 21.

## C. Energy Consumption based Performance Evaluation

The comparison between CWC, CWN and the proposed EDTWN indicates a reduction in energy consumption, as shown in Figure 22 and Figure 23. The overall energy consumption decreases because of the reduction in network latency and buffer space, and this reduction is achieved without affecting data transmission performance. As the injection rate increases, the switching activity in the NoC increases; even so, the overall energy consumption of EDTWN remains lower.

Retrieval Number: E3296038519/19©BEIESP

Journal Website: www.ijitee.org



Fig.20 Throughput Vs Injection Rate under constant traffic



Fig.21 Throughput Vs Injection Rate under transpose traffic



Fig.22 Energy Consumption Vs Transmission Memory Workload



Fig.23 Energy Consumption Vs Injection Rate

## D. Topology Based Performance Evaluation

The topology based performance evaluation is shown in Figure 24 and Figure 27 (based on eqn. (15), convolution), Figure 25 and Figure 28 (based on eqn. (16), summation of products), and Figure 26 and Figure 29 (based on $n \times n$ matrix multiplication).


The proposed enhanced data transmission based topology (EDTWN) is seen to be the most energy efficient of the topologies compared.

$$RES = \sum_{i=1}^{n} x(i) \times x(i-\tau) \tag{15}$$

$$RES = \sum_{i=1}^{n} [w(i) \times x(i)] + [y(i) \times z(i)] \tag{16}$$
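The two kernels of eqn. (15) and eqn. (16), used here as workloads for the topology comparison, can be sketched in Python as follows. Two assumptions are made that the source does not state: samples whose index $i - \tau$ falls outside $1 \ldots n$ are treated as zero, and the summation in eqn. (16) is read as covering both products. The function names are hypothetical and the lists are 0-indexed.

```python
def lagged_product_sum(x, tau):
    # Eqn. (15): sum over i of x(i) * x(i - tau).
    # Terms whose shifted index is out of range are skipped (assumed zero);
    # the explicit check also prevents Python's negative-index wraparound.
    n = len(x)
    return sum(x[i] * x[i - tau] for i in range(n) if 0 <= i - tau < n)


def sum_of_products(w, x, y, z):
    # Eqn. (16): sum over i of w(i)*x(i) + y(i)*z(i), assuming equal-length inputs.
    return sum(wi * xi + yi * zi for wi, xi, yi, zi in zip(w, x, y, z))


print(lagged_product_sum([1, 2, 3, 4], tau=1))             # 20
print(sum_of_products([1, 2], [3, 4], [5, 6], [7, 8]))     # 94
```

Each term of either sum is an independent multiply, which is what allows such kernels to be distributed across processing elements and makes hop count and path length meaningful measures for them.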



Fig.24 Performance of various topology (Hops) - 1



Fig.25 Performance of various topology (Hops) - 2



Fig.26 Performance of various topology (Hops) - 3

Convolution: x-axis i, y-axis path length (mm)



Fig.27 Performance of various topology (Path Length) - 1




Fig.28 Performance of various topology (Path Length) - 2



Fig.29 Performance of various topology (Path Length) - 3

## IX. CONCLUSION

In this research work, enhanced data transmission for NoC is studied and analyzed. The proposed architecture achieves an end-to-end reduction of latency and power. Factors affecting energy consumption are explored along with granular traffic analysis.

Finally, power dissipation is already a prime concern in CMP architectures, and optimization at all levels of abstraction together with efficient design is required.

#### REFERENCES

- 1. Reetuparna Das, Asit K. Mishra, Chrysostomos Nicopoulos, Dongkook Park, "Performance and Power Optimization through Data Compression in Network-on-Chip Architectures," IEEE Transactions, pp. 215-225, 2008.
- 2. Daniel Sanchez, George Michelogiannakis, Christos Kozyrakis, "An analysis of on-chip interconnection networks for large-scale chip multiprocessors," Journal of ACM Transactions on Architecture and Code Optimization, Vol. 7, No. 1, April 2010.
- 3. Hyungjun Kim, Pritha Ghoshal, Boris Grot, Paul V. Gratz, Daniel A. Jiménez, "Reducing Network-on-Chip Energy Consumption through Spatial Locality Speculation," Pittsburgh, USA, pp. 1-8, May 1-4, 2011.
- 4. L. Chen, T.M. Pinkston, "Node-Router Decoupling for Effective Power-gating of On-Chip Routers," in Proceedings of the 45th Annual IEEE/ACM International Symposium on Micro-architecture, Vancouver, BC, Canada, pp. 270–281, December 1–5, 2012.



Published By: Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP) © Copyright: All rights reserved.




- 5. Stavros Volos, Ciprian Seiculescu, Boris Grot, Naser Khosro Pour, Babak Falsafi, and Giovanni De Micheli, "CCNoC: Specializing On-Chip Interconnects for Energy Efficiency in Cache-Coherent Servers," in Proceedings of the 6th International Symposium on Networks-on-Chip, IEEE Conference, pp. 1-8, 2012.
- 6. Y. Wang, S. Roy, N. Ranganathan, "Run-time power-gating in caches of GPUs for leakage energy savings," in Proceedings of the Design, Automation Test in Europe Conference Exhibition, Dresden, Germany, pp. 300-303, March 12-16, 2012.
- 7. Ahmed Ben Achballah, Slim Ben Saoud, "A Survey of Network-On-Chip Tools," International Journal of Advanced Computer Science and Applications, Vol. 4, No. 9, pp. 61-67, 2013.
- 8. S.K. Khaitan, J.D. McCalley, "A hardware-based approach for saving cache energy in multi-core simulation of power systems," in Proceedings of the IEEE Power Energy Society General Meeting, Vancouver, BC, Canada, pp. 1-5, July 21-25, 2013.
- 9. S. Mittal, "A Survey of Architectural Techniques for Improving Cache Power Efficiency," Computational Information System, Vol. 4, pp. 43–48, 2013.
- 10. D. Lee, K. Choi, "Energy-efficient partitioning of hybrid caches in multi-core architecture," in Proceedings of the 22<sup>nd</sup> International Conference on Very Large Scale Integration, Playa del Carmen, Mexico, pp. 1-6, October 6-8, 2014.
- 11. M. Lou, L. Wu, S. Shi, P. Lu, "An energy-efficient two-level cache architecture for chip multiprocessors," in Proceedings of the 5th International Conference on Computing, Communications Networking Technologies, Hefei, China, pp. 1-5, July 11-13, 2014.
- 12. S. Mittal, Y. Cao, Z. Zhang, "A Multicore Cache Energy-Saving Technique Using Dynamic Cache Reconfiguration," IEEE Transactions in Very Large Scale Integration System, Vol. 22, pp. 1653–1665, 2014.
- 13. Tejasi Pimpalkhute, Sudeep Pasricha, "NoC Scheduling for Improved Application-Aware and Memory-Aware Transfers in Multi-Core Systems," 27th International Conference on VLSI Design, IEEE Computer Society, pp. 239-240, 2014.
- 14. K.R. Kashwan, G. Selvaraj, "Implementation and performance analyses of a novel optimized NoC router," International Conference on Convergence of Technology, IEEE Xplore, Pune, India, April 6-8,
- 15. N. Nasirian, M. Bayoumi, "Low-latency power-efficient adaptive router design for network-on-chip," in Proceedings of the 28th IEEE International System-on-Chip Conference, Kohala Coast, HI, USA, pp. 287-291, August 27-29, 2015.
- 16. S. Shenbagavalli, S. Karthikeyan, "An efficient low power NoC router architecture design," in Proceedings of the Online International Conference on Green Engineering and Technologies, Coimbatore, India, pp. 1-8, November 2015.
- 17. T.K. Chien, L.Y. Chiou, C.C. Lee, Y.C. Chuang, S.H. Ke, S.S. Sheu, H.Y. Li, P.H. Wang, T.K. Ku, M.J. Tsai, "An energy-efficient nonvolatile microprocessor considering software-hardware interaction for energy harvesting applications," in Proceedings of the International Symposium on VLSI Design, Automation and Test, Hsinchu, Taiwan, pp. 1-4, April 25-27, 2016.
- 18. Edoardo Fusella, Alessandro Cilardo, "A Hybrid Optical-Electronic NoC based on Hybrid Topology," IEEE Transactions on Very Large Scale Integration Systems, Vol. 25, No. 1, pp. 330-343, January 2017.
- 19. Emmanuel Ofori-Attah, Washington Bhebhe and Michael Opoku Agyeman, "Architectural Techniques for Improving the Power Consumption of NoC-Based CMPs: A Case Study of Cache and Network Layer," Journal of Low Power Electronics and Applications, Vol. 7, No. 14, pp. 1-24, 2017.
- 20. Kiran, Kamma Solanki, "Design of efficient NOC router for chip multiprocessor," Invention Computation Technologies International Conference on IEEE Xplore, Coimbatore, 26 January 2017.
- 21. Letian Huang, Xinxin Lin, Junshi Wang, Qiang Li, "A low latency fault tolerant transmission mechanism for Network-on-Chip", Circuits and Systems, IEEE International Symposium on Baltimore, USA, pp. 28-31 May 2017.
- 22. Sascha Roloff, Frank Hannig, Jurgen Teich, "High performance network-on-chip simulation by interval-based timing predictions," in Proceedings of the 15th IEEE/ACM Symposium on Embedded Systems for Real-Time Multimedia, Seoul, Republic of Korea, pp. 2-11, October 15, 2017.
- 23. Salma Hesham, Jens Rettkowski, Diana Goehringer, Mh. A. Abd El Ghany, "Survey on Real-Time Networks-on-Chip," IEEE Transactions on Parallel and Distributed Systems, Vol. 28, No. 5, pp. 1500-1517, May 2017.
- 24. Sebastian Werner, Javier Navaridas, Mikel Lujan, "A Survey on Optical Network-on-Chip Architectures," ACM Computing Surveys, Vol. 50, No. 6, January 2018.

- 25. "Power Dissipation in CMOS", Digital Electronic-SCRIBD, Lecture 13, 18-322 Fall 2003.
- 26. Farzan Fallah, Massoud Pedram, "Standby and active leakage current control and minimization in CMOS VLSI circuits", IEICE Transactions on Electronics, 2005.
- 27. K. Roy, S. Mukhopadhay, H. Mahmoodi-Meimand, "Leakage current in sub-micrometer CMOS gates", IEEE Transaction Vol. No. 91, Issue No. 2, Feb 2003.
- 28. David Harris, "Introduction to CMOS VLSI Design: Synthesis and Floor Planning", Harvey Mudd College, Stanford University, Lecture
- 29. Gaizhen Yan, Ning Wu, Lei Zhou and Fen Ge, "PDDVB: A Priority Division Distributed Vertical Bus for 3D Bus - NoC Hybrid Network." IAENG International Journal of Computer Science, 43:2, IJCS\_43\_2\_14.
- 30. Zhicheng Zhou, Ning Wu and Gaizhen Yan, "Topology Optimization of 3D Hybrid Optical-Electronic Network-On-Chip.", Proceedings of the World Congress on Engineering and Computer Science 2016, Vol I WCECS 2016, October 19-21, 2016, San Francisco, U.S.A.
- 31. Zhang Ying, Chen Xin and Ge Fen, "Collaborative Optimization of Testing and Mapping for Network-on-Chip." Proceedings of the World Congress on Engineering 2018 Vol I WCE 2018, July 4-6, 2018, London, U.K.
- 32. Xinxin Yue, Fen Ge, Ning Wu, Gaizhen Yan, "HOG-NoC: Hybrid Optical-Electronic Mesh Based Grouped NoC." Proceedings of the World Congress on Engineering and Computer Science 2017 Vol I WCECS 2017, October 25-27, 2017, San Francisco, U.S.A.
- 33. Xintian Tong, Fen Ge, Rongrong Zhou, Ning Wu, Fang Zhou and Yingying Kong, "Design of Low Power Multi-Mode Router for Network-on-Chip in Dark Silicon Era.", Proceedings of the World Congress on Engineering 2017 Vol I WCE 2017, July 5-7, 2017, London, U.K.
- 34. Rui Ben, Fen Ge, Xintian Tong, Ning Wu, Ying Zhang, Fang Zhou, " A Multicast Routing Algorithm for 3D Network-on-Chip in Chip Multi Processors", Proceedings of the World Congress on Engineering 2018 Vol I WCE 2018, July4-6, 2018, London, U.K.
- 35. Guoming Nie, Ning Wu, Fen Ge, Gaizhen Yan, "A QoS-Enabled Optical-Electronic Network-on-Chip", Proceedings of the World Congress on Engineering and Computer Science 2017 Vol I WCECS 2017, October 25-27, 2017, San Francisco, U.S.A.
- 36. Gaizhen Yan, Ning Wu, Zhicheng Zhou, "A Novel Non-Cluster Based Architecture of Hybrid Electro-Optical Network-on-Chip.", This work is supported in part by the National Science Foundation of China (61376025), The Anhui Scientific Research Funds for University (KJ2017A501), The Jiangsu Innovation Program for Graduate Education (KYLX15-0283) and The Natural Science Foundation of Jiangsu Province (BK 20160806).
- 37. G. Ramprabu, T. Saravanan, G. Saritha, "Wireless Audio Signal Communication using Li-Fi Technology", International Journal of Engineering and Advanced Technology, Vol.8, Issue 2, 2018, pp.208-

