

# A Novel NOR-Type TCAM Deploy Dual-V<sub>T</sub> cell with OR-Type Cascade Match-Line Structure


#### Rahul Nigam, Santosh Pawar

Abstract: We review improvements in the design of large content addressable memories (CAMs). A CAM is a key device that performs the routing-table lookup within a single clock cycle in a network router. CAMs are particularly popular in network switches for classifying and forwarding packets, and they are also useful in other applications that require fast table lookup. The primary CAM design challenge is to reduce the power dissipation associated with the large amount of parallel activity in the memory circuitry during a search operation. Technology scaling continues to reduce the dynamic power dissipation of CAMs, but it also raises the leakage current of the transistors. Static power is therefore becoming a significant portion of the total power dissipation in CAMs. Here, we introduce a technique suited to high-capacity Ternary Content Addressable Memory (TCAM) that minimizes static power dissipation in the SRAM storage part of the TCAM cell and speeds up the search part. We also divide the whole memory into equal segments, which improves the performance of our design. We examine the different schemes and present the trade-offs of applying these techniques. Design and simulation were carried out using the Tanner EDA V.16 tool. For simulation of the low-power TCAM structures we used the 45 nm predictive technology model (PTM) for high performance (HP) and low power (LP), which incorporates metal gate, high-k and stress effects of CMOS technology.

Keywords: Dual-V<sub>T</sub>, High capacity, Low power, OR-type Match-line, TCAM.

#### I. INTRODUCTION

The Content Addressable Memory (CAM) is a special kind of memory that has both a storage cell and a comparison logic cell. A CAM lets us enter a search "word" (an IP address in a switch), searches the whole memory in a single cycle, and returns one or more matches. Because of its parallel operation, a CAM is much faster than other available hardware search circuits. A binary CAM stores two states, "1" and "0". A CAM performs well in a network router for quick exact-match search, but it is not suited to prefix search.

Revised Manuscript Received on August 30, 2020.

\* Correspondence Author

**Rahul Nigam\***, Electronics and Communication Engineering Department, Dr. A.P.J Abdul Kalam University, Indore, India. Email: rahul.nigam02@gmail.com

Santosh Pawar, Principal - School of Engineering and Dean-R & D, Dr. A.P.J. Abdul Kalam University, Indore, India. Email: spawarrkdf@gmail.com

© The Authors. Published by Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

"Ternary" in TCAM refers to the additional third state that can exist in a TCAM cell. A TCAM is a more flexible kind of CAM that allows us to store mask bits. However, TCAMs require twice as many logic gates to handle this. Because of their limited memory size and high power dissipation, both CAM and TCAM are difficult to use in particular applications, for example Internet switches, which must quickly search the routing table and forward IP packets according to their destination address. Numerous works have addressed both the architecture level and the circuit level of TCAM to increase speed or to minimize power dissipation. L. Chisvin and R. J. Duckworth [1] explained CAM and associative memory systems and indicated that these structures are commercially feasible. K. E. Grosspietsch [2] described the functional structure of a CAM and its realization at the transistor level; the paper also discusses associative processor systems and surveys application-specific CAM models that support artificial intelligence features. A. Sheikholeslami and I. Arsovski [3] described a match-line sensing scheme that reduces power dissipation in CAM by allocating less power to match-lines that have more mismatches. I. Arsovski and A. Sheikholeslami [4] again introduced a match-line sensing scheme that allocates less power to searches with a higher number of mismatched bits. Since most CAM words are mismatched, this scheme yields a significant reduction in CAM power dissipation. A. Sheikholeslami and K. Pagiamtzis [5] reviewed recent developments in the design of high-capacity CAMs. They noted that CAMs are particularly popular in network switches for packet classification and forwarding, but that these devices are also useful in other applications that need fast table lookup.
At the circuit level, they also reviewed low-power match-line sensing techniques and search-line driving schemes. J. Zhang et al. [6] introduced another mismatch-dependent power reduction technique for CAMs. In this scheme the word circuits quickly self-disable their charging paths when a mismatch is detected. Since most CAM words are mismatched, significant power is saved while a high search speed is maintained. N. Mohan and M. Sachdev [7] introduced two ternary storage cell designs that exploit special features of TCAMs to minimize cell leakage. A. T. Do et al. [8] reported that multi-V<sub>T</sub> transistors used in the CAM cell minimize the leakage current and improve the match-line discharge speed compared with standard-V<sub>T</sub> cells. J. Zhang et al. [9] proposed a novel OR-type cascade match-line, which sequentially connects OR-type match-line segments to provide high search speed and low power.

Retrieval Number: J73760891020/2020©BEIESP DOI: 10.35940/ijitee.J7376.0891020

Journal Website: <u>www.ijitee.org</u>

In this paper we first compare the basics of CAM with RAM. We then describe the benefits of the OR-type match-line structure applied in TCAM design. We explain how to reduce power and speed up operation by using dual-$V_T$ cells in place of regular TCAM cells. Next, we propose a novel OR-type cascade match-line with dual-$V_T$ cell TCAM scheme. Finally, we present performance results and compare the proposed structure with prior schemes.

#### II. RAM VS CAM

Random Access Memory (RAM) is a type of primary memory used to read and write information. It is the part of a system used to store running applications (programs) and their data. It is basically of two kinds: Static RAM (SRAM) and Dynamic RAM (DRAM). CAM is a special kind of RAM that combines a storage cell with a comparison cell. A CAM takes a search word as input and returns the matching memory location as output. This makes a CAM considerably more complex and costly than standard RAM. A CAM performs exact matches of search words and excels at lookups such as IP-to-MAC and MAC-to-interface tables. TCAM has become a basic component of modern network switches. It is powerful and fast hardware for matching IP prefixes. However, it is also complex, costly and power hungry.

#### III. TCAM OPERATION

Fig. 1 shows a basic functional block diagram of a CAM. As shown in the figure, the input to the CAM is the search word to be looked up in memory, which is fed onto the search-lines. Each stored word has a corresponding match-line that indicates whether the search word and the stored word are the same. The match-line outputs are fed to an encoder that produces a binary match location corresponding to the match-line that is in the match state. A simple encoder is used in systems where only a single match is expected. In CAM applications where more than one word may match, a priority encoder is used instead of a simple encoder.



Fig. 1 Operational block diagram of a CAM

To illustrate the operation of a TCAM in a network switch, we take the case of a 4-word TCAM in which every word has 4 bits. We use 4-bit pseudo IP addresses in this example and assume that we are searching for IP address 1011 for routing.

Table- I: Example of stored address data in TCAM

| Sr.<br>No. | IP Address /<br>Stored Word | Encoder<br>Output | Output<br>Port |
|------------|-----------------------------|-------------------|----------------|
| 1          | 101X                        | 00                | A              |
| 2          | 111X                        | 01                | B              |
| 3          | 10XX                        | 10                | C              |
| 4          | 1001                        | 11                | D              |

In the first step, all match-lines are pre-charged to a high voltage before the search starts. The search-line driver then drives the destination address onto the search-lines. If even a single bit in a word mismatches, it pulls down the whole match-line and its voltage becomes low. If a don't-care bit is stored in a cell, that cell always reports a match for that bit, i.e. it does not pull down the match-line. The sense amplifier reports a "match" when all the bits in a word match the search bits, i.e. when it detects a high voltage on the match-line. The network switch searches the routing table for the destination address of every incoming packet and chooses the appropriate output port to forward the packet towards its destination. In this way, the combination of TCAM and encoder forwards network packets towards their destination. As shown in Table-I, 4-bit words are stored in the TCAM, and some bits are stored as "X" (don't care), so they are not compared. After the search, rows 1 and 3 match the target address 1011, so the priority encoder selects the first row and outputs match location 00, because that row matches the greatest number of bits, i.e. it gives the most specific route towards the destination.
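The lookup on Table-I can be sketched behaviourally in a few lines of Python. This is only an illustrative model, not the circuit: the function names are hypothetical, and it assumes that the priority encoder favours the lowest row index, so more specific entries must be stored first, as they are in Table-I.

```python
from typing import List, Optional

# Stored words and output ports copied from Table-I; "X" is a don't-care bit.
TABLE = [("101X", "A"), ("111X", "B"), ("10XX", "C"), ("1001", "D")]


def word_matches(stored: str, search: str) -> bool:
    """A match-line stays high only if no bit position mismatches."""
    return all(s == "X" or s == q for s, q in zip(stored, search))


def match_lines(search: str) -> List[bool]:
    """One match-line per stored word (evaluated in parallel in hardware)."""
    return [word_matches(stored, search) for stored, _ in TABLE]


def priority_encode(lines: List[bool]) -> Optional[int]:
    """Return the index of the first asserted match-line, or None if no match.

    Assumption: lower row index means higher priority.
    """
    for index, high in enumerate(lines):
        if high:
            return index
    return None


lines = match_lines("1011")
row = priority_encode(lines)
print(lines)                              # [True, False, True, False]
print(format(row, "02b"), TABLE[row][1])  # 00 A
```

Rows 1 and 3 (indices 0 and 2) keep their match-lines high, and the encoder reports location 00 and port A, as described for the search word 1011.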

#### IV. NOR-TYPE TCAM CELL

A standard 16T NOR-type TCAM cell, shown in Fig. 2, has two standard 6T SRAM cells to store the ternary data bit and a comparison cell (N1-N4) to compare the input search bit with the stored bit.



Fig. 2 Circuit schematic of 16T NOR-type TCAM cell

Thus, this cell performs READ and WRITE operations like an SRAM cell. Transistors N1-N4 implement the XNOR logic operation that compares the table entry with the input search word. In this circuit, masking is done by disabling both pull-down paths from the match-line to ground. This can be done in the following two ways:

- (1) Global masking is performed by driving both search-lines low (SL1 = SL2 = "0").
- (2) Local masking is accomplished by storing low values in both storage nodes ( $V_X = V_Y = "0"$ ).
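The two masking conditions can be checked with a small logic model of the comparison cell. The sketch below assumes that each pull-down path is a series pair of NMOS transistors, one gated by a search-line and one by a stored node, with SL1 paired with $V_X$ and SL2 paired with $V_Y$. That pairing is an assumption made for illustration and should be checked against Fig. 2.

```python
def cell_pulls_down(sl1: int, sl2: int, vx: int, vy: int) -> bool:
    """True if either series NMOS pair connects the match-line to ground.

    Assumed pairing (not stated in the text): SL1 is in series with V_X,
    and SL2 is in series with V_Y.
    """
    return bool((sl1 and vx) or (sl2 and vy))


# Global masking: both search-lines low, so no path conducts for any stored value.
global_masked = not any(
    cell_pulls_down(0, 0, vx, vy) for vx in (0, 1) for vy in (0, 1)
)

# Local masking: both stored nodes low, so no path conducts for any search value.
local_masked = not any(
    cell_pulls_down(sl1, sl2, 0, 0) for sl1 in (0, 1) for sl2 in (0, 1)
)

print(global_masked, local_masked)    # True True
print(cell_pulls_down(1, 0, 1, 0))    # True: one conducting path discharges the ML
```

Under this model the match-line is discharged only when a search-line and its paired stored node are both high, which is why either form of masking keeps the cell from reporting a mismatch.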





#### V. CONVENTIONAL MLSA

This is the basic match-line sense amplifier (MLSA) scheme for detecting the state of a NOR-type TCAM word, shown in Fig. 3. The operation is divided into three phases. In the first phase, all SLs are pre-charged low, so every pull-down path from the match-line (ML) to GND is disconnected. In the second phase, all MLs are pre-charged high. In the third and final phase, the search data is driven onto the SLs.

If all the bits in a word are the same as the search data bits, the ML stays high. This is the match condition at the output of the MLSA.



Fig. 3 Block diagram of Conventional MLSA

However, if at least one bit on the match-line does not match, the ML discharges to GND, and the MLSA output indicates a mismatch for the entire word.



Fig. 4 Conventional MLSA search operation (32\*32 Size)



Fig. 5 Conventional MLSA search operation (32\*72 Size)

Fig. 4 and Fig. 5 show the simulated output of the search operation in the 32\*32 and 32\*72 TCAMs, respectively, using the conventional MLSA. As shown in the figures, MLSO 15 is the worst case of a one-bit mismatch.

#### VI. OR-TYPE CASCADE MLSA

In this scheme the entire memory is divided into several equal stages, which are connected by this MLSA structure and operate successively. Each stage operates like an OR gate, which is built from a NOR gate followed by a NOT gate. The primary advantage of this scheme is that there is no sensitive section in the circuit that would increase its power dissipation.



Fig. 6 Architectural diagram of OR-type cascade MLSA

During a search, a stage activates its successor for evaluation only if all data bits matched in the previous stages. Since the majority of the words stored in a memory are generally mismatched, this saves a large amount of power. This power reduction comes with a small penalty of increased search time. The performance of the scheme depends mostly on the number of stages used in the memory. The circuit performs better with fewer stages. However, if a single stage contains too many TCAM cells, the word circuit may malfunction. In our design we divided the entire memory into four equal stages for the 32\*32 TCAM and three equal stages for the 32\*72 TCAM.
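The stage-by-stage activation can be illustrated with a behavioural Python sketch that counts how many stages are evaluated for one word. It is a functional model under simplifying assumptions, not the circuit: the helper names are hypothetical, and the number of evaluated stages is used only as a rough proxy for match-line activity.

```python
from typing import List, Tuple


def split_stages(word: str, n_stages: int) -> List[str]:
    """Split a word into n_stages equal segments."""
    if len(word) % n_stages != 0:
        raise ValueError("word length must be a multiple of the stage count")
    size = len(word) // n_stages
    return [word[i:i + size] for i in range(0, len(word), size)]


def segment_matches(stored: str, search: str) -> bool:
    """A segment matches if every bit is equal or stored as don't-care 'X'."""
    return all(s == "X" or q == s for s, q in zip(stored, search))


def cascade_search(stored: str, search: str, n_stages: int) -> Tuple[bool, int]:
    """Return (match, stages_evaluated); a stage runs only if all earlier ones matched."""
    evaluated = 0
    for s_seg, q_seg in zip(split_stages(stored, n_stages),
                            split_stages(search, n_stages)):
        evaluated += 1
        if not segment_matches(s_seg, q_seg):
            return False, evaluated  # later stages are never activated
    return True, evaluated


stored = "10" * 16                 # a 32-bit stored word, four 8-bit stages
early_miss = "0" + stored[1:]      # mismatch in the first stage
late_miss = stored[:-1] + "1"      # mismatch in the last stage

print(cascade_search(stored, stored, 4))      # (True, 4)
print(cascade_search(stored, early_miss, 4))  # (False, 1)
print(cascade_search(stored, late_miss, 4))   # (False, 4)
```

A word that mismatches in its first stage activates only one of the four stages, whereas a full match must pass through all of them, which reflects the trade-off between saved power and longer search time described above.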



Fig. 7 OR-type cascade MLSA search operation (32\*32 Size, 4 Stage)



Fig. 8 OR-type cascade MLSA search operation (32\*72 Size, 3 Stage)




Fig. 7 and Fig. 8 show the simulated output of the search operation in the 32\*32 and 32\*72 TCAMs, respectively, using the OR-type cascade MLSA. In Fig. 7, MLSO 5d represents the output of the final stage in the match case, and N\_45, N\_5 and N\_6 represent the outputs of the intermediate stages. In Fig. 8, MLSO 10d represents the output of the final stage in the match case, and N\_4 and N\_5 represent the outputs of the intermediate stages.

#### VII. TCAM DEPLOY DUAL-V<sub>T</sub> CELL USING CONVENTIONAL MLSA

In the previous section we examined the advantages of the OR-type MLSA, in which the entire memory structure is divided into stages to improve speed and reduce power.

As discussed, in this kind of match-line structure most of the TCAM cells remain in the inactive state, where they continuously draw leakage current. As technology scales down to deep-submicron CMOS nodes, the leakage current rises to a level that cannot be ignored and considerably affects the total power dissipation of the TCAM. The dual-$V_T$ circuit technique is deployed to minimize the undesirable leakage current in the SRAM part of the TCAM cell.



Fig. 9 16T NOR-type TCAM cell deploy dual- $V_T$  cell

Fig. 9 shows the 16T NOR-type TCAM cell in the inactive state. Consider the case in which the bit-lines are pre-charged to VDD and both SLs are pre-charged to ground. The total leakage current of the TCAM is determined mainly by the 6T SRAM part of the TCAM cell. To reduce the leakage current, we deploy high-$V_T$ devices in the 6T SRAM part. In TCAM applications, read and write operations are not as critical as the search operation, so high-$V_T$ devices can be used in the SRAM part to minimize leakage. The devices in the comparison circuit (i.e. N1-N4) are chosen for fast search operation, because match-line sensing must be quick. We therefore deploy low-$V_T$ devices in the comparison part of the TCAM cell, which leads to much faster discharge in the worst case of a single-bit mismatch.
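The benefit of high-$V_T$ devices in the storage part can be seen from the standard first-order subthreshold current model, given here only as a textbook approximation and not as the device model used in our simulations. The symbols $I_0$ (a process- and geometry-dependent prefactor), $n$ (the subthreshold slope factor) and $v_{th}$ (the thermal voltage) are introduced for illustration.

$$
I_{\text{sub}} \approx I_0 \exp\!\left(\frac{V_{GS} - V_T}{n\, v_{th}}\right)\left(1 - \exp\!\left(-\frac{V_{DS}}{v_{th}}\right)\right), \qquad v_{th} = \frac{kT}{q}
$$

For a transistor that is off ($V_{GS} = 0$), raising the threshold voltage by $\Delta V_T$ therefore reduces the subthreshold leakage by a factor of roughly $\exp\!\left(\Delta V_T / (n\, v_{th})\right)$. The same exponential dependence explains why low-$V_T$ devices are kept only in the comparison part, where drive strength matters more than leakage.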



Fig. 10 TCAM deploy dual-V<sub>T</sub> cell using conventional MLSA search operation (32\*32 Size)




Fig. 11 TCAM deploy dual-V<sub>T</sub> cell using conventional MLSA search operation (32\*72 Size)

Fig. 10 and Fig. 11 show the simulated output of the search operation in the 32\*32 and 32\*72 TCAMs, respectively, using dual-V<sub>T</sub> cells with the conventional MLSA.

#### VIII. PROPOSED OR-TYPE CASCADE MATCH-LINE WITH TCAM DUAL-V<sub>T</sub> CELL MLSA

In the proposed scheme we use the 16T NOR-type TCAM cell because it gives faster search operation than the NAND-type TCAM. Next, to reduce the power consumption of the memory, we divide the entire memory structure into equal stages and connect all the segments with the OR-type match-line structure. To reduce the leakage current in the SRAM part of the TCAM cell, we use high-$V_T$ devices there. To provide high speed during the search operation, we deploy low-$V_T$ devices in the comparison part of the TCAM cell. Applying dual-$V_T$ cells in this way gives the final design of the novel TCAM MLSA scheme.



Fig. 12 OR-type cascade match-line with TCAM dual-V<sub>T</sub> cell MLSA search operation (32\*32 Size, 4 Stage)



Fig. 13 OR-type cascade match-line with TCAM dual-V<sub>T</sub> cell MLSA search operation (32\*72 Size, 3 Stage)





Fig. 12 and Fig. 13 show the simulated output of the search operation in the 32\*32 and 32\*72 TCAMs, respectively, using the OR-type cascade match-line with dual-V<sub>T</sub> cell MLSA. In Fig. 12, MLSO 5d represents the output of the final stage in the match case, and N\_10, N\_66 and N\_9 represent the outputs of the intermediate stages. In Fig. 13, MLSO 10d represents the output of the final stage in the match case, and N\_29 and N\_54 represent the outputs of the intermediate stages.

#### IX. RESULT AND DISCUSSIONS

Table-II and Table-III compare the performance of the 32\*32 and 32\*72 TCAM structures, respectively, on several parameters. The results show that deploying dual-$V_T$ cells in the TCAM gives a better energy-delay product (EDP) than the conventional MLSA scheme. High-$V_T$ devices limit the leakage current in the storage cell, and low-$V_T$ devices speed up the match-line discharge for a single-bit mismatch. This design allows us to limit the cell leakage current and achieve fast match-line sensing at the same time.

Table- II: Performance comparison of 32\*32 size TCAM Schemes

| Parameter              | Conventional MLSA | TCAM deploy dual-V<sub>T</sub> cell using conventional MLSA | OR-type cascade MLSA (4 stage) | OR-type cascade match-line with TCAM dual-V<sub>T</sub> cell MLSA (4 stage) |
|------------------------|-------------------|-------------------------------------------------------------|--------------------------------|------------------------------------------------------------------------------|
| Search delay (ns)      | 3.132             | 3.102                                                       | 6.003                          | 6.001                                                                        |
| Energy (fJ/bit/search) | 3.869             | 3.872                                                       | 1.059                          | 1.058                                                                        |
| EDP                    | 12.118            | 12.011                                                      | 6.356                          | 6.349                                                                        |

Table- III: Performance comparison of 32\*72 size TCAMs

| Parameter              | Conventional MLSA | TCAM deploy dual-V<sub>T</sub> cell using conventional MLSA | OR-type cascade MLSA (3 stage) | OR-type cascade match-line with TCAM dual-V<sub>T</sub> cell MLSA (3 stage) |
|------------------------|-------------------|-------------------------------------------------------------|--------------------------------|------------------------------------------------------------------------------|
| Search delay (ns)      | 4.261             | 4.276                                                       | 7.606                          | 7.598                                                                        |
| Energy (fJ/bit/search) | 2.106             | 1.774                                                       | 0.321                          | 0.321                                                                        |
| EDP                    | 8.972             | 7.585                                                       | 2.440                          | 2.438                                                                        |

The OR-type cascade MLSA scheme gives a large improvement in EDP, as shown in the tables. It delivers a large reduction in power at the cost of a longer search delay. The best result is obtained with the OR-type cascade match-line with TCAM dual-$V_T$ cell scheme. The circuit structure of the proposed design is simple and practically achievable, and it gives the best EDP among the compared schemes for high-capacity TCAMs.
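The EDP rows can be cross-checked from the delay and energy rows. The short Python sketch below assumes that EDP is the plain product of search delay and energy per bit per search, which agrees with the tabulated values to within rounding. The dictionary keys and function names are labels chosen for this example only.

```python
# (search delay in ns, energy in fJ/bit/search, reported EDP) from Table-II and Table-III.
TABLE_II = {
    "conventional": (3.132, 3.869, 12.118),
    "dual_vt_conventional": (3.102, 3.872, 12.011),
    "or_cascade": (6.003, 1.059, 6.356),
    "or_cascade_dual_vt": (6.001, 1.058, 6.349),
}
TABLE_III = {
    "conventional": (4.261, 2.106, 8.972),
    "dual_vt_conventional": (4.276, 1.774, 7.585),
    "or_cascade": (7.606, 0.321, 2.440),
    "or_cascade_dual_vt": (7.598, 0.321, 2.438),
}


def edp(delay_ns: float, energy_fj: float) -> float:
    """Energy-delay product, assumed here to be delay multiplied by energy."""
    return delay_ns * energy_fj


def edp_reduction_percent(results: dict, scheme: str,
                          baseline: str = "conventional") -> float:
    """Relative reduction of the reported EDP with respect to the baseline."""
    base = results[baseline][2]
    return 100.0 * (base - results[scheme][2]) / base


for name, results in (("32*32", TABLE_II), ("32*72", TABLE_III)):
    for scheme, (delay, energy, reported) in results.items():
        print(name, scheme, round(edp(delay, energy), 3), reported)
    print(name, "EDP reduction of proposed scheme (%):",
          round(edp_reduction_percent(results, "or_cascade_dual_vt"), 1))
```

With the reported values, the proposed scheme lowers the EDP relative to the conventional MLSA by about 47.6% for the 32\*32 TCAM and about 72.8% for the 32\*72 TCAM.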

#### X. CONCLUSION

In this paper we designed and simulated a novel OR-type cascade match-line with TCAM dual-$V_T$ cell scheme. The total capacity of the TCAM designed here is fairly small, so the leakage is negligible compared with the dynamic power. However, in modern technologies with large TCAMs, dynamic power consumption becomes less significant and leakage power becomes a major factor. Reducing leakage is therefore very important for keeping power dissipation low. Faster and lower-power TCAM designs for network switches are badly needed to meet the demands of the recent explosion in Internet traffic. We showed that the proposed design gives the best EDP among the conventional architectures considered. Overall, this work proposes an energy-efficient TCAM design for efficient use in future systems.

#### REFERENCES

1. L. Chisvin and R. J. Duckworth, "Content-addressable and associative memory: alternatives to the ubiquitous RAM", in IEEE Computer Society, vol. 22, no. 7, pp. 51-64 (1989).
2. K. E. Grosspietsch, "Associative processors and memories: a survey", in IEEE Micro, vol. 12, no. 3, pp. 12-19 (1992).
3. I. Arsovski and A. Sheikholeslami, "A current-saving match-line sensing scheme for content-addressable memories", in IEEE International Solid-State Circuits Conference, Digest of Technical Papers, ISSCC, San Francisco, CA, USA, vol. 1, pp. 304-494 (2003).
4. I. Arsovski and A. Sheikholeslami, "A mismatch-dependent power allocation technique for match-line sensing in content-addressable memories", in IEEE Journal of Solid-State Circuits, vol. 38, no. 11, pp. 1958-1966 (2003).
5. K. Pagiamtzis and A. Sheikholeslami, "Content-addressable memory (CAM) circuits and architectures: a tutorial and survey", in IEEE Journal of Solid-State Circuits, vol. 41, no. 3, pp. 712-727 (2006).
6. J. Zhang, Y. Ye and B. Liu, "A New Mismatch-Dependent Low Power Technique with Shadow Match-Line Voltage-Detecting Scheme for CAMs", Proceedings of the International Symposium on Low Power Electronics and Design, pp. 135-138, Tegernsee, Germany (2006).
7. N. Mohan and M. Sachdev, "Low-Leakage Storage Cells for Ternary Content Addressable Memories", in IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 17, no. 5, pp. 604-612 (2009).
8. A. T. Do, C. Yin, K. Velayudhan, Z. C. Lee, K. S. Yeo and T. T. H. Kim, "0.77-fJ/bit/search Content Addressable Memory Using Small Match Line Swing and Automated Background Checking Scheme for Variation Tolerance", in IEEE Journal of Solid-State Circuits, vol. 49, no. 7, pp. 1487-1498 (2014).
9. J. Zhang, S. Zheng, F. Teng, Q. Ding and X. Chen, "An OR-type cascaded match line scheme for high-performance and EDP-efficient ternary content addressable memory", IEEE Nordic Circuits and Systems Conference (NORCAS), Copenhagen, Denmark, pp. 1-6 (2016).

#### **AUTHORS PROFILE**



**Mr. Rahul Nigam** received the B.E. degree in Electronics and Communication Engineering in 2007 from RGPV University, Bhopal, and the M.Tech. degree in Microelectronics and VLSI Design in 2010 from NIT Calicut, India. He qualified GATE in 2008 with rank 915 and 96.78 percentile. He is currently pursuing the Ph.D. degree at Dr. A.P.J Abdul Kalam University, Indore. He has published 6 papers in national and international conferences. In addition, he has published two research papers on low-power CAM/TCAM circuit design in leading international journals indexed in SCOPUS. His areas of interest are VLSI circuit design and low-power memory circuit design.



**Dr. Santosh Pawar** received the Ph.D. in the area of Non-Linear Fiber Optics from DAVV, Indore in 2014 and the M.Tech. in Optical Communication from SGSITS Indore in 2007. He is currently Principal of the School of Engineering and Dean of R & D at Dr. A. P. J. Abdul Kalam University, Indore. His research interests are integrated optics, memory design, VLSI design, memory circuits, non-linear optics, optical fiber Bragg grating based devices and Opto-VLSI. He has published more than 30 research papers in international and national journals, including SCI and SCOPUS indexed journals, and has presented more than 35 research papers at international and national conferences.


