Proceedings of the 9<sup>th</sup> ICEENG Conference, 27-29 May, 2014

EE063 - 1

Military Technical College Kobry El-Kobbah, Cairo, Egypt



9<sup>th</sup> International Conference on Electrical Engineering ICEENG 2014

# High Speed 1-tap Decision Feedback Equalizer in 28 nm CMOS

By

Mostafa Hosny, Sameh Ibrahim, DiaaEldin Khalil, Mohamed Dessouky\*

## Abstract:

Decision Feedback Equalizers (DFEs) are widely used in high-speed serial links. A DFE can compensate for severe distortion of the transmitted signal caused by band-limited channels. In this paper we demonstrate two different 1-tap DFE designs in a 28 nm CMOS process, one reaching 66Gbps and the other reaching 83Gbps, consuming 25mW and 32mW from a 0.9-V supply, respectively. No coils were used for bandwidth extension in the design.

# <u>Keywords:</u>

Decision Feedback Equalizer; high-speed equalizers; half-rate DFE

\* EEC Department, Ain Shams University

# 1. Introduction:

As the demand for chip-to-chip I/O throughput increases, per-lane speeds of 40 Gbps are predicted to be needed in the near future [1]. Consequently, high-speed serial links, and DFE circuits in particular, have been pushed to the limits of the technology to achieve high speeds while maintaining acceptable power consumption. The timing constraint of the DFE feedback loop is set directly by the bit time interval, also referred to as the Unit Interval (UI). As the UI decreases, this constraint becomes more stringent and harder for the DFE to meet.

The first tap of the DFE faces the toughest timing constraint. In a 50Gbps link, for example, it must settle within 20ps (one UI), even for very small signals at the input of the decision element. Generally, meeting the  $1^{st}$  tap timing constraint is the most challenging part of DFE design.

In the next section, we give a brief description of the basic DFE architectures and their timing constraints. In section 3, we describe our DFE architectures and designs. Section 4 presents the simulation results in 28 nm CMOS and compares them to prior art. Finally, we conclude the paper in section 5.

# 2. Background:

At very high speeds, channel losses pose a great challenge, requiring heavy equalization in both the transmitter (TX) and the receiver (RX). DFEs can only equalize post-cursor inter-symbol interference (ISI). They also need a large number of taps for very high-loss channels.

Extensive work has been done on DFEs, resulting in several architectures [2] - [5]. These can be divided into direct and speculative (unrolled) architectures, using either full-rate or half-rate clocking, as mentioned in [6].

Fig. 1 shows the direct full-rate DFE architecture [2] (only the 1<sup>st</sup> tap is shown for simplicity). The analog input signal is 'sliced' by the flip-flop (FF) and converted to a digital signal, which is fed back to cancel the post-cursor ISI. The highlighted path is the critical path, whose timing constraint is given by

$$t_{cq} + t_{setup} + t_{FB} < 1UI \tag{1}$$

Where  $t_{cq}$  is the clock-to-output delay of the FF,  $t_{setup}$  is the FF setup time, and  $t_{FB}$  is the feedback delay which arises from the time constant at the summing node.
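To make constraint (1) concrete, the short sketch below computes the UI for a few data rates and the remaining slack of the feedback loop. The delay values are hypothetical placeholders chosen only for illustration; they are not taken from the designs in this paper.

```python
def unit_interval_ps(data_rate_gbps: float) -> float:
    """Bit period (1 UI) in picoseconds for a data rate given in Gb/s."""
    return 1000.0 / data_rate_gbps


def direct_loop_slack_ps(t_cq_ps, t_setup_ps, t_fb_ps, data_rate_gbps):
    """Slack of constraint (1): a positive value means the loop closes within 1 UI."""
    return unit_interval_ps(data_rate_gbps) - (t_cq_ps + t_setup_ps + t_fb_ps)


# Hypothetical delays in picoseconds (assumptions, not values from the paper).
delays = dict(t_cq_ps=6.0, t_setup_ps=3.0, t_fb_ps=5.0)

for rate in (40, 50, 66, 83):
    slack = direct_loop_slack_ps(data_rate_gbps=rate, **delays)
    print(f"{rate} Gb/s: UI = {unit_interval_ps(rate):.2f} ps, slack = {slack:+.2f} ps")
```

With these assumed delays the loop would close at 66Gbps but not at 83Gbps, which illustrates why every picosecond in the feedback path matters as the UI shrinks.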



Figure (1): Full-Rate Direct DFE Architecture [6]

In Fig. 2, the speculative (unrolled) full-rate DFE architecture is shown [3]. The critical path is highlighted and is given by

$$t_{cq,FF} + t_{setup} + t_{sq,MUX} < 1UI \tag{2}$$

Where  $t_{sq,MUX}$  is the select-to-output delay of the MUX.  $t_{sq,MUX}$  is usually smaller than  $t_{FB}$  because the time constant of the summing node is not part of the loop.
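The following behavioural sketch illustrates why speculation is functionally equivalent to direct feedback: the unrolled version slices the input for both possible values of the previous bit and then selects one result, which is the role of the MUX. It assumes ideal slicers and ±1 symbols; the function names and the tap weight `alpha` are illustrative, and circuit timing is not modelled.

```python
import random


def sign(x: float) -> int:
    """Ideal binary slicer: +1 for non-negative inputs, -1 otherwise."""
    return 1 if x >= 0 else -1


def direct_dfe(samples, alpha, d_init=1):
    """Direct 1-tap DFE: subtract alpha * previous decision, then slice."""
    decisions, prev = [], d_init
    for y in samples:
        prev = sign(y - alpha * prev)
        decisions.append(prev)
    return decisions


def speculative_dfe(samples, alpha, d_init=1):
    """Unrolled 1-tap DFE: slice for both possible previous bits, then select."""
    decisions, prev = [], d_init
    for y in samples:
        if_prev_high = sign(y - alpha)  # result if the previous bit was +1
        if_prev_low = sign(y + alpha)   # result if the previous bit was -1
        prev = if_prev_high if prev == 1 else if_prev_low  # MUX selection
        decisions.append(prev)
    return decisions


random.seed(0)
samples = [random.uniform(-2.0, 2.0) for _ in range(1000)]
assert direct_dfe(samples, 0.8) == speculative_dfe(samples, 0.8)
```

Both functions return the same decisions; the difference in hardware is only where the delay sits, in the summing node for the direct loop or in the MUX for the speculative one.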



Figure (2): Speculative full-rate DFE architecture [6]

In Fig. 3, the half-rate direct DFE architecture is presented [4]. The main benefit of this design is the use of half-rate clocks and the simplified design of clock and data recovery (CDR). The critical path is highlighted and is given by (1).



Figure (3): Direct half-rate DFE architecture [6]

Figure 4 shows the speculative half-rate DFE architecture [5] and its critical path which is given by (2). As well as having the benefit of using half-rate clocks, it has the benefit of the capacitance being outside of the feedback loop. The drawback of this architecture is its complexity and high power consumption.

# 3. Proposed DFE Architecture:

In this paper we present two architectures: a direct half-rate architecture and a direct half-rate merged summer-slicer architecture.

We chose the direct half-rate architecture over speculation because it 1) is less power hungry, 2) uses half-rate clocks, and 3) is less complex than a speculative design. In addition, implementing additional taps is more demanding with speculation, as demonstrated in [7].

As shown in Fig. 3, the direct half-rate architecture consists of a summer and a latch. The circuits are described next.





Figure (4): Speculative half-rate DFE architecture [6]

## A. Analog summer

The schematic of our resistive analog summer is shown in Fig. 5. The function of the analog summer is to add or subtract (depending on the previous bit) a certain amount of current at the output to compensate for the effects of ISI. In designing the analog summer, we must make sure that the input differential voltage satisfies the following condition

$$V_{in-diff} < \sqrt{2}\, V_{od} \tag{3}$$

Where  $V_{in-diff}$  is the input differential voltage and  $V_{od}$  is the overdrive voltage of the input differential pair. This avoids clipping of the output signal and ensures operation in the linear region of the differential pair [8]. The opposite is needed for the 1<sup>st</sup> tap differential pair: its  $V_{od}$  should be as small as possible so that the current is steered completely even for small input signals.
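Condition (3) can be illustrated with the textbook long-channel square-law model of a differential pair, in which the tail current is fully steered once the differential input reaches $\sqrt{2}\,V_{od}$. The sketch below is based on that idealized model only (no velocity saturation or other short-channel effects), and the overdrive and input values are hypothetical.

```python
import math


def steered_fraction(v_in_diff: float, v_od: float) -> float:
    """Differential output current divided by the tail current (square-law model).

    v_od is the overdrive of each device at balance, i.e. when each device
    carries half the tail current. Beyond sqrt(2) * v_od the pair is fully
    steered and the result saturates at +/-1.
    """
    x = v_in_diff / v_od
    if abs(x) >= math.sqrt(2.0):
        return math.copysign(1.0, x)
    return 0.5 * x * math.sqrt(4.0 - x * x)


# Hypothetical overdrive voltages (assumptions, not values from the paper).
input_pair_vod = 0.20  # large overdrive keeps the main pair in its linear range
tap_pair_vod = 0.05    # small overdrive lets the tap pair switch fully

for v_in in (0.05, 0.10, 0.20):
    print(
        f"Vin = {v_in:.2f} V: "
        f"input pair {steered_fraction(v_in, input_pair_vod):.2f}, "
        f"tap pair {steered_fraction(v_in, tap_pair_vod):.2f}"
    )
```

Under these assumptions the same input leaves the main pair only partially steered while the low-overdrive tap pair switches completely, which matches the two opposite sizing goals described above.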

## B. CML latch

A current-mode logic (CML) latch consists of an input tracking stage, which senses and tracks the data, and a cross-coupled regenerative pair, which stores the data. The former is active when CLK is high, whereas the latter is active when CLK is low.



Figure (5): Analog resistive summer

The data is then fed to the analog summer of the other path, where it is used for equalization. The output signal should be just large enough for the summer to perceive it as digital (the signal is considered digital if it is large enough to steer the current in the 1<sup>st</sup> tap coefficient differential pair completely). By carefully sizing the 1<sup>st</sup> tap differential pair in the summer, we can reduce the CML latch output level needed for complete current steering, which decreases power consumption and increases the speed of the CML latch. The CML latch schematic is shown in Fig. 6.
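The cross-coupling between the two paths can be sketched behaviourally: the even path decides every other bit using the latest decision of the odd path as feedback, and vice versa. The model below assumes ideal slicers and ±1 symbols, and all names are illustrative; it only shows that the interleaved half-rate structure produces the same bit stream as a full-rate loop.

```python
import random


def slicer(x: float) -> int:
    """Ideal binary slicer: +1 for non-negative inputs, -1 otherwise."""
    return 1 if x >= 0 else -1


def full_rate_dfe(samples, alpha, d_init=1):
    """Reference full-rate 1-tap DFE."""
    decisions, prev = [], d_init
    for y in samples:
        prev = slicer(y - alpha * prev)
        decisions.append(prev)
    return decisions


def half_rate_dfe(samples, alpha, d_init=1):
    """Two interleaved paths; each is fed the other path's latest decision."""
    even_bits, odd_bits = [], []
    last_odd = d_init
    for i in range(0, len(samples) - 1, 2):
        last_even = slicer(samples[i] - alpha * last_odd)      # even path
        last_odd = slicer(samples[i + 1] - alpha * last_even)  # odd path
        even_bits.append(last_even)
        odd_bits.append(last_odd)
    return even_bits, odd_bits


random.seed(1)
samples = [random.uniform(-2.0, 2.0) for _ in range(1000)]
even_bits, odd_bits = half_rate_dfe(samples, alpha=0.8)
interleaved = [bit for pair in zip(even_bits, odd_bits) for bit in pair]
assert interleaved == full_rate_dfe(samples, alpha=0.8)
```

Each path runs at half the bit rate, but the feedback from one path must still be settled in time for the other path's next decision, so the loop constraint is unchanged.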

## C. Merged summer-slicer

To increase the speed of the DFE, it was proposed in [7] to merge the summer and the slicer, as shown in Fig. 7. When CLK is high, the circuit is ON and the output nodes are charged to values that depend on the input and the previous bit. When CLK is low, the differential pair is OFF and the output nodes hold their values, which are used in the odd/even stage to drive the 1<sup>st</sup> tap differential pair.



Figure (6): CML latch



Figure (7): Merged Summer-Slicer circuit

# 4. Simulation Results:

The designs were simulated using HSPICE in the Synopsys Custom Design Environment. The two circuits were simulated with a 200mV peak-to-peak pseudo-random input signal filtered by a  $(1 + 0.8Z^{-1})$  channel. Fig. 8 shows an 83Gbps 200mV pseudo-random input and the effect of the channel on it.
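The effect of the $(1 + 0.8Z^{-1})$ channel, and of an ideal 1-tap DFE on it, can be reproduced with a noiseless behavioural model. This is only a sketch with normalized ±1 symbols and an ideal slicer; it is not the transistor-level HSPICE setup used for the results in this paper, and the function names are illustrative.

```python
import random


def channel(bits, post_cursor=0.8):
    """Apply the (1 + 0.8 z^-1) channel to a sequence of +/-1 symbols."""
    out, prev = [], 0
    for b in bits:
        out.append(b + post_cursor * prev)
        prev = b
    return out


def dfe_summing_node(samples, alpha=0.8):
    """Return (summing-node values, decisions) for an ideal 1-tap DFE."""
    node, decisions, prev = [], [], 0
    for y in samples:
        v = y - alpha * prev          # cancel the post-cursor ISI
        prev = 1 if v >= 0 else -1    # ideal slicer
        node.append(v)
        decisions.append(prev)
    return node, decisions


random.seed(2)
bits = [random.choice((-1, 1)) for _ in range(1000)]
received = channel(bits)
node, decisions = dfe_summing_node(received)

# Smallest distance from the decision threshold, before and after the DFE.
# The first sample is skipped because it has no preceding symbol.
margin_before = min(abs(y) for y in received[1:])
margin_after = min(abs(v) for v in node[1:])
print(f"margin before DFE: {margin_before:.2f}, after DFE: {margin_after:.2f}")
```

In this idealized setting the post-cursor term shrinks the worst-case margin at the channel output to a fraction of the symbol amplitude, and the 1-tap DFE restores it at the summing node.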

The half-rate DFE achieved a speed of 66Gbps while consuming 25mW from a 0.9-V supply. The eye has a vertical opening of about 85mV and a horizontal opening of 13.5ps (90% of the UI). The eye diagram of the even path is shown in Fig. 9.



Figure (8): 83Gbps 200mV pseudo-random input (left) and the effect of the channel on it (right)

The merged summer-slicer DFE achieved a speed of 83Gbps while consuming 31.1mW from a 0.9-V supply. The vertical eye opening is 100mV and the horizontal eye opening is 19.4ps (81.25% of the UI). The eye diagram of the even path is shown in Fig. 10.





Figure (9): Eye of the summing node in the even path



Figure (10): Eye of the summing node in the even path

Table I compares our work to state of the art DFE designs. The simulation results show that our DFE achieves high speeds with good power efficiency. Fabrication of the given DFE will give us stronger confidence in our results.

Table (1): Comparison with state-of-the-art DFE designs

| Reference                 | [7]  | [9]      | [10]  | [11]     | This work |
|---------------------------|------|----------|-------|----------|-----------|
| Process technology        | 65nm | 32nm SOI | 28nm  | 22nm     | 28nm      |
| Supply voltage (V)        | 1.2  | 1.15     | 0.9   | 1.07     | 0.9       |
| Data rate (Gbps)          | 66   | 30       | 32    | 32       | 83        |
| # of DFE taps             | 3    | 15       | 2     | 6        | 1         |
| Power (mW)                | 46   | 92*      | 120** | 25.625** | 32        |
| Power efficiency (pJ/bit) | 0.7  | 3.06     | 3.75  | 0.8      | 0.39      |

\*Includes CTLE and clock distribution power

\*\* Estimated based on [12]
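The power-efficiency row follows directly from the power and data-rate rows, since 1 mW divided by 1 Gbps equals 1 pJ/bit. A minimal check, using only the values listed in the table (the dictionary layout and names are illustrative):

```python
def energy_per_bit_pj(power_mw: float, data_rate_gbps: float) -> float:
    """Energy per bit in pJ: mW / (Gb/s) = pJ/bit."""
    return power_mw / data_rate_gbps


# (power in mW, data rate in Gbps) as listed in the comparison table.
designs = {
    "[7]": (46, 66),
    "[9]": (92, 30),
    "[10]": (120, 32),
    "[11]": (25.625, 32),
    "This work": (32, 83),
}

for name, (power_mw, rate_gbps) in designs.items():
    print(f"{name}: {energy_per_bit_pj(power_mw, rate_gbps):.2f} pJ/bit")
```

The printed values may differ from the table in the last digit, depending on whether the figures were rounded or truncated.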

# 5. Conclusion:

In this paper we have described and demonstrated two DFE designs in 28 nm CMOS technology that achieve speeds of 66Gbps and 83Gbps while consuming 25mW and 32mW from a 0.9-V supply, respectively. The two designs show good power efficiency compared to prior art due to the use of 28 nm technology. The two designs were tested on a  $(1 + 0.8Z^{-1})$  channel.

## <u>References:</u>

- [1] "International Technology Roadmap for Semiconductors, 2012 Edition," in http://www.itrs.net, Dec 2012.
- [2] Garg, A.; Carusone, A.C.; Voinigescu, S.P., "A 1-Tap 40-Gb/s Look-Ahead Decision Feedback Equalizer in 0.18-μm SiGe BiCMOS Technology," *Solid-State Circuits, IEEE Journal of*, vol.41, no.10, pp.2224,2232, Oct. 2006.
- [3] Young-Soo Sohn; Seung-Joon Bae; Hong-June Park; Soo-In Cho, "A 1.2 Gbps CMOS DFE receiver with the extended sampling time window for application to the SSTL channel," *VLSI Circuits Digest of Technical Papers, 2002. Symposium on*, vol., no., pp.92,93, 13-15 June 2002.

- [4] Payne, R.; Landman, P.; Bhakta, B.; Ramaswamy, S.; Song Wu; Powers, J.D.; Erdogan, M.U.; Ah-Lyan Yee; Gu, R.; Lin Wu; Yiqun Xie; Parthasarathy, B.; Brouse, K.; Mohammed, W.; Heragu, K.; Gupta, V.; Dyson, L.; Wai Lee, "A 6.25-Gb/s binary transceiver in 0.13-μm CMOS for serial data transmission across high loss legacy backplane channels," *Solid-State Circuits, IEEE Journal of*, vol.40, no.12, pp.2646,2657, Dec. 2005.
- [5] Bulzacchelli, J.F.; Meghelli, M.; Rylov, S.V.; Woogeun Rhee; Rylyakov, A.V.; Ainspan, H.A.; Parker, B.D.; Beakes, M.P.; Aichin Chung; Beukema, T.J.; Pepeljugoski, P.K.; Lei Shan; Kwark, Y.H.; Gowda, Sudhir; Friedman, D.J., "A 10-Gb/s 5-Tap DFE/4-Tap FFE Transceiver in 90-nm CMOS Technology," *Solid-State Circuits, IEEE Journal of*, vol.41, no.12, pp.2885,2900, Dec. 2006.
- [6] Ibrahim, S.; Razavi, B., "Low-Power CMOS Equalizer Design for 20-Gb/s Systems," *Solid-State Circuits, IEEE Journal of*, vol.46, no.6, pp.1321,1336, June 2011.
- [7] Yue Lu; Alon, E., "Design Techniques for a 66 Gb/s 46 mW 3-Tap Decision Feedback Equalizer in 65 nm CMOS," *Solid-State Circuits, IEEE Journal of*, vol.48, no.12, pp.3243,3257, Dec. 2013.
- [8] Behzad Razavi, Design of Analog CMOS Integrated Circuits, McGraw-Hill, (2001)
- [9] Toifl, T.; Ruegg, M.; Inti, R.; Menolfi, C.; Brandli, M.; Kossel, M.; Buchmann, P.; Francese, P.A.; Morf, T., "A 3.1mW/Gbps 30Gbps quarter-rate triple-speculation 15-tap SC-DFE RX data path in 32nm CMOS," *VLSI Circuits (VLSIC), 2012 Symposium on*, vol., no., pp.102,103, 13-15 June 2012.
- [10] Parikh, S.; Kao, T.; Hidaka, Y.; Jian Jiang; Toda, A.; Mcleod, S.; Walker, W.; Koyanagi, Y.; Shibuya, T.; Yamada, J., "A 32Gb/s wireline receiver with a low-frequency equalizer, CTLE and 2-tap DFE in 28nm CMOS," *Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2013 IEEE International*, vol., no., pp.28,29, 17-21 Feb. 2013.

- [11] Jaussi, J.; Balamurugan, G.; Hyvonen, S.; Tzu-Chien Hsueh; Musah, T.; Keskin, G.; Shekhar, S.; Kennedy, J.; Sen, S.; Inti, R.; Mansuri, M.; Leddige, M.; Horine, B.; Roberts, C.; Mooney, R.; Casper, B., "26.2 A 205mW 32Gb/s 3-Tap FFE/6-tap DFE bidirectional serial link in 22nm CMOS," *Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2014 IEEE International*, vol., no., pp.440,441, 9-13 Feb. 2014
- [12] James Jaussi, "Ultra-Efficient Mobile I/O", a presentation in ISSCC 2014