# Automatic Layout of Analog and Digital Mixed Macro/Standard Cell Integrated Circuits

William Swartz

Yale University

## Outline

- Introduction
- Mixed Design Flow
- New Placement Techniques in TimberWolf
  - Statistical Wire Estimation
  - Timing Driven Placement
  - Analog Crosstalk Minimization
- New Global Routing Techniques
- Results
- Conclusions

## **Why Integrated Circuits?**

- System integration and consolidation
- Weight and size reduction
- Increased reliability
- Component matching
- Cost reduction

#### **Spectrum of Design Methodologies for ICs**



## **Designing Integrated Circuits**



## **Physical Design Transformation**



#### **Textual Representation**

INSTANCE 1 Flipflop Phase1, Feedback, Out, Feedback :

CELL FlipFlop CLK, D, Q, QB





## **Design Styles**



Standard cell design style

Suits random logic



Macro cell design style

Suits array architectures

## **Row-based Design Methodologies**





Island style gate array

## **Mixed Design Style**



## **Tour of Mixed Macro / Standard Cell Physical Design**



## **Cluster : Partitioning**





## **Genrows : Core region floorplanning**



## **Genrows : Row topology generation**



## **TimberWolfSC : Placement and global routing**

- Placement determine the position of individual cells given a cost function.
  - Wire length
  - Timing constraints
- Global routing determine the interconnections of the signal network.
  - The network is decomposed into net segments.
  - The region for each net segment is calculated.

## **Sc\_route : Standard cell detailed-routing**



## **TimberWolfMC : Placement refinement (compaction)**





#### **Mc\_route : Final detailed-route**



## **Need Efficient Algorithms**

| complexity         | n=20       | n=50                 | n=100                | n=200                | n=500                | n=1000 |
|--------------------|------------|----------------------|----------------------|----------------------|----------------------|--------|
| 1000 <i>n</i> *    | 0.02 sec   | 0.05 sec             | 0.1 sec              | 0.2 sec              | 0.5 sec              | 1 sec  |
| 1000 <i>nlgn</i> * | 0.09 sec   | 0.3 sec              | 0.6 sec              | 1.5 sec              | 4.5 sec              | 10 sec |
| $100n^{2_{*}}$     | 0.04 sec   | 0.25 sec             | 1 sec                | 4 sec                | 25 sec               | 2 min  |
| $10n^{3}$ *        | 0.02 sec   | 1 sec                | 10 sec               | 1 min                | 21 min               | 2.7 hr |
| n <sup>lgn</sup>   | 0.04 sec   | 1.1 hr               | 220 days             | 125 cent             | $5 \times 10^8$ cent |        |
| $2^{n/3}$          | 0.0001 sec | 0.1 sec              | 2.7 hr               | $3 \times 10^4$ cent |                      |        |
| $2^n$              | 1 sec      | 35 yr                | $3 \times 10^4$ cent |                      |                      |        |
| 3 <sup>n</sup>     | 58 min     | $2 \times 10^9$ cent |                      |                      |                      |        |

#### "Big-O" notation

*• O*-notation gives an upper bound of a function within a constant factor.

For a given function g(n) we denote by O(g(n)) the set of functions  $O(g(n)) = \{f(n) : \text{there exist positive constants } c \text{ and } n_0 \text{ such that}$  $0 \le f(n) \le cg(n) \text{ for all } n \ge n_0 \}$ .



## **Basic Simulated Annealing Algorithm**

Algorithm simulated\_annealing()

| 1 | do                                           |
|---|----------------------------------------------|
| 2 | do                                           |
| 3 | j = generate(i)                              |
| 4 | if $\operatorname{accept}(\Delta C, T)$ then |
| 5 | i = j                                        |
| 6 | until cost is in equilibrium                 |
| 7 | reduce(T)                                    |
| 8 | until cost cannot be reduced any further     |

#### Algorithm accept( $\Delta C, T$ )

- 1 if  $\Delta C \leq 0$  then /\* new cost is less than or equal to the old cost \*/
- 2 **return**(ACCEPT) /\* accept the new configuration \*/
- 3 else
- 4 randomly generate a number r between 0 and 1

5 if 
$$r < xp \left( \frac{-\Delta C}{T} \right)$$
 then return(ACCEPT)

6 **else return**(REJECT)

## **New Placement Techniques-Wire Area Estimation**



- Routing interconnect between adjacent macro cells is modeled as extra area appended to each of the corresponding cells during simulated annealing placement.
- Wiring area between macros is estimated during placement in order to avoid large perturbations in the topology during detailed-routing and compaction.

## **Problems with Existing Wire Estimators**

- Existing wire estimators are theoretical models.
- Problematic if the design style violates any of the assumptions of the theoretical model.
- For example, model must be changed if another routing layer is added.

## **Inaccurate Modeling**



TimberWolfMC version 1 overestimates the routing area needed for cell C1.

## **Resulting Placement using Previous Wire Estimator**



Placement after global routing using original wire estimator. Notice the gross inaccuracy in the estimation of cell C1.

# Inaccurate Wiring Estimates Lead to Poor Area Efficiency





#### **Solution: Statistical Wire Estimator**

- Use a general statistical model which is adapted for every circuit.
- The stimated interconnect area for cell edge *i* to be

$$^{i} = c_{0} + c_{1} \cdot x + c_{2} \cdot x^{2} + c_{3} \cdot y + c_{4} \cdot y^{2} + c_{5} \cdot p$$
(1)

where  $c_0...c_5$  are constants, x and y are the normalized chip coordinates [0.0, 1.0], and p is the number of pins in the channel.

- To obtain the model constants:
  - Place the circuit using a 10*x* annealing schedule and the theoretical estimation model.
  - Perform global routing and/or detail routing to calculate routing areas.
  - Use a least squares method to fit the data to estimator model.
- Subsequent placements are performed using the statistical model.
- Placement algorithm *learns* from the previous executions.
- The statistical model adapts to any design style and routing technology.

## **Placement Using Statistical Estimator**



Placement after global routing using statistical wire estimator.

## **Accurate Wire Estimation**



- This is the minimum area placement for this example.
- The remaining white space does not impact chip area.

#### **New Placement Techniques - Timing Constraints**



The parasitic capacitance of a wire segment is

$$C = \frac{\varepsilon A}{d} = \frac{\varepsilon l w}{t} \Longrightarrow C \propto l \tag{2}$$

The resistance of a wire segment is given by

$$R = \frac{\rho l}{A} = R_s \cdot \frac{l}{w} \Longrightarrow R \propto l \tag{3}$$

## **Timing Driven Placement**

TimberWolf supports the following types of timing constraints:

- critical path using wire length constraints.
- matched critical path using wire length constraints.
- critical path using timing constraints.
- matched critical path using timing constraints.
- critical path analysis using pin pair constraints.
- matched critical path analysis using pin pair constraints.

#### **Critical path using wire length constraints**

- Implemented first in TimberWolf version 5.6.
- The penalty assigned for a path *p* is the amount the length deviates from satisfying the bounds:

$$_{p} = \begin{cases} \text{length}(p) - \text{upperBound}(p) & \text{if} \quad \text{length}(p) > \text{upperBound}(p) \\ \text{lowerBound}(p) - \text{length}(p) & \text{if} \quad \text{length}(p) < \text{lowerBound}(p) \\ 0 & \text{otherwise} \end{cases}$$
(4)

where the length of a path p is the sum of the half perimeter wire length of all the nets n in the path:

length 
$$(p) = \sum_{\forall n \in p} S_x(n) + S_y(n)$$
 (5)

#### **Wire length Constraints**

The total penalty is just the sum over all the specified critical paths:

$$_{T} = \sum_{p=1}^{N_{p}} P_{p} \tag{6}$$

Does not take drive strength into account.

#### **Matched Wire Length Constraints**

The user specifies a tolerance for the mismatch in path length. In this case, we assign the penalty for a set of paths to be:

 $P_{p} = \begin{cases} \text{match}(p) - \text{tolerance}(p) & \text{if } \text{match}(p) > \text{tolerance}(p) \\ 0 & \text{otherwise} \end{cases}$ (7)

where the match is defined as

$$match(p) = \left| length(p_1) - length(p_2) \right|$$
(8)

Useful for analog circuits.

#### **Critical Path Using Timing Constraints**

- Time constraints which consider driver strength.
- The arrival time  $T_a$  for a path p is the summation of all the net delays n for the path:

$$_{a} = \sum_{\forall n \in p} D_{n} \tag{9}$$

The delay for a single net *n* is the sum of the intrinsic gate delay  $T_n$ associated with the driver of the net *n*, and the product of the equivalent driver resistance  $R_n$ , and the total load capacitance seen by the driver:

$$D_n = T_n + R_n C_n \tag{10}$$

### **Parasitics**

 The total capacitance for a net has two components: gate input capacitance and parasitic capacitance.

$$C_n = C_{G_n} + C_{p_n} \tag{11}$$

 During placement, we can estimate the parasitic capacitance using the half perimeter bounding box metric:

$$C_{n_{p}} = C_{L_{x}} S_{x}(n) + C_{L_{y}} S_{y}(n)$$
(12)

### **Arrival Time Equation**

Substituting Equation 8 and Equation 9 into Equation 10, we get:

$$D_{n} = T_{n} + R_{n}C_{G} + R_{n} [C_{L_{x}}S_{x}(n) + C_{L_{y}}S_{y}(n)]$$
(13)

 We can precompute the terms in the summation which do not depend on wire length by defining:

$$= \sum_{\forall n \in p} \left[ T_n + R_n C_G \right] \tag{14}$$

This results in the following simplified equation for the arrival time:

$$T_{a} = \sum_{\forall n \in p} R_{n} \left[ C_{L_{x}} S_{x}(n) + C_{L_{y}} S_{y}(n) \right] + K$$
(15)

### **Penalty for Critical Path Constraints**

$$= \begin{cases} T_{a}(p) - T_{ru}(p) & \text{if} \quad T_{a}(p) > T_{ru}(p) \\ T_{rl}(p) - T_{a}(p) & \text{if} \quad T_{a}(p) < T_{rl}(p) \\ 0 & \text{otherwise} \end{cases}$$
(16)

where the user has specified an upper  $(T_{ru})$  and lower bound  $(T_{rl})$  on the required arrival times.

#### **Penalty for Matched Critical Path Constraints**

We assign the penalty for a set of paths to be:

$$P_{p} = \begin{cases} \text{match}(p) - \text{tolerance}(p) & \text{if } \text{match}(p) > \text{tolerance}(p) \\ 0 & \text{otherwise} \end{cases}$$
(17)

where the match is now defined as difference in arrival times,

match 
$$(p) = |T_a(p_1) - T_a(p_2)|$$
 (18)

## **Timing Driven Placement Using Pin Pairs**

User only specifies primary input and primary output pin pairs from logic diagram.

No need to enumerate critical paths between pins.



## **Pin Pair Constraints**

- From logic diagram convert to timing graph
- Remove any cycles by breaking feedback paths.
- ✓ Use delay modifiers to break any false paths (user specified or from TA).



## New simulated annealing algorithm

#### **Algorithm** Simulated-Annealing-With-Timing(*M*)

```
1 {FP} \leftarrow readFalsePaths() /* read list of false paths from user */
    buildTimingGraphs (P, G_n) /* P is the set of all pin pairs */
 2
 3
     do
 4
        do
 5
           i = \text{generate}(i)
            if accept(\Delta C, T) then
 6
 7
                i = j
 8
        until cost is in equilibrium
        for each d = (i, j) \in P do /* d is a pin pair */
 9
10
            if T_{lb}(i, j) > 0 then /* a nonzero lower bound exists */
                 \{T_a\} \leftarrow \text{findMShortestPaths}(i, j, M)
11
            for each edge e \in E[V] do /* negate edge weights */
12
13
               w' \leftarrow -w
            \{T_a\} \leftarrow \text{findMShortestPaths}(i, j, M) /* \text{find M longest paths }*/
14
15
        reduce(T)
16
     until cost cannot be reduced any further
```

## **Pin Pair Constraints**

- User parameter *M* controls the number of paths monitored during simulated annealing.
- Since timing graphs are directed acyclic graphs, we can find the *M* shortest or *M* longest paths in *O*(*Mn*log*n*) time using Dreyfus's algorithm.

#### **Matched Pin Pair Constraints**

The user specifies a tolerance for the mismatch in path length. In this case, we assign the penalty for a set of paths to be:

$$_{p} = \begin{cases} \text{match}(p) - \text{tolerance}(p) & \text{if } \text{match}(p) > \text{tolerance}(p) \\ 0 & \text{otherwise} \end{cases}$$
(19)

where the match is now defined as difference in arrival times,

match 
$$(p) = \max_{i,j} |T_a(p_i) - T_a(p_j)|$$
 (20)

#### **Results: wire length constraints**



### **Results: pin pair constraints**



## **Results : pin pair constraints**

| run                     | no constraints       | M=1                  | M=5                  | M=17                 |
|-------------------------|----------------------|----------------------|----------------------|----------------------|
| 1                       | 198                  | 130                  | 131                  | 129                  |
| 2                       | 209                  | 134                  | 125                  | 134                  |
| 3                       | 211                  | 135                  | 119                  | 133                  |
| 4                       | 214                  | 136                  | 126                  | 124                  |
| 5                       | 209                  | 137                  | 119                  | 129                  |
| average                 | 208                  | 134                  | 124                  | 130                  |
|                         | no constraints       | M=1                  | M=5                  | M=17                 |
| wire length             | 57276                | 60691                | 63073                | 61209                |
| area (µm <sup>2</sup> ) | $1.56 \times 10^{6}$ | 1.60×10 <sup>6</sup> | 1.68×10 <sup>6</sup> | $1.58 \times 10^{6}$ |
| run time (secs.)        | 670                  | 801                  | 966                  | 1201                 |

The integrated circuit is 34% faster, with an area penalty of only 2.5% and an execution time increase of 16%.

## **Timing Results :** *Struct*



## **Crosstalk Constraints**

Parasitic capacitance between conductors:

$$C_{AB} = \frac{\varepsilon A}{d} \tag{21}$$

Lower capacitance

- Decrease area of overlap.
- Increase spacing between conductors.

## **Net Classification Scheme**

#### ANAGRAM net classification

NET digital\_net1 noisy

NET analog\_net1 sensitive

NET ground CLASS neutral

NET digital\_net2 CLASS noisy

NET digital\_net3 CLASS noisy

#### General net classification

NET digital\_net1 CLASS 1

NET analog\_net1 CLASS 2

NET ground CLASS 3

NET digital\_net2 CLASS 4

NET digital\_net3 CLASS 5

CROSSTALK CLASS 1 CLASS 2 10 10

CLASS 3 SHIELDS 1 FROM 2

CROSSTALK CLASS 2 CLASS 4 10 10

CROSSTALK CLASS 4 CLASS 5 20 20

## **Justification for the General Net Classification**



Each signal has a period of activity.

### **Crosstalk penalty in Simulated Annealing**

$$_{T} = \begin{bmatrix} P & \text{if overla} \\ 0 & \text{otherwise} \end{bmatrix}$$
(22)

$$P = [\min(X_i, X_j) - \max(x_i, x_j)] + [\min(Y_i, Y_j) - \max(y_i, y_j)] + d_x + d_y \quad (23)$$



#### **Crosstalk (without constraints)**



### **Crosstalk (with constraints)**



## **New Global Routing Algorithm**

- Generalized row-based global router suitable for standard cell, gate-array, sea-of-gates, and FPGAs.
- First row-based global router to *explicitly* minimize chip area.
- Adapts to technologies.
- Uses exact port locations and exact density calculations.
- Handles preplaced feedthrough cells.
- Allows incremental global routing.
- Vertical constraint minimization
- Takes timing into account.

## **Previous Work**

#### **Global Routing**

- Maze Routing Lee [1961].
- Constructive Method J.T. Lee and M. Marek-Sadowska [1984].
- TimberWolf 3.2 Standard Cell Global Router Sechen and Sangiovanni-Vincentelli [1986].
- Integer Linear Programming Raghavan and Thompson [1987].
- Parallel Algorithm Rose [1988].
- Standard Cell Router Cong and Preas [1988].
- Field Programmable Gate Arrays Rose et al. [1990][1991].
- Sea-of-Gates Extension Lee and Sechen [1991].
- Provably Good Performance-Driven Global Routing Cong et al. [1992].

## **Previous Work**

#### **Steiner Tree Generation**

- Fermat [1638]
- Torricelli [1646]
- Hanan [1966]
- Hwang [1976][1979]
- Lee, J. H., Bose, and Hwang [1976]
- Servit [1981]
- Bern [1988]
- Lee, K. and Sechen [1988]
- Richards [1989]
- Ho, Vijayan and Wong [1990]
- Hasan, Vijayan, and Wong [1990]
- Kahng and Robins [1990]
- Chua and Lim [1991]

## **But the Results are Worse**

| Circuit  | TimberWolfSC /SGGR |           |        | TimberWolfSC with iterated Steiner |           |        |
|----------|--------------------|-----------|--------|------------------------------------|-----------|--------|
|          | wirelength         | feedthrus | tracks | wirelength                         | feedthrus | tracks |
| primary1 | 944360             | 717.8     | 163.5  | 942816                             | 749       | 163.4  |
| primary2 | 4251730            | 4383      | 346.4  | 3811600                            | 2982      | 359.6  |

• Data is for an average of 8 runs.

## **Steiner Points Aren't Always Good**



- A) Desired Steiner tree if Steiner point separation *D* is greater than average feed separation.
- B) Otherwise, only one Steiner point should be inserted.
- Feedthrough resources must be taken into account when building Steiner trees.

## Overview

Algorithm Global-Router(placement, netlist)

- 1 Region-Generation()
- 2 Area-Minimization()
- 3 Assign-Multi-Row-Feedthroughs()
- 4 Assign-Single-Row-Feedthroughs()
- 5 Remove-Cell-Overlap()
- 6 Cell-Swap-Optimization()
- 7 Switchable-Segment-Optimization()
- 8 Maze-Route()
- 9 Vertical-Constraint-Minimization()
- 10 Route-Verification()



# **Area Minimization**

#### Algorithm Area-Minimization()

1 for  $i \leftarrow 1$  to numnets

2 **do**  $T_i$  = Build-Steiner-Tree(*i*, MINIMUM\_WIRELENGTH) numregions numrows  $H = \sum_{r=1}^{\infty} trackpitch \cdot tracks_r + \sum_{b=1}^{\infty} height_b$ 3 compute r = 1 $W = \max_{b \in \{1, numrows\}} (length_b + feedwidth \cdot feeds_b)$ 4 compute 5 compute  $P_{T}$  $oldcost \leftarrow W \cdot H$ 6 7 do 8  $n \leftarrow Random(1, numnets)$ pick segment s of net n 9 10 if s crosses maximum density then 11  $cost \leftarrow Flip-Segment()$ 12 else 13 if feed limited then  $T_n \leftarrow \text{Build-Steiner-Tree}(n, \text{MINIMUM_FEEDS})$ 14 else if density limited then 15 16  $T_n \leftarrow \text{Build-Steiner-Tree}(n, \text{MINIMUM_DENSITY})$ 17 else if Instance-Versions exists for net then 18  $\{T_i\} \leftarrow \text{Swap-Instance-Versions}(n)$ 19  $cost \leftarrow Calculate-New-Cost()$ 20 if cost < oldcost then Accept-Move()</pre> 21 while *cost* improves

## **Steiner Tree Construction**



Minimize Feeds

Minimize Wirelength

Minimize Density

## **Switchable Segment Optimization**



## **Instance Versions**



## **Steiner Tree Reconstruction**







# **Feedthrough Assignment**

#### First Phase

Assign feedthroughs over macro cells and/or FPGA freeways.

#### Second Phase

Assign feedthroughs over rows if needed.

Optimal with respect to wirelength and congestion.

## **Phase I Assignment**

#### Cost function

 $C_{ij} = |X_i - X_j| + (Y_i - y_i) - [\min(Y_i, Y_j) - \max(y_i, y_j)] + P_1$  $F_i(x) = [x_i, X_i], F_i(y) = [y_i, Y_i]$ where  $C_{j}(x) = [x_{j}, X_{j}], C_{j}(y) = [y_{j}, Y_{j}]$  $P_1 = K \cdot \Delta density$  $(X_i, Y_i)$  $(X_i, Y_i)$ Freeway Crossing  $(x_i, y_j)$  $(x_i, y_i)$ 

## **Multiple Row Freeway Assignment**

#### Algorithm Feedthrough Assignment Phase I()

1 *work\_to\_do* 
$$\leftarrow$$
 TRUE

**2** do

- 3 { $Region_i$ }  $\leftarrow$  Find-Available-Macrofeed()
- 4  $num\_free \leftarrow Find-Number-Macrofeeds({Region_i})$
- 5  $num\_cross \leftarrow Find-MultiRow-Crossings({Region_i})$
- 6 **if** *num\_cross* = 0 **then break**
- 7 **if** *num\_cross* > 0 and *num\_free* = 0 **then error break**
- 8 **if** *num\_cross* > *num\_free* **then**
- 9 Relax-Crossings({ $Region_i$ })
- 10 Linear-Assignment(*C*)
- 11 Update-Steiner-Trees()
- 12 **while** *work\_to\_do*

## **Phase II Assignment**

#### Cost function

$$C_{ij} = |X_i - X_j| + P_1 + P_2 + P_3$$

where

| $P_1 = \begin{cases} rowlength \\ 0 \end{cases}$         | if        | explicit | fe |
|----------------------------------------------------------|-----------|----------|----|
| $r_1 = t_0$                                              | otherwise |          |    |
| $P_2 = \begin{cases} K & \text{if} \\ 0 & \end{pmatrix}$ | unused    | multife  |    |
| $\frac{2}{2}$ 0                                          | other     | rwise    |    |

$$P_3 = K \cdot \Delta density$$

rowlength «K

| Explicit Feed $(X_i, Y_i)$ | $(X_j, Y_j)$ |  |  |  |
|----------------------------|--------------|--|--|--|
|                            |              |  |  |  |
| $(x_i, y_i)$               | $(x_j, y_j)$ |  |  |  |
|                            |              |  |  |  |
| Explicit Multiple Feed     |              |  |  |  |

# **Feedthrough Assignment**

#### Algorithm Feedthrough Assignment Phase II()

- 1 for  $r \leftarrow 1$  to numrows do
- 2  $avail \leftarrow Find-Available-Feeds(r)$
- 3 *need*  $\leftarrow$  Find-Net-Crossings(*r*)
- 4 **if** *avail* < *need* **then**

6

7

- 5 **if** fixed-width **then** 
  - Relax-Steiner-Trees(r, MINIMUM\_FEEDS)
  - Add-Extra-Feeds-Between-Cells()
- 8 Linear-Assignment(*C*)
- 9 Update-Steiner-Trees()

### **Cell Overlap Removal**





## **Cell Swap Optimization**

Pairwise interchange of neighboring cells.

Orientation optimization.

$$C = \sum_{n=1}^{nets} \left( S_{x_n} + S_{y_n} \right) + P_T$$
  
where  
$$S_{x_n}, S_{y_n} \quad steiner \quad wirelength$$
  
$$P_T \quad timing \quad penalty$$

Has major impact for designs that require many explicit feedthrus.

## **Switchable Segment Optimization**

- Run area minimization algorithm again.
- Since feed positions are known  $\Rightarrow$  chip width is fixed.

Optimize track density.



### **Maze Routing Critical Edges**

We define

$$critical \equiv \begin{cases} 0 \text{ if } density(node) < maxdensity(region) - 1 \\ \\ 1 & otherwise \end{cases}$$

Dynamically, we update the edge cost weights

$$C = \begin{cases} \infty & if \ critical(n_1) \ or \ critical(n_2) \\ |x_1 - x_2| + |y_1 - y_2| \ otherwise \end{cases}$$



#### **Maze Route Example**



### **Vertical Constraint Minimization**

| lg | orithm Vertical-Constraint-Minimization                    | ()                                                  |
|----|------------------------------------------------------------|-----------------------------------------------------|
| 1  | $C \leftarrow \{ \}$                                       | /* C is the set of cycles */                        |
| 2  | for $r \leftarrow 1$ to numregions do                      |                                                     |
| 3  | $G_v \leftarrow \text{Build-Vertical-Constraint-Graph}(r)$ |                                                     |
| 4  | $C \leftarrow C \cup \text{FindCycles}(G_v, r)$            | /* find cycles in the vertical constraint graph */  |
| 5  | $oldcost \leftarrow Initialize-Cost()$                     | /* calculate initial track density */               |
| 6  | while $C \neq \{ \}$ and <i>cost</i> improves <b>do</b>    |                                                     |
| 7  | $n \leftarrow Random(1, numnets)$                          |                                                     |
| 8  | pick segment s of net n                                    |                                                     |
| 9  | if $s \in C$ then                                          | /* check to see if segment s occurs in any cycle */ |
| 0  | cost← Flip-Segment()                                       |                                                     |
| 1  | $broken \leftarrow Check-Cycle(C, s)$                      | /* does the flip break the cycle */                 |
| 2  | <b>if</b> cost ≤ oldcost <b>and</b> broken = TRUE          | E then                                              |
| 3  | Accept-Move()                                              |                                                     |
| 4  | $oldcost \leftarrow cost$                                  |                                                     |
| 5  | $C = C - \{C_s\}$                                          |                                                     |
|    |                                                            |                                                     |

### **Route Verification**

- Depth-first search of Steiner tree checks to see if all pins are connected.
- Depth-first can detect if cycles exist.

#### **Row-based FPGA Extensions**

#### Freeway map



#### **Row-based FPGA Extensions**

 FPGA pin-maps are handles using instance version moves in area minimization.



### **Row-based FPGA Extensions**



# **Algorithmic Complexity**

| Step                    | Time Complexity                  | Space Complexity                 |  |
|-------------------------|----------------------------------|----------------------------------|--|
| Region-Generation       | $O(R^2)$                         | O(R)                             |  |
| Area-Minimization       | O(NS)                            | <i>O</i> ( <i>P</i> )            |  |
| Assignment I            | $O(F^3)$                         | <i>O</i> ( <i>F</i> )            |  |
| Assignment II           | $O(F^3)$                         | <i>O</i> ( <i>F</i> )            |  |
| Remove-Cell-Overlap     | O(ClogC)                         | <i>O</i> ( <i>C</i> )            |  |
| Cell-Swap-Optimization  | <i>O</i> ( <i>C</i> )            | <i>O</i> ( <i>C</i> )            |  |
| Switchable-Segment-Opt  | O(NS)                            | <i>O</i> ( <i>P</i> )            |  |
| Maze-Route              | O(P[logV+E])                     | <i>O</i> ( <i>E</i> + <i>V</i> ) |  |
| Vertical-Constraint-Min | O(NS)                            | <i>O</i> ( <i>P</i> )            |  |
| Route-Verification      | <i>O</i> ( <i>V</i> + <i>E</i> ) | <i>O</i> ( <i>P</i> )            |  |

Legend: C = number of cells, E = number of edges in maze graph, F = number of feedthrus in a single row/region, N = number of nets, P = max. number of pins of a single net, R = number of regions, S = number of net segments, V = number of vertices in maze graph.

## Results

#### MCNC Benchmark Results

Global Router Comparison using the same placement.

| Circuit | Timber | WolfSC vers | sion 6.0             | TimberWolfSC version 7.0 |        |                      |
|---------|--------|-------------|----------------------|--------------------------|--------|----------------------|
|         | width  | tracks      | area $\mu m^2$       | width                    | tracks | area $\mu m^2$       |
| sp1     | 5210   | 163         | 2.18×10 <sup>7</sup> | 5200                     | 160    | 2.16×10 <sup>7</sup> |
| sp2     | 11730  | 401         | 8.76×10 <sup>7</sup> | 11600                    | 374    | 8.34×10 <sup>7</sup> |
| guts    | 8344   | 1817        | 5.76×10 <sup>7</sup> | 8300                     | 1734   | 5.59×10 <sup>7</sup> |

Implicit feeds were removed from the circuits.

### Results

Results of MCNC benchmarks. Track count for implicit feeds case.

| Circuit   | TimberWolfSC 6.0 |      |      | SGGR |      |      | TimberWolfSC 7.0 |     |     |
|-----------|------------------|------|------|------|------|------|------------------|-----|-----|
|           | avg              | min  | max  | avg  | min  | max  | avg              | min | max |
| small     | 60               | 51   | 65   | 55   | 53   | 59   | 49               | 47  | 51  |
| primary1  | 150              | 142  | 167  | 141  | 140  | 141  | 141              | 140 | 143 |
| primary2  | 386              | 368  | 407  | 346  | 342  | 352  | 340              | 338 | 344 |
| industry3 | 1631             | 1576 | 1756 | 1485 | 1477 | 1490 |                  |     |     |

### Results

Results of MCNC benchmarks. Wire length comparison for implicit feeds

case.

| Circuit  | SGGR    |         |         | TimberWolfSC 7.0 |         |         |
|----------|---------|---------|---------|------------------|---------|---------|
|          | avg     | min     | max     | avg              | min     | max     |
| small    | 109719  | 109082  | 111474  | 95501            | 94280   | 96810   |
| primary1 | 801217  | 794165  | 811950  | 736283           | 735290  | 737600  |
| primary2 | 4257644 | 4234530 | 4276180 | 4007742          | 3987810 | 4017180 |

## **Summary - What's New**

- New simulated annealing algorithm which is based on a theoretically derived annealing schedule.
- New method for statistical wiring estimation.
- New placement refinement method was developed for rectilinear cells which spaces the cells at density avoiding the need for post-routing compaction.
- A new algorithm which uses previously generated constraints to maintain the topology during placement refinement.
- A new detailed routing method has been developed which avoids the classically difficult problem of defining channels for detailed routing, and in addition, avoids the equally difficult problem of defining a routing order for the defined channels.
- A new fully automatic placement and routing system has been developed for mixed macro/standard cell designs.
- A new general net classification scheme for eliminating crosstalk between signals.

## What's New (Continued)

Six new algorithms for controlling time delay in an integrated circuit.

- A novel pin pair algorithm controls the delay without the need for user path specification.
- A new generalized row-based global router suitable for standard cell, gatearray, sea-of-gates, and FPGAs.
  - It is the first row based global router to explicitly minimize chip area.
  - This global router automatically adapts to technologies.
  - Optimal feedthrough placement is accomplished using linear assignment.
  - Throughout the algorithm, timing constraints are taken into account.
  - A unique vertical constraint minimization step eases the task of LEA channel routers.