Fast and Lossless Graph Division Method for Layout Decomposition Using SPQR-Tree

Wai-Shing Luk
State Key Laboratory of ASIC & System, Fudan University, Shanghai, P. R. China
Email: [email protected]

Huiping Huang
Cadence Shanghai Inc., Shanghai, P. R. China
Email: [email protected]

978-1-4244-8192-7/10/$26.00 ©2010 IEEE

Abstract—Double patterning lithography is the most likely solution for 32nm and below process nodes due to its cost effectiveness. To enable this technique, layout decomposition is applied to split a layout into two non-conflicting patterns. Nevertheless, this problem is NP-hard in general, especially for layouts with random logic. Thus, high-quality results are hard to achieve in reasonable time. Previously, several graph partitioning techniques have been presented to speed up the process, at the expense of the quality of results (QoR). We propose a graph division method that does not have this deficiency. First, we start with a conflict graph derived from a layout. Based on a data structure named SPQR-tree, the graph is divided into its triconnected components in linear time. The solutions of these components are then combined in a way that no QoR is lost. Thus, we call this method a "lossless" method. Experimental results show that the proposed method can achieve 5X speedup without sacrificing any QoR.

I. INTRODUCTION

Double patterning lithography (DPL) [1] is the most likely solution for 32nm and below process nodes for large-volume chip production due to its cost effectiveness. The idea of the technique is rather simple: instead of exposing the photoresist layer once under one photomask, it exposes the layer twice by splitting the mask into two parts, each with features half as dense. As a result, the pattern density in each exposure is decreased by half, and hence resolution and depth of focus (DOF) are improved. For example, as shown in Fig. 1, the features in two different colors are printed in two separate lithography steps. Although this idea is simple, its implementation remains challenging from the standpoints of both fabrication [2], [3] and design automation [4], [5].

Fig. 1. Double patterning: the red features are printed in one lithography step, and the green ones in another lithography step.

From the design automation point of view, one of the biggest challenges is that the layout decomposition problem is NP-hard in general. High-quality results are hard to achieve in reasonable time for full-chip layouts. Nevertheless, high quality is desired because a single error in a feature could make the whole device fail. As it turns out, the problem is strongly related to the color assignment problem encountered in the alternating phase shifting mask technique [6]–[8]. Many innovative approaches have been proposed to tackle this problem, such as the integer linear programming (ILP) approach [5], [9] and the graph-theoretic approach [6]–[8], [10]. One of the advantages of the ILP approach is that it makes almost no assumption about the underlying conflict graph, whereas the graph-theoretic approach usually assumes that the conflict graph is sparse or nearly planar. However, the ILP approach is known to be time consuming. To reduce the problem size, graph partitioning techniques, such as a modified FM algorithm [10], have been proposed to partition a large layout into smaller pieces. However, the quality of results (QoR) is not guaranteed by this kind of method. On the other hand, one can observe that if the underlying conflict graph has biconnected components, each of them can be solved independently without sacrificing any QoR [6]–[8], [11]. Usually a significant speedup can be observed when this method is applied. Thus, it is reasonable to ask whether we could further divide each biconnected component in order to achieve an even better speedup without any loss of QoR. In this paper, we attempt to answer this question by investigating a data structure named SPQR-tree [12].

SPQR-tree has been successfully applied to graph drawing applications [13]. Nevertheless, it is still little known in the EDA field. One remarkable result of SPQR-tree is that a biconnected graph can be divided into its triconnected components in linear time with the help of this data structure [14]. Thus, the division consumes only a very small amount of time compared with the whole color assignment process. Furthermore, we derive an algorithm that merges the solutions of the triconnected components in linear time. Inspired by the deferred-merge embedding (DME) algorithm in clock tree synthesis [15], the algorithm consists of a bottom-up phase followed by a top-down one. In the bottom-up phase, two potential solutions for each triconnected component are computed. The decision of which solution to select is deferred to the top-down phase. Consequently, our method enables us to accelerate the process without sacrificing any QoR. Thus, we call this method a "lossless" method. In addition, this method does not contain any tuning parameters. It is flexible enough to be integrated into any layout decomposition framework.

The rest of this paper is organized as follows. In Section II, we describe the problem formulation of the layout decomposition. A brief introduction to SPQR-tree is given in Section III. After that, the graph division method is presented in Section IV. Experimental results are given in Section V. Finally, we give the conclusions and suggestions for future work in Section VI. Note that unless declared explicitly, all the example layouts in the figures are taken from the Nangate 45nm open cell library, which can be downloaded from http://www.nangate.com.

II. PROBLEM FORMULATION

Given a layout specified by features in polygonal shapes, the features are first fractured into rectangles. Note that some rectangles may be further sliced into smaller pieces to resolve conflicts. Provided that a set of rectangles is given, a conflict graph $G = (V, E)$ is then constructed in which the set of vertices $V$ represents the set of rectangles. Note that layout fracturing and node splitting generate rectangles that touch others. Touching rectangles are assumed not to be in conflict. Two non-touching rectangles are in conflict if their minimum distance is less than a minimum coloring spacing and they meet certain criteria. Conflicting rectangles should be assigned to different masks. Thus, for each pair of conflicting rectangles, there is an edge with positive weight between the two corresponding vertices. Moreover, two touching rectangles assigned to different masks will create a stitch, which should be avoided as far as possible [9]. Thus, for each pair of touching rectangles, there is an edge with negative weight between the two corresponding vertices, so that the same mask is preferred. An example of a layout and its corresponding conflict graph is shown in Fig. 2. As shown in the figure, the blue lines represent the edges with positive weights and the green ones represent edges with negative weights.

Fig. 2. An example of a layout and its corresponding conflict graph. The blue lines represent edges with positive weights and the green ones represent edges with negative weights.

In this paper, we focus on the techniques where the conflict graph is assumed to be given. The color assignment problem is now formulated as follows:

• INSTANCE: Graph $G = (V, E)$ and a function $w : E \to \mathbb{Z}$.
• SOLUTION: Disjoint vertex subsets $V_0$ and $V_1$ such that $V_0 \cup V_1 = V$ and $V_0 \cap V_1 = \emptyset$.
• MINIMIZE: the total cost $\sum_{e \in E_c} w(e)$, where $E_c = \{(u, v) : u, v \in V_0 \text{ or } u, v \in V_1,\ (u, v) \in E\}$.

We assign each disjoint vertex subset $V_i$ a distinct color and say that two vertices are in the same color if they belong to the same $V_i$. Without loss of generality, we may assume that $G$ is a biconnected graph. The problem can be transformed into one that contains only edges with positive weights, by a simple trick described as follows. For each edge $e = (u, v)$ having a negative weight $-w$, we create a new node $d$ and replace the edge with $(u, d)$ and $(d, v)$, each of them having a positive weight $w$. The problem is equivalent to the weighted MAX-CUT problem, which is NP-hard in general. However, if $G$ is a planar graph, then it is equivalent to the T-join problem of the dual graph of $G$ and hence is polynomial-time solvable [16]. If $G$ is a bipartite graph, i.e. a graph without any odd cycles, then the problem can be solved in linear time by a simple breadth-first search. Thus, the problem is also equivalent to finding a bipartite subgraph $G' = (V, E - E_c)$ where $E_c$ is a set of edges to be deleted. In fact, our method keeps track of $E_c$ only in subdomain solving because the size of $E_c$ is usually small.

III. SPQR-TREE

For the problem defined in the previous section, existing methods have large time complexity in order to achieve acceptable QoR for practical layouts. To reduce the problem size, we propose a divide-and-conquer method that divides the conflict graph into its triconnected components using a data structure named SPQR-tree.

Recall that a graph $G = (V, E)$ is connected if every pair $u, v \in V$ of vertices in $G$ is connected by a path. A graph can be divided into its connected components in linear time. Obviously, the color assignment problem can be solved independently for each connected component without affecting any QoR. A vertex is called a cut-vertex of a connected graph $G$ if removing it will disconnect $G$. If no cut-vertex can be found in $G$, then the graph is called a biconnected graph, and is denoted by $G'$ here. A division of $G$ into its biconnected components can be performed in linear time by identifying the cut-vertices using a simple depth-first search. It can easily be shown that the color assignment problem can be solved for each biconnected component separately without affecting any QoR [6].

Similarly, a pair of vertices is called a separation pair of $G'$ if removing the pair of vertices will disconnect $G'$. For example, as shown in Fig. 3, removing either $\{a, b\}$, $\{c, d\}$, $\{c, e\}$, $\{c, f\}$ or $\{g, h\}$ will disconnect the conflict graph. If no separation pair can be found in $G'$, then it is called a triconnected graph. A division of $G'$ into its triconnected components can be performed by identifying the separation pairs in linear time with the help of SPQR-tree [14]. Each tree node of the SPQR-tree is associated with a triconnected component of $G'$ called a skeleton, denoted as $G_s$. A skeleton represents a contraction of $G'$ based on a set of virtual edges. Each virtual edge represents a portion of $G'$. Skeletons were classified into four types in the original paper, namely S (series), P (parallel), Q (trivial) and R (rigid) [12]:

• Series: the skeleton is a cycle graph.
• Parallel: the skeleton contains only two vertices $s$ and $t$, and $k$ parallel edges between $s$ and $t$ where $k \geq 3$.
• Trivial: the skeleton contains only two vertices $s$ and $t$, and two parallel edges between $s$ and $t$, one of which is a virtual edge and the other a real edge.
• Rigid: the skeleton is a triconnected graph other than the above types.

Fig. 3. An example of a conflict graph with its triconnected components. {a, b}, {c, d}, {c, e}, {c, f} and {g, h} are separation pairs.

According to the above definition, each real edge corresponds to a Q-node of the SPQR-tree. In the actual implementation, we follow the method in [14] that simply replaces the Q-node with a flag distinguishing a real edge from a virtual one. Fig. 4 shows a portion of the SPQR-tree corresponding to the biconnected graph in Fig. 3. Note that in the figure virtual edges are drawn in dashed lines. For convenience, we say that two components are adjacent if the corresponding tree nodes are adjacent. Two adjacent components share a common separation pair $\{s, t\}$, and their virtual edges $e(s, t)$ have one-to-one correspondences in their adjacency relationship. Note that the SPQR-tree is an unrooted tree, and is uniquely defined for a given biconnected graph no matter what construction method is used. However, we may choose an arbitrary node as a root for our method. A virtual edge corresponding to its parent tree node is also called a reference edge. Note that each virtual edge is shared by exactly one pair of adjacent skeletons, and each real edge appears in exactly one skeleton. In addition, the total number of edges of the skeletons is at most $3|E| - 6$ [14]. Thus, the storage requirement of the data structure including skeletons is $O(|V| + |E|)$. A remarkable result of SPQR-tree is that it can be constructed in $O(|V| + |E|)$ time [14]. In addition, the data structure enables us to develop algorithms based on tree traversals.
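The definitions of cut-vertex and separation pair used in this section can be illustrated on toy graphs. The following Python sketch is only an illustration under stated assumptions: it tests each vertex or vertex pair by brute force (removing it and checking connectivity), which is not the linear-time SPQR-tree construction of [14], and the function names and example graphs are hypothetical.

```python
from itertools import combinations


def is_connected(adj, removed=frozenset()):
    """Depth-first search connectivity test on the graph minus `removed`."""
    nodes = [v for v in adj if v not in removed]
    if not nodes:
        return True
    seen = {nodes[0]}
    stack = [nodes[0]]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in removed and v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == len(nodes)


def cut_vertices(adj):
    """Vertices whose removal disconnects the graph (brute force)."""
    return sorted(v for v in adj if not is_connected(adj, {v}))


def separation_pairs(adj):
    """Vertex pairs whose removal disconnects the graph (brute force)."""
    return sorted(
        tuple(sorted(pair))
        for pair in combinations(adj, 2)
        if not is_connected(adj, set(pair))
    )


# Hypothetical examples: a path, a 4-cycle, and the complete graph K4.
path3 = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
cycle4 = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "a"]}
k4 = {v: [u for u in "abcd" if u != v] for v in "abcd"}

print(cut_vertices(path3))       # the middle vertex is a cut-vertex
print(separation_pairs(cycle4))  # biconnected, but opposite corners separate it
print(separation_pairs(k4))      # no separation pair under this definition
```

The 4-cycle has no cut-vertex, so it is biconnected, yet it still has separation pairs; this is the situation in which a further division into triconnected components becomes possible.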

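The negative-weight trick of Section II can be checked on a toy instance. The sketch below is an assumption-laden illustration, not the authors' C++ implementation: the function names, the dummy-node naming scheme and the example graph are hypothetical, and the minimum cost is found by exhaustive enumeration, which is only feasible for tiny graphs. Replacing an edge of weight $-w$ by two edges of weight $+w$ raises the cost of every coloring of the original vertices by exactly $w$ (after choosing the best color for the dummy node), so the optimum shifts by a constant and the optimal coloring is unchanged.

```python
from itertools import product


def coloring_cost(edges, color):
    """Sum of w(e) over edges whose endpoints share a color (the set E_c)."""
    return sum(w for u, v, w in edges if color[u] == color[v])


def split_negative_edges(edges):
    """Replace every edge (u, v) of weight -w < 0 by (u, d) and (d, v),
    each of weight +w, where d is a fresh dummy node."""
    out = []
    for i, (u, v, w) in enumerate(edges):
        if w < 0:
            d = f"dummy{i}"  # hypothetical naming scheme for dummy nodes
            out.extend([(u, d, -w), (d, v, -w)])
        else:
            out.append((u, v, w))
    return out


def min_cost(edges):
    """Brute-force minimum of the color assignment cost (tiny graphs only)."""
    nodes = list(dict.fromkeys(x for u, v, _ in edges for x in (u, v)))
    return min(
        coloring_cost(edges, dict(zip(nodes, bits)))
        for bits in product((0, 1), repeat=len(nodes))
    )


# A triangle of conflict edges plus one stitch edge (negative weight).
edges = [("a", "b", 3), ("b", "c", 2), ("a", "c", 2), ("c", "d", -1)]
positive_only = split_negative_edges(edges)
offset = sum(-w for _, _, w in edges if w < 0)

# The two optima differ exactly by the sum of the absolute negative weights.
print(min_cost(edges), min_cost(positive_only), offset)
```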
Fig. 4. An example of an SPQR-tree and its skeletons. Virtual edges are indicated by dashed lines.

IV. DIVIDE-AND-CONQUER METHOD

Our divide-and-conquer method consists of three basic steps:

A. Division step

First, we divide the conflict graph into its triconnected components and keep track of the relationship among the components using the SPQR-tree as described in the previous section.

B. Bottom-up conquering

In the second step, we solve the triconnected components in a bottom-up manner according to the SPQR-tree. It can be done by a recursive method presented in this section. Let the reference edge of a non-root skeleton be $e_r$ and the corresponding separation pair be $\{s, t\}$. There are two potential solutions for the skeleton, namely $s$ and $t$ having the same color (00), or $s$ and $t$ having opposite colors (01). Denote the cost of the former as $C_{00}$ and the cost of the latter as $C_{01}$. Inspired by the deferred-merge embedding (DME) algorithm in clock tree synthesis, we defer the decision of which solution to choose when bottoming up, and simply assign the weight $C_{00} - C_{01}$ to the corresponding virtual edge of its parent skeleton. This weight represents the cost of choosing one solution rather than the other from the parent's point of view. The details of the algorithm are given in Algorithm 1.

Algorithm 1 Calculate Cost Diff Recur
Input: v: a non-root node of SPQR-tree
Output: E_00(v): solution set with Color(s) = Color(t),
        E_01(v): solution set with Color(s) ≠ Color(t)
    for each child c of v do
        Calculate Cost Diff Recur(c)
    end for
    Let G_s = (V_s, E_s) be the skeleton of v
    if G_s is an S-type then
        Let e_min be the edge with minimum absolute weight
        if G_s is odd then
            E_00(v) ← {e_min}; C_00 ← |w(e_min)|
            E_01(v) ← ∅; C_01 ← 0
        else
            E_00(v) ← ∅; C_00 ← 0
            E_01(v) ← {e_min}; C_01 ← |w(e_min)|
        end if
        C_diff ← C_00 − C_01
    else if G_s is a P-type then
        Let E′ be the set E_s − {e_r}
        E_00(v) ← {e : w(e) ≥ 0, e ∈ E′}
        E_01(v) ← {e : w(e) ≤ 0, e ∈ E′}
        C_diff ← Σ_{e ∈ E′} w(e)
    else if G_s is an R-type then
        w(e_r) ← −∞; {E_00(v), C_00} ← Solve(G_s)
        w(e_r) ← +∞; {E_01(v), C_01} ← Solve(G_s)
        C_diff ← C_00 − C_01
    end if
    Let p be the parent of v, e_p be the corresponding virtual edge of e_r
    w(e_p) ← C_diff

In the following, we give a detailed description of this algorithm for each type of skeleton. The solutions are kept track of by recording the set of conflict edges that need to be removed to make the subgraph bipartite.

1) S-type: The S-type skeleton is a cycle graph. Recall that a graph is bipartite if it does not have any odd cycle. Thus, at most one edge needs to be removed to make the subgraph bipartite. We can easily determine whether a cycle graph is an odd cycle by counting the number of its edges. Note that the edges with negative weight are doubly counted. Let $e_{min}$ be the edge with the minimum absolute weight. When considering the solution with $\{s, t\}$ in the same color, if the skeleton has an odd cycle, then $e_{min}$ is the candidate to be removed. Otherwise, no edge needs to be removed. Similarly, when considering the solution with $\{s, t\}$ in opposite colors, if the skeleton has an odd cycle, then no edge needs to be removed. Otherwise, $e_{min}$ is the candidate to be removed.

2) P-type: Let $E'$ be the set of edges of the P-type skeleton except the reference edge. When considering the solution with $\{s, t\}$ in the same color, we remove all the edges in $E'$ with positive weight in order to resolve the conflicts. Similarly, when considering the solution with $\{s, t\}$ in opposite colors, we remove all the edges in $E'$ with negative weight. The difference of the costs is simply the sum of the weights of all edges in $E'$, i.e. $\sum_{e \in E'} w(e)$.

3) R-type: We rely on an existing color assignment solver to calculate the costs for this type. When considering the solution with $\{s, t\}$ in the same color, we assign $-\infty$ to $w(e_r)$ to ensure that $s$ and $t$ are in the same color in the result produced by the solver. Similarly, when considering the solution with $\{s, t\}$ in opposite colors, we assign $+\infty$ to $w(e_r)$ to ensure that $s$ and $t$ are in opposite colors in the result produced by the solver. Note that the root component only needs to be solved once. The details of the algorithm are given in Algorithm 2.

Algorithm 2 Calculate Soln At Root
Input: r: the root node of SPQR-tree
Output: E_c: the solution set at root level
    Let G_s = (V_s, E_s) be the skeleton of r
    if G_s is an S-type then
        Let e_min be the edge with minimum absolute weight
        if G_s is odd then
            E_v ← {e_min}
        else
            E_v ← ∅
        end if
    else if G_s is a P-type then
        Let C be Σ_{e ∈ E_s} w(e)
        if C < 0 then
            E_v ← {e : w(e) ≥ 0, e ∈ E_s}
        else
            E_v ← {e : w(e) ≤ 0, e ∈ E_s}
        end if
    else if G_s is an R-type then
        E_v ← Solve(G_s)
    end if
    for each edge e in E_v do
        if e is not a virtual edge then
            E_c ← E_c ∪ {e}
        end if
    end for
    Let G′_s = (V_s, E_s − E_v)
    Bipartite coloring G′_s

C. Top-down merging

In the top-down phase, we simply select and collect the solution sets of the tree nodes based on the bipartite results of their parents. It can easily be shown that the top-down phase can be performed in $O(|V| + |E|)$ time.

V. EXPERIMENTAL RESULTS

To demonstrate the effectiveness of our method, we developed a tool that performs the layout decomposition. We utilized the LEDA package [17] for the basic graph structure and algorithms. We examined the methods presented on a Linux machine with an Intel 3GHz 64-bit CPU and 6GB RAM. All methods were implemented in C++. The compiler that we used was g++ 3.4.5. First, we verified our methods with some small layouts of standard cells from the Nangate 45nm open cell library, which can be downloaded from http://www.nangate.com. Fig. 1 indeed shows the coloring results produced by our methods.

Next we tested the methods with a real design named fft all¹. The design uses 0.13 micron technology. The polysilicon layer of the design was extracted as an input. We set the minimum coloring spacing to be 0.44µm. In this experiment, we tried to minimize the number of conflicts and the number of stitches simultaneously. Edges with negative weights were supposed to be added for every pair of touching rectangles. However, our color assignment solver does not support negative weights because Dijkstra's algorithm is used for finding shortest paths. We therefore applied the trick presented in Section II: an edge with negative weight $-w$ is replaced with a dummy node plus two edges with positive weight $w$. The results are given in Table I. We observe that up to 92.6% of the run time was saved after applying our graph division method. On average, about 80% time reduction was achieved when the layouts were sufficiently large. Finally, we got a similar speedup for the Metal1 layer and when the node splitting technique was considered, although the run time was much longer. Fig. 5 shows the coloring result of a portion of the layout in the Metal1 layer.

¹ The layout file can be obtained from the first author upon request.

TABLE I
EXPERIMENTAL RESULTS OF THE RUN TIME AND COST REDUCTION

                               CPU (sec.)            Improv. (%)
#polygons   #nodes/#edges      w/ spqr    w/o spqr   time    cost
3631        31371/52060        13.29      38.25      65.3    4.58
9628        83733/138738       199.94     2706.12    92.6    2.19
18360       159691/265370      400.43     4635.14    91.4    1.18
31261       284957/477273      1914.54    9964.18    80.7    1.61
49833       438868/738759      3397.26    15300.9    77.8    1.76
75620       627423/1057794     3686.07    17643.9    79.1    2.50

Fig. 5. Coloring result for a portion of fft all in Metal1 layer

VI. CONCLUSIONS AND FUTURE WORK

In this paper, a graph division method has been presented that divides a conflict graph into its triconnected components using SPQR-tree. Experimental results show that a 5X speedup on average has been achieved without any loss of quality. Thus, the proposed method is very promising. Moreover, since each triconnected component can be solved more or less separately, this divide-and-conquer method has great potential for parallelization, and we expect that it can achieve an even higher speedup when it runs in parallel.

REFERENCES

[1] C. Mack, "Seeing double," Spectrum, IEEE, vol. 45, no. 11, pp. 46–51, 2008.
[2] W. Arnold, "Challenges for lithography scaling to 32nm and below," in VLSI Technology, Systems and Applications, 2007. VLSI-TSA 2007. International Symposium on, 2007, pp. 1–4.
[3] A. Widmann and K. Monahan, "Qualification of immersion double patterning," in Semiconductor Manufacturing, 2007. ISSM 2007. International Symposium on, 2007, pp. 1–4.
[4] D. Z. Pan, S. Renwick, V. Singh, and J. Huckabay, "Nanolithography and CAD challenges for 32nm/22nm and beyond," in Computer-Aided Design, 2008. ICCAD 2008. IEEE/ACM International Conference on, 2008, p. xii.
[5] A. Kahng, C. Park, X. Xu, and H. Yao, "Layout decomposition for double patterning lithography," in Computer-Aided Design, 2008. ICCAD 2008. IEEE/ACM International Conference on, 2008, pp. 465–472.
[6] C. Chiang, A. Kahng, S. Sinha, and X. Xu, "Fast and efficient phase conflict detection and correction in standard-cell layouts," in Computer-Aided Design, 2005. ICCAD-2005. IEEE/ACM International Conference on, 2005, pp. 149–156.
[7] C. Chiang, A. B. Kahng, S. Sinha, X. Xu, and A. Z. Zelikovsky, "Fast and efficient Bright-Field AAPSM conflict detection and correction," Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, vol. 26, no. 1, pp. 115–126, 2007.
[8] C. Chiang, A. Kahng, S. Sinha, X. Xu, and A. Zelikovsky, "Bright-field AAPSM conflict detection and correction," in Design, Automation and Test in Europe, 2005. Proceedings, 2005, pp. 908–913, vol. 2.
[9] K. Yuan, J.-S. Yang, and D. Pan, "Double patterning layout decomposition for simultaneous conflict and stitch minimization," Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, vol. 29, no. 2, pp. 185–196, Feb. 2010.
[10] J.-S. Yang, K. Lu, M. Cho, K. Yuan, and D. Pan, "A new graph-theoretic, multi-objective layout decomposition framework for double patterning lithography," in Design Automation Conference (ASP-DAC), 2010 15th Asia and South Pacific, Jan. 2010, pp. 637–644.
[11] Y. Xu and C. Chu, "A matching based decomposer for double patterning lithography," in Proceedings of the 19th international symposium on Physical design, 2010, pp. 121–126.
[12] J. E. Hopcroft and R. E. Tarjan, "Dividing a graph into triconnected components," SIAM Journal on Computing, vol. 2, no. 3, pp. 135–158, 1973.
[13] P. Bertolazzi, G. D. Battista, and W. Didimo, "Computing orthogonal drawings with the minimum number of bends," IEEE Transactions on Computers, vol. 49, no. 8, pp. 826–840, 2000.
[14] C. Gutwenger and P. Mutzel, "A linear time implementation of SPQR-trees," Lecture notes in computer science, pp. 77–90, 2001.
[15] K. D. Boese and A. B. Kahng, "Zero-skew clock routing trees with minimum wirelength," in ASIC Conference and Exhibit, 1992., Proceedings of Fifth Annual IEEE International, 1992, pp. 17–21.
[16] F. Hadlock, "Finding a maximum cut of a planar graph in polynomial time," SIAM Journal on Computing, vol. 4, no. 3, pp. 221–225, 1975.
[17] K. Mehlhorn, S. Näher, and C. Uhrig, The LEDA platform for combinatorial and geometric computing, 1997, pp. 7–16. [Online]. Available: http://dx.doi.org/10.1007/3-540-63165-8_161
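The S-type and P-type cases of the bottom-up phase (Algorithm 1, Section IV-B) can be sketched in a few lines of Python. This is a hedged illustration rather than the authors' code, and it rests on several assumptions that are not spelled out in the text: the skeleton is given as the list of weights of its edges other than the reference edge, edges are identified by their list index, and an S-type skeleton is treated as "odd" when it has an odd number of positive-weight edges (negative edges being counted twice). The helper `brute_force_diff` is hypothetical and only serves to cross-check the S-type rule on a path from s to t, using a cost model in which a violated edge costs its absolute weight.

```python
from itertools import product


def s_type(weights):
    """Return (E00, E01, C_diff) for an S-type skeleton (assumptions above)."""
    odd = sum(1 for w in weights if w > 0) % 2 == 1
    i_min = min(range(len(weights)), key=lambda i: abs(weights[i]))
    c_min = abs(weights[i_min])
    if odd:  # same color for s and t forces one violated edge
        return [i_min], [], c_min
    return [], [i_min], -c_min  # opposite colors force one violated edge


def p_type(weights):
    """Return (E00, E01, C_diff) for a P-type skeleton."""
    e00 = [i for i, w in enumerate(weights) if w >= 0]
    e01 = [i for i, w in enumerate(weights) if w <= 0]
    return e00, e01, sum(weights)


def brute_force_diff(weights):
    """C00 - C01 for a path s = x0, ..., xn = t, by enumerating colorings."""
    best = {True: None, False: None}
    for colors in product((0, 1), repeat=len(weights) + 1):
        cost = 0
        for i, w in enumerate(weights):
            same = colors[i] == colors[i + 1]
            if (w > 0 and same) or (w < 0 and not same):
                cost += abs(w)
        key = colors[0] == colors[-1]
        if best[key] is None or cost < best[key]:
            best[key] = cost
    return best[True] - best[False]


# A child S-type skeleton passes its C_diff to the parent as a virtual edge weight.
child = s_type([3, 2, 5])
parent = p_type([child[2], -1])  # virtual edge plus one stitch edge
print(child, parent)
```

Deferring the choice in this way means each skeleton reports only one number, $C_{00} - C_{01}$, to its parent; the actual selection between the two stored solution sets happens later in the top-down phase.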
